Why Not Flat Files #
The Problem: Data Without Structure #
Every application needs to store data. User accounts, orders, messages, sensor readings — it all has to live somewhere. The simplest approach is to write it to a file. A CSV, a JSON file, a plain text log. And for small, single-user applications, that actually works.
But the moment your data grows — or more than one process needs to read and write it — flat files start to break in ways that are surprisingly hard to fix.
Consider a simple scenario: you’re building an application that tracks customer orders. You store everything in a CSV file:
| customer_name | product | quantity | price | order_date | |
|---|---|---|---|---|---|
| Alice | alice@example.com | Widget | 3 | 9.99 | 2025-03-15 |
| Bob | bob@example.com | Gadget | 1 | 24.99 | 2025-03-16 |
| Alice | alice@example.com | Gizmo | 2 | 14.99 | 2025-03-17 |
This looks manageable. But even with three rows, problems are already hiding in plain sight.
What Goes Wrong with Flat Files #
Data Duplication #
Alice’s name and email appear in every row where she has an order. If Alice changes her email address, you need to find and update every row that mentions her. Miss one, and you have inconsistent data — two different email addresses for the same person, with no way to know which is correct.
This is the update anomaly problem. In a flat file, there’s no separation between “who the customer is” and “what they ordered.” The two concepts are tangled together in every row.
Deletion Destroys Unrelated Data #
Suppose Bob’s only order gets cancelled, and you delete that row. You’ve now lost not just the order, but all knowledge that Bob exists as a customer — his name and email are gone. This is the deletion anomaly: removing one fact (the order) accidentally destroys an unrelated fact (the customer).
Insertion Requires Fabrication #
What if you want to add a new customer who hasn’t placed an order yet? In this flat file structure, every row requires an order. You’d have to either leave the order fields blank (violating your own format) or make up a dummy order. This is the insertion anomaly: you can’t record one fact without fabricating another.
Concurrent Access #
Now imagine two processes try to update the file at the same time. One is adding a new order while another is updating Alice’s email. Without careful coordination, one write can overwrite the other, or the file can end up in a corrupted, half-written state. Flat files have no built-in mechanism for handling concurrent access.
Querying Requires Custom Code #
Want to find all orders over $20? You have to read the entire file, parse every line, and filter manually. Want to find the total revenue per customer? You need to write that aggregation logic yourself. As data grows, these operations become slow and error-prone.
To make this concrete, consider a student database stored as a CSV:
| student_id | first_name | last_name | gender | gpa | major | enrollment_date | |
|---|---|---|---|---|---|---|---|
| S001 | Alice | Johnson | alice.johnson@example.edu | Female | 3.85 | Computer Science | 2023-09-01 |
| S002 | Bob | Williams | bob.williams@example.edu | Male | 3.20 | Mechanical Engineering | 2023-09-01 |
| S003 | Catherine | Davis | catherine.davis@example.edu | Female | 3.95 | Biology | 2023-09-01 |
| S004 | David | Brown | david.brown@example.edu | Male | 3.50 | Business Administration | 2023-09-01 |
| S005 | Eve | Miller | eve.miller@example.edu | Female | 3.75 | Psychology | 2023-09-01 |
| S006 | Frank | Wilson | frank.wilson@example.edu | Male | 3.10 | History | 2023-09-01 |
Now suppose you want to answer a simple question: “List the students with a Psychology major.” Here is the pseudocode you’d need to write:
SET filename TO "students.csv"
OPEN file WITH filename
// Read and discard the header line
READ line FROM file
WHILE there are more lines in file:
READ line FROM file
SET fields TO SPLIT line BY ","
SET major TO fields[6]
SET first_name TO fields[1]
SET last_name TO fields[2]
IF major IS "Psychology":
DISPLAY first_name + " " + last_name
END IF
END WHILE
CLOSE file
That’s a lot of code for one simple question. And every new question — “list students enrolled before a certain date,” “update David Brown’s major,” “add a new student” — requires writing another custom routine. Every client application that needs this data must duplicate the same parsing and filtering logic.
The Full List of Problems #
Taken together, flat file storage suffers from:
- Custom code required for every query and every update
- Every client application must duplicate this logic
- No data type enforcement — nothing prevents a string where a number should be
- Data duplication and no guardrails against inconsistency
- Limited support for concurrent access
- Difficult to implement fine-grained security and access control
- Scalability problems — no support for horizontal scaling or distribution
- No built-in backup and recovery mechanisms
These aren’t edge cases. They’re fundamental limitations that any non-trivial application will hit. The solution is a database management system — a purpose-built layer that handles all of these problems so your application doesn’t have to.