Why Not Flat Files #

The Problem: Data Without Structure #

Every application needs to store data. User accounts, orders, messages, sensor readings — it all has to live somewhere. The simplest approach is to write it to a file. A CSV, a JSON file, a plain text log. And for small, single-user applications, that actually works.

But the moment your data grows — or more than one process needs to read and write it — flat files start to break in ways that are surprisingly hard to fix.

Consider a simple scenario: you’re building an application that tracks customer orders. You store everything in a CSV file:

customer_name	email	product	quantity	price	order_date
Alice	alice@example.com	Widget	3	9.99	2025-03-15
Bob	bob@example.com	Gadget	1	24.99	2025-03-16
Alice	alice@example.com	Gizmo	2	14.99	2025-03-17

This looks manageable. But even with three rows, problems are already hiding in plain sight.

What Goes Wrong with Flat Files #

Data Duplication #

Alice’s name and email appear in every row where she has an order. If Alice changes her email address, you need to find and update every row that mentions her. Miss one, and you have inconsistent data — two different email addresses for the same person, with no way to know which is correct.

This is the update anomaly problem. In a flat file, there’s no separation between “who the customer is” and “what they ordered.” The two concepts are tangled together in every row.

Deletion Destroys Unrelated Data #

Suppose Bob’s only order gets cancelled, and you delete that row. You’ve now lost not just the order, but all knowledge that Bob exists as a customer — his name and email are gone. This is the deletion anomaly: removing one fact (the order) accidentally destroys an unrelated fact (the customer).

Insertion Requires Fabrication #

What if you want to add a new customer who hasn’t placed an order yet? In this flat file structure, every row requires an order. You’d have to either leave the order fields blank (violating your own format) or make up a dummy order. This is the insertion anomaly: you can’t record one fact without fabricating another.

Concurrent Access #

Now imagine two processes try to update the file at the same time. One is adding a new order while another is updating Alice’s email. Without careful coordination, one write can overwrite the other, or the file can end up in a corrupted, half-written state. Flat files have no built-in mechanism for handling concurrent access.

Querying Requires Custom Code #

Want to find all orders over $20? You have to read the entire file, parse every line, and filter manually. Want to find the total revenue per customer? You need to write that aggregation logic yourself. As data grows, these operations become slow and error-prone.

To make this concrete, consider a student database stored as a CSV:

student_id	first_name	last_name	email	gender	gpa	major	enrollment_date
S001	Alice	Johnson	alice.johnson@example.edu	Female	3.85	Computer Science	2023-09-01
S002	Bob	Williams	bob.williams@example.edu	Male	3.20	Mechanical Engineering	2023-09-01
S003	Catherine	Davis	catherine.davis@example.edu	Female	3.95	Biology	2023-09-01
S004	David	Brown	david.brown@example.edu	Male	3.50	Business Administration	2023-09-01
S005	Eve	Miller	eve.miller@example.edu	Female	3.75	Psychology	2023-09-01
S006	Frank	Wilson	frank.wilson@example.edu	Male	3.10	History	2023-09-01

Now suppose you want to answer a simple question: “List the students with a Psychology major.” Here is the pseudocode you’d need to write:

SET filename TO "students.csv"
OPEN file WITH filename

// Read and discard the header line
READ line FROM file

WHILE there are more lines in file:
    READ line FROM file
    SET fields TO SPLIT line BY ","
    SET major TO fields[6]
    SET first_name TO fields[1]
    SET last_name TO fields[2]

    IF major IS "Psychology":
        DISPLAY first_name + " " + last_name
    END IF
END WHILE

CLOSE file

That’s a lot of code for one simple question. And every new question — “list students enrolled before a certain date,” “update David Brown’s major,” “add a new student” — requires writing another custom routine. Every client application that needs this data must duplicate the same parsing and filtering logic.

The Full List of Problems #

Taken together, flat file storage suffers from:

Custom code required for every query and every update
Every client application must duplicate this logic
No data type enforcement — nothing prevents a string where a number should be
Data duplication and no guardrails against inconsistency
Limited support for concurrent access
Difficult to implement fine-grained security and access control
Scalability problems — no support for horizontal scaling or distribution
No built-in backup and recovery mechanisms

These aren’t edge cases. They’re fundamental limitations that any non-trivial application will hit. The solution is a database management system — a purpose-built layer that handles all of these problems so your application doesn’t have to.