A Brief History of Database Systems — From the 1960s to Today #

The history of databases is the story of figuring out how to organize data so that it’s both efficient to store and meaningful to query.


The Navigational Era (1960s) #

The earliest database systems — the hierarchical model (IBM’s IMS, 1966) and the network model (CODASYL, late 1960s) — stored data in tree or graph structures. To find a piece of data, your application had to navigate through these structures, following pointers from one record to the next.

These systems were fast for the specific access patterns they were designed for, but rigid. If you needed to query your data in a way the original designers hadn’t anticipated, you often had to restructure the entire database or write complex navigation code.


The Relational Revolution (Edgar F. Codd, 1970) #

Before Codd, programs were “hard-wired” to the physical structure of the database — disk locations, pointers, file layouts. Any change to the database structure (adding a field, reorganizing storage) required rewriting the application code. Asking new, unplanned questions about the data was extremely difficult.

In 1970, Edgar F. Codd, a researcher at IBM, published a paper titled “A Relational Model of Data for Large Shared Data Banks.” His groundbreaking idea was abstraction: treat data with mathematical rigor, irrespective of how it’s physically stored. Codd’s model rested on three pillars:

  1. Structure: Tables (Relations). Data is organized into tables. Each table has a fixed set of columns (attributes), and each row (tuple) represents one record. This replaced the tangled pointer-based structures of navigational databases with a clean, uniform representation.

  2. Manipulation: Logical Queries. You describe the result you want, not the steps to get it. This declarative approach led to SQL and freed applications from needing to know the internal layout of the data.

  3. Independence: Data from Code. The physical storage can change — files can be reorganized, indexes can be added or removed, data can be partitioned across disks — without affecting the application’s view of the data or requiring any changes to queries.

This separation of logical and physical layers was the fundamental shift. In navigational databases, the application was the query engine. In the relational model, the DBMS handles the mechanics, and the application just states what it needs.
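This separation can be sketched concretely with SQLite: the same declarative query keeps working, and returns the same answer, after a purely physical change such as adding an index. The table, columns, and rows below are invented for illustration.

```python
import sqlite3

# A minimal sketch of physical data independence, using SQLite.
# The table, columns, and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_name TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", 20.0, 3), ("bob", 5.0, 4), ("alice", 50.0, 2)],
)

# A declarative query: it states what result is wanted, not how to find it.
QUERY = """
    SELECT customer_name, SUM(price * quantity) AS total_spent
    FROM orders
    GROUP BY customer_name
    HAVING SUM(price * quantity) > 100
"""
before = conn.execute(QUERY).fetchall()

# A purely physical change: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_name)")
after = conn.execute(QUERY).fetchall()

assert before == after  # same logical answer, regardless of physical layout
print(before)  # [('alice', 160.0)]
```

In a navigational database, adding or removing an access path like this would have meant rewriting application code; here it is invisible to the query.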


SQL and the Commercial Success of RDBMS (1980s–1990s) #

Codd’s ideas were implemented in research prototypes like System R (IBM) and Ingres (UC Berkeley), which eventually evolved into commercial products. SQL (Structured Query Language) was standardized, and relational databases became a major industry.

The RDBMS ecosystem that emerged — and largely persists today — includes: IBM DB2, Oracle Database, Sybase, Informix, Microsoft SQL Server, PostgreSQL, MySQL, MariaDB, SQLite, and Teradata, among others. These systems have survived the test of time; most are still actively developed and widely deployed decades after their creation.

SQL allowed anyone to express complex queries in a relatively readable language:

SELECT customer_name, SUM(price * quantity) AS total_spent
FROM orders
GROUP BY customer_name
HAVING SUM(price * quantity) > 100;

By the 1990s, relational databases had become the default choice for nearly all data storage needs. Object-oriented databases emerged during this period as well, but never displaced relational systems in mainstream use.


The Internet Era and NoSQL (2000s–2010s) #

As the internet became mainstream, some applications hit the limits of traditional relational databases — particularly around horizontal scaling (distributing data across many machines) and handling unstructured or semi-structured data. Key publications — Google’s MapReduce and Bigtable papers, Amazon’s Dynamo paper — laid the groundwork for a new class of systems.
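The horizontal-scaling idea can be sketched as hash partitioning: each key is routed to one of several machines by its hash, so no single machine has to hold everything. The node names and routing scheme below are invented for illustration; real systems such as Dynamo use consistent hashing so that adding a node relocates only a fraction of the keys.

```python
import hashlib

# A minimal sketch of horizontal scaling via hash partitioning.
# The node names and routing scheme are invented for illustration.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    """Route a key to a node by hashing it; each node stores one slice of the data."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# The same key always routes to the same node, so reads find what writes stored.
assert node_for("user:42") == node_for("user:42")
placement = {key: node_for(key) for key in ("user:1", "user:2", "user:3")}
print(placement)
```

Note the trade-off this sketch makes explicit: routing by `hash % len(NODES)` is simple, but changing the node count reshuffles almost every key — one of the problems the Dynamo paper addresses.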

This led to the emergence of NoSQL databases: key-value stores (Redis, DynamoDB), document stores (MongoDB, CouchDB), column-family stores (Cassandra, HBase), and graph databases (Neo4j). These systems trade some of the guarantees of relational databases for flexibility, scalability, or performance in specific use cases. The era of “Big Data” had arrived.

The “NoSQL” label is somewhat misleading — it doesn’t mean “never use SQL.” Most modern applications use relational databases for their core data and reach for specialized systems only when they have a specific need that relational databases handle poorly. Relational systems remained popular throughout this period.


Specialized and AI-Native Databases (2020s) #

Most recently, the rise of large language models and AI applications has fueled the widespread adoption and development of vector databases — systems optimized for storing and searching high-dimensional embeddings (Pinecone, Weaviate, Milvus, pgvector). Traditional RDBMS vendors have also added vector search capabilities, continuing the pattern of relational databases absorbing new paradigms rather than being replaced by them.
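The core operation a vector database optimizes can be sketched as brute-force nearest-neighbor search over embeddings. The vectors and document IDs below are invented for illustration; production systems use approximate indexes (e.g. HNSW) rather than scanning every vector.

```python
import math

# A brute-force nearest-neighbor search over embeddings: the core operation
# a vector database optimizes. Vectors and document IDs are invented here.
embeddings = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.0, 1.0, 0.0],
    "doc-3": [0.7, 0.7, 0.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, k=1):
    """Scan every stored vector and return the k most similar document IDs."""
    ranked = sorted(embeddings, key=lambda d: cosine(query, embeddings[d]), reverse=True)
    return ranked[:k]

print(nearest([0.9, 0.1, 0.0]))  # ['doc-1']
```

Real embeddings have hundreds or thousands of dimensions, which is exactly why the linear scan above becomes too slow and approximate indexes are needed.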


Where This Course Is Headed #

The remaining modules in this course will build on this foundation:

  • The Relational Model — tables, keys, relationships, relational algebra, and why this model has endured for over 50 years.
  • SQL Fundamentals — the language for defining, querying, and manipulating relational data.
  • Schema Design and Normalization — how to structure your tables to avoid the anomalies we discussed earlier.
  • Indexes and Query Performance — how databases find data quickly, and how your schema decisions affect performance.
  • Beyond Relational — when and why you might choose a non-relational database, and how to think about the trade-offs.

Each module builds on the last. By the end, you won’t just know how to use a database — you’ll understand why databases work the way they do.