The Relational Model #
Thinking in Tables #
The relational model is built on one deceptively simple idea: all data can be represented as tables.
A table (formally called a relation) is a named collection of rows that all share the same columns. Each column has a name and a data type. Each row represents one record.
Consider a table called customers:
| customer_id | name | |
|---|---|---|
| 1 | Alice | alice@example.com |
| 2 | Bob | bob@example.com |
| 3 | Charlie | charlie@example.com |
And a table called orders:
| order_id | customer_id | product | quantity | price | order_date |
|---|---|---|---|---|---|
| 101 | 1 | Widget | 3 | 9.99 | 2025-03-15 |
| 102 | 2 | Gadget | 1 | 24.99 | 2025-03-16 |
| 103 | 1 | Gizmo | 2 | 14.99 | 2025-03-17 |
Notice what changed compared to the flat file from the previous module. Alice’s name and email appear exactly once, in the customers table. Her orders reference her by customer_id. If Alice changes her email, you update one row in one table.
This separation is the relational model’s core contribution: each fact is stored in exactly one place.
Key Terminology #
Before going further, let’s establish the precise vocabulary. The relational model has formal mathematical terms, and databases use more casual equivalents. Both refer to the same concepts:
| Formal term | Casual term | Meaning |
|---|---|---|
| Relation | Table | A named set of tuples sharing the same attributes |
| Tuple | Row / Record | A single entry in a relation |
| Attribute | Column / Field | A named property with a defined type |
| Domain | Data type | The set of allowed values for an attribute |
| Relation schema | Table definition | The name, attributes, and types of a relation |
We’ll use the casual terms throughout this course, since that’s what you’ll encounter in practice.
Properties of Attributes #
Attributes in the relational model have several important properties:
- Each attribute has a unique name within its table.
- Attributes are atomic (indivisible) — each cell holds a single value, not a list or a nested structure. This is a foundational rule of the relational model.
- Each attribute has a domain: the set of allowed values. For example, a
gpaattribute might have a domain of real numbers between 0.0 and 4.0. - Attributes can hold a special value called null, representing missing or unknown data. Null is not the same as zero or an empty string — it means “no value.”
Relations Are Unordered #
A property that surprises many newcomers: the order of rows and columns in a relation is irrelevant.
Consider these two representations of the same student data:
R1:
| student_id | first_name | last_name | major | total_credits |
|---|---|---|---|---|
| 101 | Alice | Smith | CS | 60 |
| 102 | Bob | Johnson | Physics | 45 |
| 103 | Carol | Williams | Math | 72 |
R2:
| major | total_credits | last_name | student_id | first_name |
|---|---|---|---|---|
| Physics | 45 | Johnson | 102 | Bob |
| CS | 60 | Smith | 101 | Alice |
| Math | 72 | Williams | 103 | Carol |
R1 and R2 are the same relation. The rows are in a different order, the columns are in a different order, but the data they represent is identical. A relation is a set of tuples — and sets have no inherent order.
This is a mathematical property, not just a convention. It means that queries should never depend on the physical ordering of rows or columns. If you need a specific order, you explicitly request it (with ORDER BY in SQL).
Keys #
Keys are how the relational model enforces identity and relationships. There are several types, each serving a different purpose.
Super Keys #
A super key is any set of columns that can uniquely identify a row in a table. In the students table above, {student_id} is a super key, but so is {student_id, first_name} or even {student_id, first_name, last_name, major, total_credits} — adding more columns to a unique set doesn’t break uniqueness.
Super keys are the broadest category. Every table has at least one (the set of all columns), but most super keys are unnecessarily large.
Candidate Keys #
A candidate key is a minimal super key — a super key from which you cannot remove any column and still have it uniquely identify rows. In the students table, {student_id} is a candidate key. If email were also in the table and unique, {email} would be another candidate key.
A table can have multiple candidate keys. The one you designate as the official identifier becomes the primary key; the others are called alternate keys.
Every table needs a way to uniquely identify each row. The primary key is the candidate key you choose as the official identifier. Its value is unique across all rows in the table and can never be null.
In the customers table above, customer_id is the primary key. No two customers share the same ID, and every customer has one.
Natural Keys vs. Surrogate Keys #
A natural key is a value that already exists in the data and is inherently unique. For example, a Social Security Number, an ISBN, or an email address.
A surrogate key is an artificial identifier — typically an auto-incrementing integer or a UUID — assigned by the database purely for the purpose of identification.
In practice, surrogate keys are more common, for several reasons:
- Natural keys can change. People change their email addresses. Companies change their tax IDs. When a primary key changes, every reference to it must also change.
- Natural keys can be composite. Sometimes no single natural attribute is unique, so you need a combination (first name + last name + date of birth), which makes references unwieldy.
- Surrogate keys are compact and fast to index. An integer comparison is faster than a string comparison.
That said, natural keys have their place. If the data already has a stable, unique identifier (like a country code or a currency code), using it as the primary key can make queries more readable.
Foreign Keys #
A foreign key is a column in one table that references the primary key of another table. It’s the mechanism that creates relationships between tables.
In the orders table, customer_id is a foreign key that references customers.customer_id. This tells the database: “every order must belong to a customer that exists in the customers table.”
The database enforces this. If you try to insert an order with customer_id = 99 and no customer with that ID exists, the database will reject the operation. This is called referential integrity — the guarantee that references between tables are always valid.
Foreign keys also control what happens when a referenced row is deleted or updated:
- CASCADE: If a customer is deleted, automatically delete all their orders too.
- RESTRICT: Refuse to delete a customer who has orders.
- SET NULL: If a customer is deleted, set
customer_idto NULL in their orders.
The right choice depends on your domain. For orders, RESTRICT is usually correct — you don’t want to accidentally erase order history. For draft documents, CASCADE might make sense.
Types of Relationships #
Tables can be related to each other in three fundamental ways:
One-to-Many #
The most common relationship. One customer has many orders. One department has many employees. One author has many books.
This is implemented with a foreign key on the “many” side. The orders table has a customer_id column pointing back to customers.
One-to-One #
One row in table A corresponds to exactly one row in table B. For example, a users table and a user_profiles table, where each user has exactly one profile.
This is implemented with a foreign key that is also unique. It’s less common, and usually indicates that the two tables could be merged — unless there’s a reason to separate them (different access patterns, security, or optional data).
Many-to-Many #
One student enrolls in many courses. One course has many students. Neither side “owns” the relationship.
This cannot be represented with a single foreign key. Instead, you create a junction table (also called a join table or associative table):
| student_id | course_id |
|---|---|
| 1 | 201 |
| 1 | 202 |
| 2 | 201 |
| 3 | 203 |
The junction table student_courses has two foreign keys — one to students and one to courses. Together, they form the table’s primary key (a composite primary key), ensuring that the same student can’t be enrolled in the same course twice.
Constraints: Rules the Database Enforces #
Beyond primary and foreign keys, relational databases support several types of constraints that enforce data quality at the storage level:
- NOT NULL: The column must always have a value. A customer must have a name.
- UNIQUE: No two rows can have the same value in this column. Email addresses, usernames, and similar identifiers are often marked unique.
- CHECK: A condition that every row must satisfy. For example,
CHECK (price > 0)ensures no product has a negative or zero price. - DEFAULT: A value to use when one isn’t provided. For example,
order_date DEFAULT CURRENT_DATE.
These constraints are part of the table definition (the schema), not the application code. This is important: even if your application has a bug, the database will still refuse invalid data.
Relational Algebra: The Mathematical Foundation #
The relational model isn’t just an intuitive way to organize data — it has a formal mathematical foundation called relational algebra. Understanding relational algebra isn’t required for day-to-day SQL work, but it explains why SQL works the way it does.
Relational algebra is a procedural language consisting of a set of operations. Each operation takes one or more relations as input and produces a new relation as output:
f(R1, R2, … Rn) = Ro
This closure property — the fact that every operation returns a relation — is what makes relational operations composable. You can chain them together, feeding the output of one operation as input to the next.
The Core Operators #
There are six fundamental operators in relational algebra:
| Operator | Symbol | Description |
|---|---|---|
| Select | σ | Filters rows that satisfy a predicate |
| Project | π | Selects specific columns, discarding others |
| Union | ∪ | Combines rows from two relations (no duplicates) |
| Intersection | ∩ | Returns rows common to two relations |
| Difference | − | Returns rows in one relation but not another |
| Cartesian Product | × | Combines every row of one relation with every row of another |
Select (σ) #
The select operator filters tuples (rows) that satisfy a given predicate. For example, to select students whose major is Physics:
σmajor = “Physics”(students)
Predicates support comparisons (=, ≠, >, <, ≤, ≥) and logical connectives: ∧ (and), ∨ (or), ¬ (not).
Project (π) #
The project operator selects specific attributes (columns) from a relation, discarding the rest. To get just the names and majors of all students:
πfirst_name, last_name, major(students)
From Algebra to SQL #
These mathematical operators map directly to SQL clauses:
- Select (σ) corresponds to SQL’s
WHEREclause — filtering rows. - Project (π) corresponds to the column list in
SELECT— choosing which columns to return. - Union, Intersection, Difference correspond to SQL’s
UNION,INTERSECT, andEXCEPT. - Cartesian Product (×) corresponds to
CROSS JOIN.
SQL also adds higher-level operations like JOIN (which combines cartesian product with select) and GROUP BY (for aggregation), but the relational algebra operators are the primitives from which everything else is built.
Understanding this foundation helps explain why SQL is the way it is — and why it has remained remarkably stable for over 40 years. The mathematical model underneath gives it precise, unambiguous semantics.
Why the Relational Model Has Endured #
The relational model was proposed in 1970. Over fifty years later, it remains the dominant approach to data management. This longevity is not an accident.
Mathematical foundation. The relational model is grounded in set theory and first-order predicate logic. This gives it precise semantics — there’s no ambiguity about what a query means or what result it should produce.
Data independence. Applications interact with logical tables. The database engine can reorganize the physical storage — add indexes, partition tables across disks, cache frequently accessed data — without requiring any changes to application queries.
Declarative querying. SQL lets you describe what you want, not how to get it. The database’s query optimizer figures out the most efficient execution plan. As your data grows or your hardware changes, the optimizer adapts — often without any changes to your queries.
Composability. The output of a query is a table. That table can be used as input to another query. This composability means that complex operations can be built up from simple, well-understood pieces.
These properties make relational databases predictable, debuggable, and adaptable. When your requirements change — and they will — the relational model gives you the tools to evolve your schema without starting over.