Beyond Relational #
Relational Databases Are Not Always the Answer #
The relational model is remarkably general. For most applications, a well-designed relational database handles everything you need — transactions, complex queries, data integrity, and reasonable performance.
But “most” is not “all.” Some workloads push against the boundaries of what relational databases handle well. Understanding where those boundaries are — and what alternatives exist — is essential for making informed architectural decisions.
This is not about choosing sides. The question is never “relational or NoSQL?” It’s: what are the specific requirements of this workload, and which tool fits them best?
Where Relational Databases Struggle #
Horizontal Scaling #
Relational databases are designed to run on a single server. As data grows, you can add more RAM, faster disks, and more CPU cores — this is vertical scaling. But there’s a ceiling to how far vertical scaling goes.
Horizontal scaling — distributing data across many machines — is much harder with relational databases. Joins across machines are expensive. Distributed transactions are complex and slow. Maintaining foreign key constraints across nodes is, in the general case, not practical.
If your application generates terabytes of data per day and needs to serve millions of concurrent reads, vertical scaling may not be enough.
Unstructured and Semi-structured Data #
Relational databases require a fixed schema. Every row in a table has the same columns. This is a strength when your data is well-understood and consistent, but a constraint when it’s not.
Consider a product catalog where different product categories have different attributes. A book has an ISBN, author, and page count. A laptop has a processor, RAM, and screen size. A t-shirt has a size and color. In a relational schema, you’d need either a very wide table with many nullable columns, or a complex set of related tables.
High-throughput, Low-latency Workloads #
Some applications need to read or write millions of small records per second with sub-millisecond latency — session stores, real-time analytics, caching layers. Relational databases can be tuned for this, but purpose-built systems often handle it more efficiently.
Types of NoSQL Databases #
“NoSQL” is an umbrella term covering several distinct database architectures, each designed for different access patterns.
Key-Value Stores #
The simplest model: every record is a key-value pair. You look up values by their key. No queries on values, no relationships, no schema.
Examples: Redis, Amazon DynamoDB, etcd
Good for:
- Session storage
- Caching (frequently accessed data that’s expensive to compute)
- Configuration and feature flags
- Rate limiting and counters
Trade-off: You can only access data by its key. If you need to search by value (“find all sessions for user X”), you either need a secondary index (which some key-value stores support) or a different database.
Redis, the most widely used key-value store, keeps all data in memory. This makes it extremely fast but limits the total dataset size to available RAM (unless using Redis’s disk-based persistence or clustering).
Document Stores #
A document store organizes data into documents — self-contained, semi-structured records, typically stored as JSON (or a binary equivalent like BSON).
Examples: MongoDB, CouchDB, Amazon DocumentDB
A document for a customer order might look like:
{
"_id": "order-103",
"customer": {
"name": "Alice",
"email": "alice@example.com"
},
"items": [
{"product": "Widget", "quantity": 3, "price": 9.99},
{"product": "Gizmo", "quantity": 2, "price": 14.99}
],
"order_date": "2025-03-17",
"total": 59.95
}
Notice that the customer information and order items are embedded in the same document, rather than split across separate tables. This is called denormalization — the same design we warned against in the normalization module.
Good for:
- Applications where data is naturally hierarchical (e.g., a blog post with comments and tags)
- Rapid prototyping where the schema is still evolving
- Content management systems
- Catalogs with varied attributes per item
Trade-off: Embedded data gets duplicated. If Alice changes her email, every document containing her information needs to be updated. Document stores give you flexibility at the cost of the integrity guarantees that relational databases enforce automatically.
Column-Family Stores #
Column-family stores organize data into rows and columns, but unlike relational databases, different rows can have different columns. Data is stored and retrieved by column families — groups of related columns that are frequently accessed together.
Examples: Apache Cassandra, Apache HBase, Google Bigtable
Good for:
- Time-series data (sensor readings, logs, metrics)
- Write-heavy workloads at massive scale
- Data that’s naturally partitioned (by time, by user, by region)
Trade-off: No joins. No multi-row transactions (in most implementations). Query patterns must be designed in advance — you model your tables around how you’ll query the data, not around the logical relationships in the data.
Graph Databases #
Graph databases store data as nodes (entities) and edges (relationships). Unlike relational databases, where relationships are computed at query time via joins, graph databases store relationships as first-class objects — making traversal along relationships extremely fast.
Examples: Neo4j, Amazon Neptune, ArangoDB
Good for:
- Social networks (friends of friends, mutual connections)
- Recommendation engines (users who liked X also liked Y)
- Fraud detection (finding suspicious transaction patterns)
- Knowledge graphs and ontologies
Trade-off: Graph databases excel at relationship-heavy queries but are not designed for bulk aggregation or reporting. Asking “what’s the average purchase amount across all users?” is straightforward in SQL but awkward in a graph query language.
The CAP Theorem #
When data is distributed across multiple machines, the CAP theorem (proposed by Eric Brewer in 2000) states that a distributed system can provide at most two of three guarantees simultaneously:
- Consistency: Every read returns the most recent write. All nodes see the same data at the same time.
- Availability: Every request receives a response, even if some nodes are down.
- Partition tolerance: The system continues to operate even when network communication between nodes is interrupted.
In a distributed system, network partitions will happen — this is a reality of running on multiple machines. So the practical choice is between:
- CP (Consistency + Partition tolerance): The system may become unavailable during a partition, but when it responds, the data is guaranteed to be correct. Traditional relational databases with distributed setups tend toward CP.
- AP (Availability + Partition tolerance): The system always responds, but different nodes may temporarily return different (stale) data. Many NoSQL databases (Cassandra, DynamoDB) tend toward AP.
The CAP theorem is not a menu where you pick two items. It’s a description of a fundamental constraint. In practice, most systems make nuanced trade-offs along a spectrum rather than picking a strict CP or AP mode.
Making the Choice #
For most applications, the decision process is straightforward:
Start with a relational database (PostgreSQL is an excellent default). Relational databases handle a remarkably wide range of workloads. They’re well-understood, well-documented, and well-supported. SQL is a transferable skill. ACID transactions simplify application logic.
Reach for a specialized database when you have a specific need:
| If you need… | Consider… |
|---|---|
| Sub-millisecond caching or session storage | Redis |
| Flexible schema for varied document types | MongoDB |
| Massive write throughput with time-series data | Cassandra |
| Relationship-heavy traversal queries | Neo4j |
| Simple key-value storage at scale | DynamoDB |
Polyglot persistence — using multiple databases in one application — is common in production systems. A typical architecture might use PostgreSQL for core business data, Redis for caching and sessions, and Elasticsearch for full-text search. Each database handles what it’s best at.
The important thing is to choose based on actual requirements, not hype. A relational database you understand well is almost always better than a NoSQL database you chose because a blog post said it was faster.
What We’ve Covered in This Course #
Over eight modules, we’ve built a foundation for understanding databases:
- Why not flat files — the problems with file-based storage and why they’re fundamental.
- What is a DBMS — the difference between a database and a database management system, and what a DBMS provides.
- History of database systems — from navigational databases to Codd’s relational revolution to NoSQL and vector databases.
- The relational model — tables, keys, relationships, relational algebra, and constraints.
- SQL fundamentals — querying, joining, filtering, and aggregating data.
- Schema design — normalization, anomalies, and when to denormalize.
- Indexes and performance — how databases find data fast, and how to measure it.
- Beyond relational — NoSQL databases, the CAP theorem, and choosing the right tool.
These concepts are not tied to any specific database product. Whether you use PostgreSQL, MySQL, MongoDB, or something else entirely, the principles of data modeling, query design, and understanding trade-offs apply universally.