SQL Fundamentals — Querying, Filtering, and Joining Data

SQL Fundamentals #

What SQL Is (and Isn’t) #

SQL (Structured Query Language) is the standard language for interacting with relational databases. Virtually every relational database — PostgreSQL, MySQL, SQLite, SQL Server, Oracle — speaks SQL.

SQL is a declarative language. You describe what result you want, not how to compute it. When you write a query, the database engine decides the best execution plan — which indexes to use, in what order to scan tables, how to join data. You focus on the logic; the engine handles the mechanics.

SQL is not a general-purpose programming language. You can’t write a web server or a command-line tool in SQL. It’s purpose-built for four categories of operations:

  • DDL (Data Definition Language): Creating and modifying tables and schemas — CREATE, ALTER, DROP.
  • DML (Data Manipulation Language): Reading and modifying data — SELECT, INSERT, UPDATE, DELETE.
  • DCL (Data Control Language): Managing permissions — GRANT, REVOKE.
  • TCL (Transaction Control Language): Managing transactions — BEGIN, COMMIT, ROLLBACK.

This module focuses on DDL and DML — defining tables and working with data.


Creating Tables #

Before you can store data, you need to define the structure. Here’s how you’d create the tables from the previous module:

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    product     TEXT NOT NULL,
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    price       REAL NOT NULL CHECK (price > 0),
    order_date  DATE NOT NULL DEFAULT CURRENT_DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Notice how constraints are declared inline with the table definition. The database will enforce these rules for every row, regardless of which application is writing the data.


Inserting Data #

INSERT INTO customers (customer_id, name, email)
VALUES (1, 'Alice', 'alice@example.com');

INSERT INTO customers (customer_id, name, email)
VALUES (2, 'Bob', 'bob@example.com');

INSERT INTO customers (customer_id, name, email)
VALUES (3, 'Charlie', 'charlie@example.com');

You can also insert multiple rows in a single statement:

INSERT INTO orders (order_id, customer_id, product, quantity, price, order_date)
VALUES
    (101, 1, 'Widget',  3, 9.99,  '2025-03-15'),
    (102, 2, 'Gadget',  1, 24.99, '2025-03-16'),
    (103, 1, 'Gizmo',   2, 14.99, '2025-03-17'),
    (104, 3, 'Widget',  5, 9.99,  '2025-03-18'),
    (105, 1, 'Gadget',  1, 24.99, '2025-03-19');

If you try to insert an order with a customer_id that doesn’t exist in the customers table, the database will reject it — that’s the foreign key constraint at work.


Querying Data with SELECT #

The SELECT statement is the workhorse of SQL. At its simplest:

SELECT * FROM customers;

This returns every column and every row from the customers table. In practice, you’ll almost always want to be more specific.

Selecting Specific Columns #

SELECT name, email FROM customers;

Only returns the columns you ask for. This matters for performance — especially with wide tables that have many columns.

Filtering with WHERE #

SELECT * FROM orders
WHERE price > 10;

Returns only the orders where the price exceeds 10. The WHERE clause supports the comparison operators you’d expect: =, != (or <>), <, >, <=, >=.

You can combine conditions with AND and OR:

SELECT * FROM orders
WHERE price > 10 AND quantity >= 2;

Other useful WHERE operators:

-- Pattern matching
SELECT * FROM customers WHERE name LIKE 'A%';

-- Checking for NULL
SELECT * FROM orders WHERE order_date IS NOT NULL;

-- Membership
SELECT * FROM orders WHERE product IN ('Widget', 'Gadget');

-- Range
SELECT * FROM orders WHERE price BETWEEN 10 AND 30;

Sorting with ORDER BY #

SELECT * FROM orders
ORDER BY order_date DESC;

Results are returned in descending date order. Use ASC (the default) for ascending.

You can sort by multiple columns:

SELECT * FROM orders
ORDER BY customer_id ASC, order_date DESC;

Limiting Results #

SELECT * FROM orders
ORDER BY price DESC
LIMIT 3;

Returns only the three most expensive orders. LIMIT is essential for pagination and for quickly inspecting large tables.


Updating Data #

UPDATE customers
SET email = 'alice.new@example.com'
WHERE customer_id = 1;

The WHERE clause is critical. Without it, the UPDATE affects every row in the table — a common and painful mistake.

You can update multiple columns at once:

UPDATE orders
SET quantity = 4, price = 8.99
WHERE order_id = 101;

Deleting Data #

DELETE FROM orders
WHERE order_id = 102;

Again, the WHERE clause determines what gets deleted. DELETE FROM orders without a WHERE clause deletes every row in the table.


Joins: Combining Data from Multiple Tables #

The real power of relational databases emerges when you combine data from related tables. This is what joins do.

INNER JOIN #

Returns rows where there’s a match in both tables:

SELECT customers.name, orders.product, orders.quantity, orders.price
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;

Result:

name product quantity price
Alice Widget 3 9.99
Bob Gadget 1 24.99
Alice Gizmo 2 14.99
Charlie Widget 5 9.99
Alice Gadget 1 24.99

The join replaced customer_id with the actual customer name. No data was duplicated in storage — the relationship was reconstructed at query time.

LEFT JOIN #

Returns all rows from the left table, even if there’s no match in the right table:

SELECT customers.name, orders.product
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;

If a customer has no orders, they still appear in the result, with NULL for the order columns. This is useful for finding “orphaned” records — customers who haven’t ordered, products that haven’t been sold, etc.

Table Aliases #

For readability, you can give tables shorter aliases:

SELECT c.name, o.product, o.quantity
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.price > 10;

Aggregation: Summarizing Data #

Aggregate functions compute a single value from a set of rows.

Common Aggregate Functions #

-- Count all orders
SELECT COUNT(*) FROM orders;

-- Total revenue
SELECT SUM(price * quantity) AS total_revenue FROM orders;

-- Average order value
SELECT AVG(price * quantity) AS avg_order_value FROM orders;

-- Most expensive single item
SELECT MAX(price) FROM orders;

-- Least expensive single item
SELECT MIN(price) FROM orders;

GROUP BY #

Aggregation becomes powerful when combined with GROUP BY, which partitions rows into groups:

SELECT c.name, COUNT(*) AS order_count, SUM(o.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;

Result:

name order_count total_spent
Alice 3 84.94
Bob 1 24.99
Charlie 1 49.95

HAVING: Filtering Groups #

WHERE filters individual rows before grouping. HAVING filters groups after aggregation:

SELECT c.name, SUM(o.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name
HAVING total_spent > 30;

This returns only customers who have spent more than $30 total.


Altering Tables #

Schemas evolve. You can modify existing tables without losing data:

-- Add a column
ALTER TABLE customers ADD COLUMN phone TEXT;

-- Remove a column
ALTER TABLE customers DROP COLUMN phone;

-- Rename a table
ALTER TABLE orders RENAME TO customer_orders;

The specifics vary between database systems — PostgreSQL, MySQL, and SQLite each support slightly different ALTER TABLE capabilities.


The Order of a SQL Query #

A common source of confusion is the order in which SQL clauses are processed. You write them in one order, but the database evaluates them in another:

Evaluation order Clause Purpose
1 FROM Which tables to read
2 JOIN How to combine tables
3 WHERE Filter individual rows
4 GROUP BY Partition into groups
5 HAVING Filter groups
6 SELECT Choose which columns to return
7 ORDER BY Sort the result
8 LIMIT Restrict how many rows to return

This explains, for example, why you can’t use a column alias from SELECT in a WHERE clause — WHERE is evaluated before SELECT.

Understanding this evaluation order makes debugging complex queries much easier.