Database Indexing & Query Optimization

0/6 in this phase0/45 across the roadmap

📖 Concept

Database indexing is one of the most impactful performance optimizations in system design. A well-placed index can turn a 30-second query into a 5-millisecond one. Understanding how indexes work helps you design schemas that perform well at scale.

How Indexes Work

Without an index, the database performs a full table scan — reading every row to find matches. An index is a separate data structure (usually a B-Tree or B+ Tree) that maintains a sorted reference to rows, enabling fast lookups.

B-Tree Index (Most Common)

Operation Without Index With B-Tree Index
Point lookup (WHERE id = 123) O(n) — scan all rows O(log n) — tree traversal
Range query (WHERE age > 25) O(n) O(log n + k) where k = results
Insert O(1) O(log n) — must update index
Update indexed column O(1) O(log n) — must update index

Index Types

Type Description Use Case
B-Tree Balanced tree, default for most DBs General purpose, range queries
Hash Direct key → location mapping Exact-match lookups only
GIN (Generalized Inverted) For multi-valued columns Full-text search, JSONB, arrays
GiST (Generalized Search Tree) For spatial/geometric data Geographic queries, nearest-neighbor
BRIN (Block Range Index) For naturally ordered data Time-series, sequential IDs
Bitmap Bit array per distinct value Low-cardinality columns (status, gender)

Index Design Principles

  1. Index columns you WHERE, JOIN, and ORDER BY on — not every column
  2. Composite indexes: Put high-cardinality columns first (e.g., (user_id, created_at) not (created_at, user_id))
  3. Covering indexes: Include all columns a query needs so the DB never touches the main table (index-only scan)
  4. Don't over-index: Each index slows down writes and consumes storage. 3-5 indexes per table is typical.
  5. Partial indexes: Index only rows that match a condition (e.g., WHERE active = true)

The Write-Read Trade-off

Every index makes reads faster but writes slower:

  • Read-heavy workload (100:1 read:write): More indexes = better performance
  • Write-heavy workload (1:100 read:write): Fewer indexes = better performance
  • Balanced: Choose indexes carefully based on your most critical queries

Interview tip: When designing a schema, always mention "I'd add an index on [column] because our primary query pattern is [query]." This shows you think about performance from the start.

💻 Code Example

codeTap to expand ⛶
1// ============================================
2// Database Indexing — Practical Examples
3// ============================================
4
5// ---------- Index Impact Demonstration ----------
6
7// Table: orders (10 million rows)
8// Columns: id, user_id, product_id, amount, status, created_at
9
10// ❌ WITHOUT INDEX: Full table scan
11const slowQuery = `
12 -- Find all orders for user 12345 in the last 30 days
13 -- Without index: scans ALL 10M rows → ~30 seconds
14 SELECT * FROM orders
15 WHERE user_id = 12345
16 AND created_at > NOW() - INTERVAL '30 days'
17 ORDER BY created_at DESC;
18
19 -- EXPLAIN ANALYZE output:
20 -- Seq Scan on orders
21 -- Filter: (user_id = 12345 AND created_at > '2024-01-01')
22 -- Rows Removed by Filter: 9,999,950
23 -- Execution Time: 28,543.21 ms ← 28 SECONDS!
24`;
25
26// ✅ WITH COMPOSITE INDEX: B-Tree traversal
27const createIndex = `
28 -- Create a composite index matching the query pattern
29 CREATE INDEX idx_orders_user_date
30 ON orders (user_id, created_at DESC);
31
32 -- Same query WITH index:
33 -- Index Scan using idx_orders_user_date on orders
34 -- Index Cond: (user_id = 12345 AND created_at > '2024-01-01')
35 -- Execution Time: 2.34 ms ← 2 MILLISECONDS! (12,000x faster)
36`;
37
38// ---------- Index Types in Practice ----------
39
40const indexExamples = `
41 -- 1. SINGLE COLUMN INDEX
42 -- Good for: Simple WHERE clauses
43 CREATE INDEX idx_users_email ON users (email);
44 -- Speeds up: SELECT * FROM users WHERE email = 'jane@example.com';
45
46 -- 2. COMPOSITE (MULTI-COLUMN) INDEX
47 -- Good for: Queries filtering on multiple columns
48 CREATE INDEX idx_orders_user_status
49 ON orders (user_id, status, created_at DESC);
50 -- Speeds up: SELECT * FROM orders
51 -- WHERE user_id = 123 AND status = 'active'
52 -- ORDER BY created_at DESC;
53 -- ⚠️ Column order matters! This index does NOT help with:
54 -- WHERE status = 'active' (user_id must come first)
55
56 -- 3. COVERING INDEX (includes all needed columns)
57 CREATE INDEX idx_orders_covering
58 ON orders (user_id, created_at DESC)
59 INCLUDE (amount, status);
60 -- The DB never needs to read the main table!
61 -- This is called an "index-only scan" — fastest possible
62
63 -- 4. PARTIAL INDEX
64 -- Only index rows matching a condition
65 CREATE INDEX idx_orders_active
66 ON orders (user_id, created_at DESC)
67 WHERE status = 'active';
68 -- Smaller index, faster lookups, only for active orders
69 -- Great when most queries filter by status = 'active'
70
71 -- 5. UNIQUE INDEX
72 CREATE UNIQUE INDEX idx_users_email_unique ON users (email);
73 -- Enforces uniqueness AND provides fast lookups
74
75 -- 6. GIN INDEX (for JSONB / full-text search)
76 CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
77 -- Speeds up: SELECT * FROM products
78 -- WHERE metadata @> '{"color": "red"}';
79
80 -- 7. BRIN INDEX (Block Range Index — for time-series)
81 CREATE INDEX idx_events_time ON events USING BRIN (created_at);
82 -- Very small index, perfect for naturally ordered data
83 -- 1000x smaller than B-Tree for time-series tables
84`;
85
86// ---------- Query Optimization Patterns ----------
87
88// Pattern 1: Use EXPLAIN ANALYZE to understand query performance
89const explainQuery = `
90 EXPLAIN ANALYZE
91 SELECT u.name, COUNT(o.id) as order_count, SUM(o.amount) as total
92 FROM users u
93 JOIN orders o ON o.user_id = u.id
94 WHERE o.created_at > NOW() - INTERVAL '90 days'
95 GROUP BY u.id, u.name
96 HAVING COUNT(o.id) > 5
97 ORDER BY total DESC
98 LIMIT 20;
99
100 -- Look for:
101 -- ❌ Seq Scan (full table scan — needs index)
102 -- ❌ Hash Join on large tables (consider index for join column)
103 -- ❌ Sort with high cost (consider index with matching ORDER BY)
104 -- ✅ Index Scan or Index Only Scan
105 -- ✅ Bitmap Index Scan (for multiple conditions)
106`;
107
108// Pattern 2: Connection pooling for performance
109// ❌ BAD: New connection per request (200ms overhead per connection)
110async function badQueryHandler(req, res) {
111 const conn = await createNewConnection(); // 200ms TCP + TLS + auth
112 const result = await conn.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
113 await conn.close();
114 res.json(result);
115}
116
117// ✅ GOOD: Connection pool (reuse existing connections)
118const { Pool } = require('pg');
119const pool = new Pool({
120 host: 'localhost',
121 database: 'mydb',
122 max: 20, // Maximum connections in pool
123 idleTimeoutMillis: 30000,
124 connectionTimeoutMillis: 2000,
125});
126
127async function goodQueryHandler(req, res) {
128 // Gets an existing connection from pool (< 1ms)
129 const result = await pool.query(
130 'SELECT * FROM users WHERE id = $1',
131 [req.params.id]
132 );
133 // Connection automatically returned to pool
134 res.json(result.rows[0]);
135}
136
137// Pattern 3: Avoiding N+1 queries
138// ❌ BAD: N+1 query problem
139async function getPostsWithAuthorsBad() {
140 const posts = await db.query('SELECT * FROM posts LIMIT 20'); // 1 query
141 for (const post of posts) {
142 // N additional queries!
143 post.author = await db.query(
144 'SELECT name FROM users WHERE id = $1', [post.user_id]
145 );
146 }
147 return posts; // Total: 21 queries!
148}
149
150// ✅ GOOD: JOIN in a single query
151async function getPostsWithAuthorsGood() {
152 const posts = await db.query(`
153 SELECT p.*, u.name as author_name, u.avatar as author_avatar
154 FROM posts p
155 JOIN users u ON u.id = p.user_id
156 ORDER BY p.created_at DESC
157 LIMIT 20
158 `); // Total: 1 query!
159 return posts;
160}
161
162console.log("Indexing examples configured.");

🏋️ Practice Exercise

  1. Index Design Challenge: Given a table orders(id, user_id, product_id, quantity, total, status, created_at) and these common queries: (a) Get all orders for a user sorted by date, (b) Get all pending orders, (c) Get total revenue per product this month — design the optimal set of indexes. Explain why each index helps.

  2. EXPLAIN Analysis: Run EXPLAIN ANALYZE on a slow query in your database. Identify: (a) what type of scan is used, (b) which step takes the most time, (c) what index would improve it, (d) the expected improvement.

  3. Over-Indexing Penalty: A table has 15 indexes and is experiencing slow INSERT performance. Calculate the approximate write amplification (how much extra I/O each insert generates). Design a strategy to reduce to the optimal number of indexes.

  4. Composite Index Order: Explain why INDEX (user_id, created_at) is different from INDEX (created_at, user_id). Which one supports "get all orders for user X in date range" more efficiently? Why?

  5. Full-Text Search: Design a search system for a product catalog with 10M products. Compare: (a) SQL LIKE queries, (b) PostgreSQL GIN/tsvector, (c) Elasticsearch. Include performance estimates for each.

⚠️ Common Mistakes

  • Not indexing foreign key columns — JOINs on un-indexed foreign keys force full table scans on the joined table. Always index columns used in JOIN conditions.

  • Creating indexes on low-cardinality columns — an index on a boolean 'active' column (only 2 values) is rarely useful because the database will scan half the table anyway. Use partial indexes instead.

  • Wrong column order in composite indexes — in a composite index (A, B, C), queries filtering on just B or just C can't use the index. The leftmost prefix rule means the index supports queries on A, (A,B), or (A,B,C).

  • Not using EXPLAIN ANALYZE — guessing which queries are slow instead of measuring. Always profile before optimizing. The database's query planner knows more than you do.

  • Indexing every column 'just in case' — each index costs storage, slows writes, and needs maintenance. Only index columns used in frequent WHERE, JOIN, and ORDER BY clauses.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Database Indexing & Query Optimization