Database Indexing & Query Optimization

0/6 in this phase0/45 across the roadmap

📖 Concept

Database indexing is one of the most impactful performance optimizations in system design. A well-placed index can turn a 30-second query into a 5-millisecond one. Understanding how indexes work helps you design schemas that perform well at scale.

How Indexes Work

Without an index, the database performs a full table scan — reading every row to find matches. An index is a separate data structure (usually a B-Tree or B+ Tree) that maintains a sorted reference to rows, enabling fast lookups.

B-Tree Index (Most Common)

Operation	Without Index	With B-Tree Index
Point lookup (`WHERE id = 123`)	O(n) — scan all rows	O(log n) — tree traversal
Range query (`WHERE age > 25`)	O(n)	O(log n + k) where k = results
Insert	O(1)	O(log n) — must update index
Update indexed column	O(1)	O(log n) — must update index

Index Types

Type	Description	Use Case
B-Tree	Balanced tree, default for most DBs	General purpose, range queries
Hash	Direct key → location mapping	Exact-match lookups only
GIN (Generalized Inverted)	For multi-valued columns	Full-text search, JSONB, arrays
GiST (Generalized Search Tree)	For spatial/geometric data	Geographic queries, nearest-neighbor
BRIN (Block Range Index)	For naturally ordered data	Time-series, sequential IDs
Bitmap	Bit array per distinct value	Low-cardinality columns (status, gender)

Index Design Principles

Index columns you WHERE, JOIN, and ORDER BY on — not every column
Composite indexes: Put high-cardinality columns first (e.g., (user_id, created_at) not (created_at, user_id))
Covering indexes: Include all columns a query needs so the DB never touches the main table (index-only scan)
Don't over-index: Each index slows down writes and consumes storage. 3-5 indexes per table is typical.
Partial indexes: Index only rows that match a condition (e.g., WHERE active = true)

The Write-Read Trade-off

Every index makes reads faster but writes slower:

Read-heavy workload (100:1 read:write): More indexes = better performance
Write-heavy workload (1:100 read:write): Fewer indexes = better performance
Balanced: Choose indexes carefully based on your most critical queries

Interview tip: When designing a schema, always mention "I'd add an index on [column] because our primary query pattern is [query]." This shows you think about performance from the start.

💻 Code Example

codeTap to expand ⛶

1// ============================================
2// Database Indexing — Practical Examples
3// ============================================
4
5// ---------- Index Impact Demonstration ----------
6
7// Table: orders (10 million rows)
8// Columns: id, user_id, product_id, amount, status, created_at
9
10// ❌ WITHOUT INDEX: Full table scan
11const slowQuery = `
12  -- Find all orders for user 12345 in the last 30 days
13  -- Without index: scans ALL 10M rows → ~30 seconds
14  SELECT * FROM orders
15  WHERE user_id = 12345
16  AND created_at > NOW() - INTERVAL '30 days'
17  ORDER BY created_at DESC;
18
19  -- EXPLAIN ANALYZE output:
20  -- Seq Scan on orders
21  --   Filter: (user_id = 12345 AND created_at > '2024-01-01')
22  --   Rows Removed by Filter: 9,999,950
23  --   Execution Time: 28,543.21 ms  ← 28 SECONDS!
24`;
25
26// ✅ WITH COMPOSITE INDEX: B-Tree traversal
27const createIndex = `
28  -- Create a composite index matching the query pattern
29  CREATE INDEX idx_orders_user_date
30  ON orders (user_id, created_at DESC);
31
32  -- Same query WITH index:
33  -- Index Scan using idx_orders_user_date on orders
34  --   Index Cond: (user_id = 12345 AND created_at > '2024-01-01')
35  --   Execution Time: 2.34 ms  ← 2 MILLISECONDS! (12,000x faster)
36`;
37
38// ---------- Index Types in Practice ----------
39
40const indexExamples = `
41  -- 1. SINGLE COLUMN INDEX
42  -- Good for: Simple WHERE clauses
43  CREATE INDEX idx_users_email ON users (email);
44  -- Speeds up: SELECT * FROM users WHERE email = 'jane@example.com';
45
46  -- 2. COMPOSITE (MULTI-COLUMN) INDEX
47  -- Good for: Queries filtering on multiple columns
48  CREATE INDEX idx_orders_user_status
49  ON orders (user_id, status, created_at DESC);
50  -- Speeds up: SELECT * FROM orders
51  --            WHERE user_id = 123 AND status = 'active'
52  --            ORDER BY created_at DESC;
53  -- ⚠️ Column order matters! This index does NOT help with:
54  -- WHERE status = 'active' (user_id must come first)
55
56  -- 3. COVERING INDEX (includes all needed columns)
57  CREATE INDEX idx_orders_covering
58  ON orders (user_id, created_at DESC)
59  INCLUDE (amount, status);
60  -- The DB never needs to read the main table!
61  -- This is called an "index-only scan" — fastest possible
62
63  -- 4. PARTIAL INDEX
64  -- Only index rows matching a condition
65  CREATE INDEX idx_orders_active
66  ON orders (user_id, created_at DESC)
67  WHERE status = 'active';
68  -- Smaller index, faster lookups, only for active orders
69  -- Great when most queries filter by status = 'active'
70
71  -- 5. UNIQUE INDEX
72  CREATE UNIQUE INDEX idx_users_email_unique ON users (email);
73  -- Enforces uniqueness AND provides fast lookups
74
75  -- 6. GIN INDEX (for JSONB / full-text search)
76  CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
77  -- Speeds up: SELECT * FROM products
78  --            WHERE metadata @> '{"color": "red"}';
79
80  -- 7. BRIN INDEX (Block Range Index — for time-series)
81  CREATE INDEX idx_events_time ON events USING BRIN (created_at);
82  -- Very small index, perfect for naturally ordered data
83  -- 1000x smaller than B-Tree for time-series tables
84`;
85
86// ---------- Query Optimization Patterns ----------
87
88// Pattern 1: Use EXPLAIN ANALYZE to understand query performance
89const explainQuery = `
90  EXPLAIN ANALYZE
91  SELECT u.name, COUNT(o.id) as order_count, SUM(o.amount) as total
92  FROM users u
93  JOIN orders o ON o.user_id = u.id
94  WHERE o.created_at > NOW() - INTERVAL '90 days'
95  GROUP BY u.id, u.name
96  HAVING COUNT(o.id) > 5
97  ORDER BY total DESC
98  LIMIT 20;
99
100  -- Look for:
101  -- ❌ Seq Scan (full table scan — needs index)
102  -- ❌ Hash Join on large tables (consider index for join column)
103  -- ❌ Sort with high cost (consider index with matching ORDER BY)
104  -- ✅ Index Scan or Index Only Scan
105  -- ✅ Bitmap Index Scan (for multiple conditions)
106`;
107
108// Pattern 2: Connection pooling for performance
109// ❌ BAD: New connection per request (200ms overhead per connection)
110async function badQueryHandler(req, res) {
111  const conn = await createNewConnection(); // 200ms TCP + TLS + auth
112  const result = await conn.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
113  await conn.close();
114  res.json(result);
115}
116
117// ✅ GOOD: Connection pool (reuse existing connections)
118const { Pool } = require('pg');
119const pool = new Pool({
120  host: 'localhost',
121  database: 'mydb',
122  max: 20,                // Maximum connections in pool
123  idleTimeoutMillis: 30000,
124  connectionTimeoutMillis: 2000,
125});
126
127async function goodQueryHandler(req, res) {
128  // Gets an existing connection from pool (< 1ms)
129  const result = await pool.query(
130    'SELECT * FROM users WHERE id = $1',
131    [req.params.id]
132  );
133  // Connection automatically returned to pool
134  res.json(result.rows[0]);
135}
136
137// Pattern 3: Avoiding N+1 queries
138// ❌ BAD: N+1 query problem
139async function getPostsWithAuthorsBad() {
140  const posts = await db.query('SELECT * FROM posts LIMIT 20'); // 1 query
141  for (const post of posts) {
142    // N additional queries!
143    post.author = await db.query(
144      'SELECT name FROM users WHERE id = $1', [post.user_id]
145    );
146  }
147  return posts; // Total: 21 queries!
148}
149
150// ✅ GOOD: JOIN in a single query
151async function getPostsWithAuthorsGood() {
152  const posts = await db.query(`
153    SELECT p.*, u.name as author_name, u.avatar as author_avatar
154    FROM posts p
155    JOIN users u ON u.id = p.user_id
156    ORDER BY p.created_at DESC
157    LIMIT 20
158  `); // Total: 1 query!
159  return posts;
160}
161
162console.log("Indexing examples configured.");

🏋️ Practice Exercise

Index Design Challenge: Given a table orders(id, user_id, product_id, quantity, total, status, created_at) and these common queries: (a) Get all orders for a user sorted by date, (b) Get all pending orders, (c) Get total revenue per product this month — design the optimal set of indexes. Explain why each index helps.
EXPLAIN Analysis: Run EXPLAIN ANALYZE on a slow query in your database. Identify: (a) what type of scan is used, (b) which step takes the most time, (c) what index would improve it, (d) the expected improvement.
Over-Indexing Penalty: A table has 15 indexes and is experiencing slow INSERT performance. Calculate the approximate write amplification (how much extra I/O each insert generates). Design a strategy to reduce to the optimal number of indexes.
Composite Index Order: Explain why INDEX (user_id, created_at) is different from INDEX (created_at, user_id). Which one supports "get all orders for user X in date range" more efficiently? Why?
Full-Text Search: Design a search system for a product catalog with 10M products. Compare: (a) SQL LIKE queries, (b) PostgreSQL GIN/tsvector, (c) Elasticsearch. Include performance estimates for each.

⚠️ Common Mistakes

Not indexing foreign key columns — JOINs on un-indexed foreign keys force full table scans on the joined table. Always index columns used in JOIN conditions.
Creating indexes on low-cardinality columns — an index on a boolean 'active' column (only 2 values) is rarely useful because the database will scan half the table anyway. Use partial indexes instead.
Wrong column order in composite indexes — in a composite index (A, B, C), queries filtering on just B or just C can't use the index. The leftmost prefix rule means the index supports queries on A, (A,B), or (A,B,C).
Not using EXPLAIN ANALYZE — guessing which queries are slow instead of measuring. Always profile before optimizing. The database's query planner knows more than you do.
Indexing every column 'just in case' — each index costs storage, slows writes, and needs maintenance. Only index columns used in frequent WHERE, JOIN, and ORDER BY clauses.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Database Indexing & Query Optimization

Was this topic helpful?

← PreviousSQL vs NoSQL Databases Next →Database Sharding & Partitioning