Database Indexing & Query Optimization
📖 Concept
Database indexing is one of the most impactful performance optimizations in system design. A well-placed index can turn a 30-second query into a 5-millisecond one. Understanding how indexes work helps you design schemas that perform well at scale.
How Indexes Work
Without an index, the database performs a full table scan — reading every row to find matches. An index is a separate data structure (usually a B-Tree or B+ Tree) that maintains a sorted reference to rows, enabling fast lookups.
B-Tree Index (Most Common)
| Operation | Without Index | With B-Tree Index |
|---|---|---|
Point lookup (WHERE id = 123) |
O(n) — scan all rows | O(log n) — tree traversal |
Range query (WHERE age > 25) |
O(n) | O(log n + k) where k = results |
| Insert | O(1) | O(log n) — must update index |
| Update indexed column | O(1) | O(log n) — must update index |
Index Types
| Type | Description | Use Case |
|---|---|---|
| B-Tree | Balanced tree, default for most DBs | General purpose, range queries |
| Hash | Direct key → location mapping | Exact-match lookups only |
| GIN (Generalized Inverted) | For multi-valued columns | Full-text search, JSONB, arrays |
| GiST (Generalized Search Tree) | For spatial/geometric data | Geographic queries, nearest-neighbor |
| BRIN (Block Range Index) | For naturally ordered data | Time-series, sequential IDs |
| Bitmap | Bit array per distinct value | Low-cardinality columns (status, gender) |
Index Design Principles
- Index columns you WHERE, JOIN, and ORDER BY on — not every column
- Composite indexes: Put high-cardinality columns first (e.g.,
(user_id, created_at)not(created_at, user_id)) - Covering indexes: Include all columns a query needs so the DB never touches the main table (index-only scan)
- Don't over-index: Each index slows down writes and consumes storage. 3-5 indexes per table is typical.
- Partial indexes: Index only rows that match a condition (e.g.,
WHERE active = true)
The Write-Read Trade-off
Every index makes reads faster but writes slower:
- Read-heavy workload (100:1 read:write): More indexes = better performance
- Write-heavy workload (1:100 read:write): Fewer indexes = better performance
- Balanced: Choose indexes carefully based on your most critical queries
Interview tip: When designing a schema, always mention "I'd add an index on [column] because our primary query pattern is [query]." This shows you think about performance from the start.
💻 Code Example
1// ============================================2// Database Indexing — Practical Examples3// ============================================45// ---------- Index Impact Demonstration ----------67// Table: orders (10 million rows)8// Columns: id, user_id, product_id, amount, status, created_at910// ❌ WITHOUT INDEX: Full table scan11const slowQuery = `12 -- Find all orders for user 12345 in the last 30 days13 -- Without index: scans ALL 10M rows → ~30 seconds14 SELECT * FROM orders15 WHERE user_id = 1234516 AND created_at > NOW() - INTERVAL '30 days'17 ORDER BY created_at DESC;1819 -- EXPLAIN ANALYZE output:20 -- Seq Scan on orders21 -- Filter: (user_id = 12345 AND created_at > '2024-01-01')22 -- Rows Removed by Filter: 9,999,95023 -- Execution Time: 28,543.21 ms ← 28 SECONDS!24`;2526// ✅ WITH COMPOSITE INDEX: B-Tree traversal27const createIndex = `28 -- Create a composite index matching the query pattern29 CREATE INDEX idx_orders_user_date30 ON orders (user_id, created_at DESC);3132 -- Same query WITH index:33 -- Index Scan using idx_orders_user_date on orders34 -- Index Cond: (user_id = 12345 AND created_at > '2024-01-01')35 -- Execution Time: 2.34 ms ← 2 MILLISECONDS! (12,000x faster)36`;3738// ---------- Index Types in Practice ----------3940const indexExamples = `41 -- 1. SINGLE COLUMN INDEX42 -- Good for: Simple WHERE clauses43 CREATE INDEX idx_users_email ON users (email);44 -- Speeds up: SELECT * FROM users WHERE email = 'jane@example.com';4546 -- 2. COMPOSITE (MULTI-COLUMN) INDEX47 -- Good for: Queries filtering on multiple columns48 CREATE INDEX idx_orders_user_status49 ON orders (user_id, status, created_at DESC);50 -- Speeds up: SELECT * FROM orders51 -- WHERE user_id = 123 AND status = 'active'52 -- ORDER BY created_at DESC;53 -- ⚠️ Column order matters! This index does NOT help with:54 -- WHERE status = 'active' (user_id must come first)5556 -- 3. COVERING INDEX (includes all needed columns)57 CREATE INDEX idx_orders_covering58 ON orders (user_id, created_at DESC)59 INCLUDE (amount, status);60 -- The DB never needs to read the main table!61 -- This is called an "index-only scan" — fastest possible6263 -- 4. PARTIAL INDEX64 -- Only index rows matching a condition65 CREATE INDEX idx_orders_active66 ON orders (user_id, created_at DESC)67 WHERE status = 'active';68 -- Smaller index, faster lookups, only for active orders69 -- Great when most queries filter by status = 'active'7071 -- 5. UNIQUE INDEX72 CREATE UNIQUE INDEX idx_users_email_unique ON users (email);73 -- Enforces uniqueness AND provides fast lookups7475 -- 6. GIN INDEX (for JSONB / full-text search)76 CREATE INDEX idx_products_metadata ON products USING GIN (metadata);77 -- Speeds up: SELECT * FROM products78 -- WHERE metadata @> '{"color": "red"}';7980 -- 7. BRIN INDEX (Block Range Index — for time-series)81 CREATE INDEX idx_events_time ON events USING BRIN (created_at);82 -- Very small index, perfect for naturally ordered data83 -- 1000x smaller than B-Tree for time-series tables84`;8586// ---------- Query Optimization Patterns ----------8788// Pattern 1: Use EXPLAIN ANALYZE to understand query performance89const explainQuery = `90 EXPLAIN ANALYZE91 SELECT u.name, COUNT(o.id) as order_count, SUM(o.amount) as total92 FROM users u93 JOIN orders o ON o.user_id = u.id94 WHERE o.created_at > NOW() - INTERVAL '90 days'95 GROUP BY u.id, u.name96 HAVING COUNT(o.id) > 597 ORDER BY total DESC98 LIMIT 20;99100 -- Look for:101 -- ❌ Seq Scan (full table scan — needs index)102 -- ❌ Hash Join on large tables (consider index for join column)103 -- ❌ Sort with high cost (consider index with matching ORDER BY)104 -- ✅ Index Scan or Index Only Scan105 -- ✅ Bitmap Index Scan (for multiple conditions)106`;107108// Pattern 2: Connection pooling for performance109// ❌ BAD: New connection per request (200ms overhead per connection)110async function badQueryHandler(req, res) {111 const conn = await createNewConnection(); // 200ms TCP + TLS + auth112 const result = await conn.query('SELECT * FROM users WHERE id = $1', [req.params.id]);113 await conn.close();114 res.json(result);115}116117// ✅ GOOD: Connection pool (reuse existing connections)118const { Pool } = require('pg');119const pool = new Pool({120 host: 'localhost',121 database: 'mydb',122 max: 20, // Maximum connections in pool123 idleTimeoutMillis: 30000,124 connectionTimeoutMillis: 2000,125});126127async function goodQueryHandler(req, res) {128 // Gets an existing connection from pool (< 1ms)129 const result = await pool.query(130 'SELECT * FROM users WHERE id = $1',131 [req.params.id]132 );133 // Connection automatically returned to pool134 res.json(result.rows[0]);135}136137// Pattern 3: Avoiding N+1 queries138// ❌ BAD: N+1 query problem139async function getPostsWithAuthorsBad() {140 const posts = await db.query('SELECT * FROM posts LIMIT 20'); // 1 query141 for (const post of posts) {142 // N additional queries!143 post.author = await db.query(144 'SELECT name FROM users WHERE id = $1', [post.user_id]145 );146 }147 return posts; // Total: 21 queries!148}149150// ✅ GOOD: JOIN in a single query151async function getPostsWithAuthorsGood() {152 const posts = await db.query(`153 SELECT p.*, u.name as author_name, u.avatar as author_avatar154 FROM posts p155 JOIN users u ON u.id = p.user_id156 ORDER BY p.created_at DESC157 LIMIT 20158 `); // Total: 1 query!159 return posts;160}161162console.log("Indexing examples configured.");
🏋️ Practice Exercise
Index Design Challenge: Given a table
orders(id, user_id, product_id, quantity, total, status, created_at)and these common queries: (a) Get all orders for a user sorted by date, (b) Get all pending orders, (c) Get total revenue per product this month — design the optimal set of indexes. Explain why each index helps.EXPLAIN Analysis: Run
EXPLAIN ANALYZEon a slow query in your database. Identify: (a) what type of scan is used, (b) which step takes the most time, (c) what index would improve it, (d) the expected improvement.Over-Indexing Penalty: A table has 15 indexes and is experiencing slow INSERT performance. Calculate the approximate write amplification (how much extra I/O each insert generates). Design a strategy to reduce to the optimal number of indexes.
Composite Index Order: Explain why
INDEX (user_id, created_at)is different fromINDEX (created_at, user_id). Which one supports "get all orders for user X in date range" more efficiently? Why?Full-Text Search: Design a search system for a product catalog with 10M products. Compare: (a) SQL LIKE queries, (b) PostgreSQL GIN/tsvector, (c) Elasticsearch. Include performance estimates for each.
⚠️ Common Mistakes
Not indexing foreign key columns — JOINs on un-indexed foreign keys force full table scans on the joined table. Always index columns used in JOIN conditions.
Creating indexes on low-cardinality columns — an index on a boolean 'active' column (only 2 values) is rarely useful because the database will scan half the table anyway. Use partial indexes instead.
Wrong column order in composite indexes — in a composite index (A, B, C), queries filtering on just B or just C can't use the index. The leftmost prefix rule means the index supports queries on A, (A,B), or (A,B,C).
Not using EXPLAIN ANALYZE — guessing which queries are slow instead of measuring. Always profile before optimizing. The database's query planner knows more than you do.
Indexing every column 'just in case' — each index costs storage, slows writes, and needs maintenance. Only index columns used in frequent WHERE, JOIN, and ORDER BY clauses.
💼 Interview Questions
🎤 Mock Interview
Practice a live interview for Database Indexing & Query Optimization