SQL vs NoSQL Databases

0/6 in this phase0/45 across the roadmap

📖 Concept

One of the most critical decisions in system design is choosing between SQL (relational) and NoSQL (non-relational) databases. There's no universally "better" option — the choice depends entirely on your data model, access patterns, and scale requirements.

SQL (Relational) Databases

SQL databases store data in tables with rows and columns, connected through foreign keys and queried with SQL (Structured Query Language).

Examples: PostgreSQL, MySQL, SQLite, Oracle, SQL Server

Strengths:

  • ACID transactions — Atomicity, Consistency, Isolation, Durability
  • Complex queries — JOINs, subqueries, aggregations, window functions
  • Data integrity — Foreign keys, constraints, triggers
  • Mature ecosystem — 40+ years of optimization, tooling, and expertise

Weaknesses:

  • Vertical scaling is the primary option (bigger server)
  • Rigid schema — schema changes on large tables can be slow/risky
  • Horizontal scaling is possible but complex (sharding)

NoSQL Database Types

Type Data Model Examples Best For
Document JSON-like documents MongoDB, CouchDB Flexible schemas, content management
Key-Value Simple key → value pairs Redis, DynamoDB Caching, session storage, high-speed lookups
Wide-Column Row key → columns (grouped in families) Cassandra, HBase Time series, IoT, write-heavy workloads
Graph Nodes and edges Neo4j, Amazon Neptune Social networks, recommendations, fraud detection

Decision Framework

Criteria Choose SQL Choose NoSQL
Data relationships Complex, many-to-many Simple, denormalized
Schema Well-defined, stable Evolving, flexible
Transactions ACID required Eventual consistency OK
Scale pattern Moderate scale, complex queries Massive scale, simple queries
Access pattern Ad-hoc queries, reports Known access patterns, key-based lookups
Examples Banking, ERP, inventory Social feeds, IoT, content, real-time analytics

The Polyglot Persistence Approach

Modern systems often use multiple databases, each optimized for its task:

  • PostgreSQL for user accounts and transactions (ACID needed)
  • Redis for caching and sessions (speed needed)
  • Elasticsearch for full-text search (search functionality)
  • Cassandra for event logs and time-series data (write-heavy)
  • Neo4j for social graph queries (relationship traversal)

Pro tip: In interviews, saying "I'd use PostgreSQL for X because we need ACID transactions, and Cassandra for Y because it's write-heavy with 100K events/sec" shows much more maturity than picking one database for everything.

💻 Code Example

codeTap to expand ⛶
1// ============================================
2// SQL vs NoSQL — Practical Comparison
3// ============================================
4
5// ---------- SQL: Relational Data Model ----------
6
7// Schema definition (PostgreSQL)
8const sqlSchema = `
9 -- Users table
10 CREATE TABLE users (
11 id SERIAL PRIMARY KEY,
12 name VARCHAR(100) NOT NULL,
13 email VARCHAR(255) UNIQUE NOT NULL,
14 created_at TIMESTAMP DEFAULT NOW()
15 );
16
17 -- Posts table (related to users)
18 CREATE TABLE posts (
19 id SERIAL PRIMARY KEY,
20 user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
21 title VARCHAR(200) NOT NULL,
22 content TEXT NOT NULL,
23 published BOOLEAN DEFAULT false,
24 created_at TIMESTAMP DEFAULT NOW()
25 );
26
27 -- Tags (many-to-many with posts)
28 CREATE TABLE tags (
29 id SERIAL PRIMARY KEY,
30 name VARCHAR(50) UNIQUE NOT NULL
31 );
32
33 CREATE TABLE post_tags (
34 post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
35 tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
36 PRIMARY KEY (post_id, tag_id)
37 );
38
39 -- Index for common queries
40 CREATE INDEX idx_posts_user_id ON posts(user_id);
41 CREATE INDEX idx_posts_created_at ON posts(created_at DESC);
42`;
43
44// ✅ SQL excels at complex queries with JOINs
45const complexQuery = `
46 -- "Get top 10 most active users with their post count and latest post"
47 SELECT
48 u.id,
49 u.name,
50 COUNT(p.id) as post_count,
51 MAX(p.created_at) as latest_post_at,
52 ARRAY_AGG(DISTINCT t.name) as tags_used
53 FROM users u
54 JOIN posts p ON p.user_id = u.id
55 LEFT JOIN post_tags pt ON pt.post_id = p.id
56 LEFT JOIN tags t ON t.id = pt.tag_id
57 WHERE p.published = true
58 GROUP BY u.id, u.name
59 ORDER BY post_count DESC
60 LIMIT 10;
61`;
62
63// ---------- NoSQL: Document Model (MongoDB) ----------
64
65// Same data in MongoDB — single document, no JOINs needed
66const mongoDocument = {
67 _id: "user_123",
68 name: "Jane Doe",
69 email: "jane@example.com",
70 createdAt: new Date("2024-01-15"),
71 posts: [
72 {
73 _id: "post_456",
74 title: "My First Post",
75 content: "Hello world!",
76 published: true,
77 tags: ["javascript", "tutorial"], // Embedded, no separate table
78 createdAt: new Date("2024-01-20"),
79 stats: {
80 views: 1500,
81 likes: 42,
82 comments: 7,
83 },
84 },
85 // More posts embedded in the user document
86 ],
87 profile: {
88 bio: "Software Engineer",
89 avatar: "cdn.com/jane.jpg",
90 socialLinks: {
91 twitter: "@jane",
92 github: "janedoe",
93 },
94 },
95};
96
97// ❌ MongoDB struggles with: "Find all posts tagged 'javascript'
98// across ALL users, sorted by likes"
99// This requires scanning every user document — very slow!
100
101// ✅ MongoDB excels at: "Get user profile with all their posts"
102// Single document read — no JOINs, extremely fast!
103
104// ---------- Key-Value Store (Redis) ----------
105
106// Perfect for caching, sessions, and counters
107async function redisPatterns(redis) {
108 // Session storage
109 await redis.set('session:abc123', JSON.stringify({
110 userId: 'user_123',
111 role: 'admin',
112 loginAt: Date.now(),
113 }), 'EX', 3600); // Expires in 1 hour
114
115 // Counters (atomic increment)
116 await redis.incr('page:home:views'); // Increment view counter
117 await redis.incrby('user:123:score', 10); // Add 10 points
118
119 // Leaderboard (sorted set)
120 await redis.zadd('leaderboard', 1500, 'player_a');
121 await redis.zadd('leaderboard', 2300, 'player_b');
122 await redis.zadd('leaderboard', 1800, 'player_c');
123
124 // Get top 10 players
125 const top10 = await redis.zrevrange('leaderboard', 0, 9, 'WITHSCORES');
126 // Returns: ['player_b', '2300', 'player_c', '1800', 'player_a', '1500']
127
128 // Rate limiting (sliding window)
129 const key = 'ratelimit:user_123';
130 const now = Date.now();
131 await redis.zadd(key, now, `\${now}`);
132 await redis.zremrangebyscore(key, 0, now - 60000); // Remove entries > 1 min old
133 const requestCount = await redis.zcard(key);
134 console.log(`Requests in last minute: \${requestCount}`);
135}
136
137// ---------- Wide-Column Store (Cassandra) ----------
138
139// Cassandra schema for time-series data
140const cassandraSchema = `
141 -- Event logging — optimized for write-heavy workloads
142 CREATE TABLE events (
143 user_id UUID,
144 event_date DATE,
145 event_time TIMESTAMP,
146 event_type TEXT,
147 event_data MAP<TEXT, TEXT>,
148 PRIMARY KEY ((user_id, event_date), event_time)
149 ) WITH CLUSTERING ORDER BY (event_time DESC);
150
151 -- This schema supports:
152 -- ✅ "Get all events for user X on date Y" (partition key lookup)
153 -- ✅ "Get latest 100 events for user X today" (range scan within partition)
154 -- ❌ "Get all events of type 'login' across ALL users" (full table scan!)
155`;
156
157// ---------- Decision Matrix ----------
158
159const decisionMatrix = {
160 'E-commerce orders': {
161 database: 'PostgreSQL',
162 reason: 'ACID transactions for payments, complex inventory queries',
163 },
164 'User sessions': {
165 database: 'Redis',
166 reason: 'Fast read/write, TTL expiration, key-value access pattern',
167 },
168 'Social media feed': {
169 database: 'Cassandra + Redis',
170 reason: 'Cassandra for write-heavy feed storage, Redis for feed cache',
171 },
172 'Full-text search': {
173 database: 'Elasticsearch',
174 reason: 'Inverted index optimized for text search and filtering',
175 },
176 'Recommendation engine': {
177 database: 'Neo4j + Redis',
178 reason: 'Graph traversal for recommendations, Redis for caching results',
179 },
180 'IoT sensor data': {
181 database: 'TimescaleDB or Cassandra',
182 reason: 'Massive write throughput, time-range queries',
183 },
184};
185
186console.log('Database Decision Matrix:', decisionMatrix);

🏋️ Practice Exercise

  1. Database Selection: For each scenario, choose SQL or NoSQL (and the specific database), justifying your choice: (a) Banking transaction system, (b) Real-time gaming leaderboard, (c) Content management system for a blog, (d) IoT platform receiving 1M sensor readings per second, (e) Movie recommendation system like Netflix.

  2. Data Model Comparison: Design the data model for a Twitter-like application in both PostgreSQL (normalized) and MongoDB (denormalized). Compare query complexity for: "Get latest 20 tweets from users I follow."

  3. Polyglot Architecture: Design the database architecture for Uber, identifying which database to use for: ride matching, trip history, user profiles, driver locations, pricing engine, and analytics. Justify each choice.

  4. Migration Scenario: Your startup's PostgreSQL database works great at 10K users but is slowing down at 10M users. Evaluate: (a) optimizing PostgreSQL (indexes, connection pooling, read replicas), (b) migrating to Cassandra, (c) adding a caching layer. What's your recommendation and why?

  5. CAP Theorem Application: For each database (PostgreSQL, MongoDB, Cassandra, DynamoDB, Redis), identify whether it prioritizes CP (Consistency + Partition tolerance) or AP (Availability + Partition tolerance). How does this affect your design choices?

⚠️ Common Mistakes

  • Choosing NoSQL because it's 'newer' or 'more scalable' — many applications work perfectly fine with PostgreSQL at massive scale. Choose based on data model and access patterns, not hype.

  • Storing everything in one database — using PostgreSQL for caching, search, and time-series data. Each workload has an optimal database type. Use polyglot persistence.

  • Not considering access patterns before choosing a database — Cassandra requires you to know your queries upfront (you design tables around queries, not the other way around). This is the opposite of SQL.

  • Underestimating SQL databases — PostgreSQL handles millions of rows, complex queries, JSON data, full-text search, and even key-value patterns (JSONB). Many startups never outgrow PostgreSQL.

  • Ignoring operational complexity — running Cassandra, MongoDB, or Elasticsearch requires significant operational expertise. Managed services (RDS, DynamoDB, Atlas) reduce this burden but cost more.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for SQL vs NoSQL Databases