SQL vs NoSQL Databases

0/6 in this phase0/45 across the roadmap

📖 Concept

One of the most critical decisions in system design is choosing between SQL (relational) and NoSQL (non-relational) databases. There's no universally "better" option — the choice depends entirely on your data model, access patterns, and scale requirements.

SQL (Relational) Databases

SQL databases store data in tables with rows and columns, connected through foreign keys and queried with SQL (Structured Query Language).

Examples: PostgreSQL, MySQL, SQLite, Oracle, SQL Server

Strengths:

ACID transactions — Atomicity, Consistency, Isolation, Durability
Complex queries — JOINs, subqueries, aggregations, window functions
Data integrity — Foreign keys, constraints, triggers
Mature ecosystem — 40+ years of optimization, tooling, and expertise

Weaknesses:

Vertical scaling is the primary option (bigger server)
Rigid schema — schema changes on large tables can be slow/risky
Horizontal scaling is possible but complex (sharding)

NoSQL Database Types

Type	Data Model	Examples	Best For
Document	JSON-like documents	MongoDB, CouchDB	Flexible schemas, content management
Key-Value	Simple key → value pairs	Redis, DynamoDB	Caching, session storage, high-speed lookups
Wide-Column	Row key → columns (grouped in families)	Cassandra, HBase	Time series, IoT, write-heavy workloads
Graph	Nodes and edges	Neo4j, Amazon Neptune	Social networks, recommendations, fraud detection

Decision Framework

Criteria	Choose SQL	Choose NoSQL
Data relationships	Complex, many-to-many	Simple, denormalized
Schema	Well-defined, stable	Evolving, flexible
Transactions	ACID required	Eventual consistency OK
Scale pattern	Moderate scale, complex queries	Massive scale, simple queries
Access pattern	Ad-hoc queries, reports	Known access patterns, key-based lookups
Examples	Banking, ERP, inventory	Social feeds, IoT, content, real-time analytics

The Polyglot Persistence Approach

Modern systems often use multiple databases, each optimized for its task:

PostgreSQL for user accounts and transactions (ACID needed)
Redis for caching and sessions (speed needed)
Elasticsearch for full-text search (search functionality)
Cassandra for event logs and time-series data (write-heavy)
Neo4j for social graph queries (relationship traversal)

Pro tip: In interviews, saying "I'd use PostgreSQL for X because we need ACID transactions, and Cassandra for Y because it's write-heavy with 100K events/sec" shows much more maturity than picking one database for everything.

💻 Code Example

codeTap to expand ⛶

1// ============================================
2// SQL vs NoSQL — Practical Comparison
3// ============================================
4
5// ---------- SQL: Relational Data Model ----------
6
7// Schema definition (PostgreSQL)
8const sqlSchema = `
9  -- Users table
10  CREATE TABLE users (
11    id SERIAL PRIMARY KEY,
12    name VARCHAR(100) NOT NULL,
13    email VARCHAR(255) UNIQUE NOT NULL,
14    created_at TIMESTAMP DEFAULT NOW()
15  );
16
17  -- Posts table (related to users)
18  CREATE TABLE posts (
19    id SERIAL PRIMARY KEY,
20    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
21    title VARCHAR(200) NOT NULL,
22    content TEXT NOT NULL,
23    published BOOLEAN DEFAULT false,
24    created_at TIMESTAMP DEFAULT NOW()
25  );
26
27  -- Tags (many-to-many with posts)
28  CREATE TABLE tags (
29    id SERIAL PRIMARY KEY,
30    name VARCHAR(50) UNIQUE NOT NULL
31  );
32
33  CREATE TABLE post_tags (
34    post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
35    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
36    PRIMARY KEY (post_id, tag_id)
37  );
38
39  -- Index for common queries
40  CREATE INDEX idx_posts_user_id ON posts(user_id);
41  CREATE INDEX idx_posts_created_at ON posts(created_at DESC);
42`;
43
44// ✅ SQL excels at complex queries with JOINs
45const complexQuery = `
46  -- "Get top 10 most active users with their post count and latest post"
47  SELECT
48    u.id,
49    u.name,
50    COUNT(p.id) as post_count,
51    MAX(p.created_at) as latest_post_at,
52    ARRAY_AGG(DISTINCT t.name) as tags_used
53  FROM users u
54  JOIN posts p ON p.user_id = u.id
55  LEFT JOIN post_tags pt ON pt.post_id = p.id
56  LEFT JOIN tags t ON t.id = pt.tag_id
57  WHERE p.published = true
58  GROUP BY u.id, u.name
59  ORDER BY post_count DESC
60  LIMIT 10;
61`;
62
63// ---------- NoSQL: Document Model (MongoDB) ----------
64
65// Same data in MongoDB — single document, no JOINs needed
66const mongoDocument = {
67  _id: "user_123",
68  name: "Jane Doe",
69  email: "jane@example.com",
70  createdAt: new Date("2024-01-15"),
71  posts: [
72    {
73      _id: "post_456",
74      title: "My First Post",
75      content: "Hello world!",
76      published: true,
77      tags: ["javascript", "tutorial"],     // Embedded, no separate table
78      createdAt: new Date("2024-01-20"),
79      stats: {
80        views: 1500,
81        likes: 42,
82        comments: 7,
83      },
84    },
85    // More posts embedded in the user document
86  ],
87  profile: {
88    bio: "Software Engineer",
89    avatar: "cdn.com/jane.jpg",
90    socialLinks: {
91      twitter: "@jane",
92      github: "janedoe",
93    },
94  },
95};
96
97// ❌ MongoDB struggles with: "Find all posts tagged 'javascript'
98// across ALL users, sorted by likes"
99// This requires scanning every user document — very slow!
100
101// ✅ MongoDB excels at: "Get user profile with all their posts"
102// Single document read — no JOINs, extremely fast!
103
104// ---------- Key-Value Store (Redis) ----------
105
106// Perfect for caching, sessions, and counters
107async function redisPatterns(redis) {
108  // Session storage
109  await redis.set('session:abc123', JSON.stringify({
110    userId: 'user_123',
111    role: 'admin',
112    loginAt: Date.now(),
113  }), 'EX', 3600); // Expires in 1 hour
114
115  // Counters (atomic increment)
116  await redis.incr('page:home:views');        // Increment view counter
117  await redis.incrby('user:123:score', 10);   // Add 10 points
118
119  // Leaderboard (sorted set)
120  await redis.zadd('leaderboard', 1500, 'player_a');
121  await redis.zadd('leaderboard', 2300, 'player_b');
122  await redis.zadd('leaderboard', 1800, 'player_c');
123
124  // Get top 10 players
125  const top10 = await redis.zrevrange('leaderboard', 0, 9, 'WITHSCORES');
126  // Returns: ['player_b', '2300', 'player_c', '1800', 'player_a', '1500']
127
128  // Rate limiting (sliding window)
129  const key = 'ratelimit:user_123';
130  const now = Date.now();
131  await redis.zadd(key, now, `\${now}`);
132  await redis.zremrangebyscore(key, 0, now - 60000); // Remove entries > 1 min old
133  const requestCount = await redis.zcard(key);
134  console.log(`Requests in last minute: \${requestCount}`);
135}
136
137// ---------- Wide-Column Store (Cassandra) ----------
138
139// Cassandra schema for time-series data
140const cassandraSchema = `
141  -- Event logging — optimized for write-heavy workloads
142  CREATE TABLE events (
143    user_id UUID,
144    event_date DATE,
145    event_time TIMESTAMP,
146    event_type TEXT,
147    event_data MAP<TEXT, TEXT>,
148    PRIMARY KEY ((user_id, event_date), event_time)
149  ) WITH CLUSTERING ORDER BY (event_time DESC);
150
151  -- This schema supports:
152  -- ✅ "Get all events for user X on date Y" (partition key lookup)
153  -- ✅ "Get latest 100 events for user X today" (range scan within partition)
154  -- ❌ "Get all events of type 'login' across ALL users" (full table scan!)
155`;
156
157// ---------- Decision Matrix ----------
158
159const decisionMatrix = {
160  'E-commerce orders': {
161    database: 'PostgreSQL',
162    reason: 'ACID transactions for payments, complex inventory queries',
163  },
164  'User sessions': {
165    database: 'Redis',
166    reason: 'Fast read/write, TTL expiration, key-value access pattern',
167  },
168  'Social media feed': {
169    database: 'Cassandra + Redis',
170    reason: 'Cassandra for write-heavy feed storage, Redis for feed cache',
171  },
172  'Full-text search': {
173    database: 'Elasticsearch',
174    reason: 'Inverted index optimized for text search and filtering',
175  },
176  'Recommendation engine': {
177    database: 'Neo4j + Redis',
178    reason: 'Graph traversal for recommendations, Redis for caching results',
179  },
180  'IoT sensor data': {
181    database: 'TimescaleDB or Cassandra',
182    reason: 'Massive write throughput, time-range queries',
183  },
184};
185
186console.log('Database Decision Matrix:', decisionMatrix);

🏋️ Practice Exercise

Database Selection: For each scenario, choose SQL or NoSQL (and the specific database), justifying your choice: (a) Banking transaction system, (b) Real-time gaming leaderboard, (c) Content management system for a blog, (d) IoT platform receiving 1M sensor readings per second, (e) Movie recommendation system like Netflix.
Data Model Comparison: Design the data model for a Twitter-like application in both PostgreSQL (normalized) and MongoDB (denormalized). Compare query complexity for: "Get latest 20 tweets from users I follow."
Polyglot Architecture: Design the database architecture for Uber, identifying which database to use for: ride matching, trip history, user profiles, driver locations, pricing engine, and analytics. Justify each choice.
Migration Scenario: Your startup's PostgreSQL database works great at 10K users but is slowing down at 10M users. Evaluate: (a) optimizing PostgreSQL (indexes, connection pooling, read replicas), (b) migrating to Cassandra, (c) adding a caching layer. What's your recommendation and why?
CAP Theorem Application: For each database (PostgreSQL, MongoDB, Cassandra, DynamoDB, Redis), identify whether it prioritizes CP (Consistency + Partition tolerance) or AP (Availability + Partition tolerance). How does this affect your design choices?

⚠️ Common Mistakes

Choosing NoSQL because it's 'newer' or 'more scalable' — many applications work perfectly fine with PostgreSQL at massive scale. Choose based on data model and access patterns, not hype.
Storing everything in one database — using PostgreSQL for caching, search, and time-series data. Each workload has an optimal database type. Use polyglot persistence.
Not considering access patterns before choosing a database — Cassandra requires you to know your queries upfront (you design tables around queries, not the other way around). This is the opposite of SQL.
Underestimating SQL databases — PostgreSQL handles millions of rows, complex queries, JSON data, full-text search, and even key-value patterns (JSONB). Many startups never outgrow PostgreSQL.
Ignoring operational complexity — running Cassandra, MongoDB, or Elasticsearch requires significant operational expertise. Managed services (RDS, DynamoDB, Atlas) reduce this burden but cost more.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for SQL vs NoSQL Databases

Was this topic helpful?

← PreviousAPI Design Best Practices Next →Database Indexing & Query Optimization