Trade-offs & Design Decisions


📖 Concept

System design is fundamentally about making trade-offs. There is no perfect system — every design decision involves giving up something to gain something else. The best engineers don't claim their design is "the best"; they explain why they chose it given the constraints.

The Big Trade-offs

1. Consistency vs. Availability (CAP Theorem Preview)

Choose Consistency     | Choose Availability
Banking transactions   | Social media feeds
Inventory management   | DNS resolution
Booking systems        | Content delivery
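The choice only becomes visible when replicas cannot reach each other. The sketch below shows the two behaviors side by side. The `Replica` class, its `mode` values, and the `partitioned` flag are hypothetical names invented for illustration; real systems detect partitions through timeouts and quorum checks rather than a boolean.

```javascript
// Hypothetical replica that may lose contact with the primary.
// mode "CP": refuse to answer rather than risk returning stale data.
// mode "AP": answer with the last known value, which may be stale.
class Replica {
  constructor(mode) {
    this.mode = mode;          // "CP" or "AP" (illustrative labels)
    this.partitioned = false;  // true when the primary is unreachable
    this.data = new Map();     // last values replicated from the primary
  }

  read(key) {
    if (this.partitioned && this.mode === "CP") {
      return { ok: false, error: "unavailable: cannot confirm latest value" };
    }
    return { ok: true, value: this.data.get(key), maybeStale: this.partitioned };
  }
}

const bank = new Replica("CP");
const feed = new Replica("AP");
bank.data.set("balance:alice", 100);
feed.data.set("likes:post_123", 41);

// Simulate a partition: both replicas lose contact with the primary.
bank.partitioned = true;
feed.partitioned = true;

const bankRead = bank.read("balance:alice");   // rejected: consistency preferred
const feedRead = feed.read("likes:post_123");  // served, flagged as possibly stale
console.log(bankRead);
console.log(feedRead);
```

The banking replica gives up availability to avoid showing a wrong balance; the feed replica gives up freshness to keep responding.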

2. Latency vs. Throughput

  • Optimize for latency: User-facing APIs (< 100ms), real-time games
  • Optimize for throughput: Batch data processing, log aggregation, ETL pipelines
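One common way these two goals pull apart is batching. The sketch below uses a deliberately simple cost model, assumed for illustration: every batch pays a fixed overhead (for example a network round trip) plus a per-item cost. The function name `batchStats` and the numbers are hypothetical, and the model ignores the time spent waiting for a batch to fill.

```javascript
// Simple model: time per batch = fixed overhead + per-item cost * batch size.
function batchStats(batchSize, overheadMs, perItemMs) {
  const batchTimeMs = overheadMs + batchSize * perItemMs;
  return {
    batchSize,
    latencyMs: batchTimeMs,                              // time until the batch completes
    throughputPerSec: (batchSize / batchTimeMs) * 1000,  // items finished per second
  };
}

// Assumed costs: 10ms overhead per batch, 1ms of work per item.
const single = batchStats(1, 10, 1);
const batched = batchStats(100, 10, 1);

console.log(single);   // low latency, low throughput
console.log(batched);  // 10x the latency, roughly 10x the throughput
```

Under these assumptions, a batch of 100 amortizes the overhead and moves about ten times as many items per second, but each item waits ten times longer for its result. That is acceptable for an ETL job and unacceptable for a user-facing API.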

3. Read vs. Write Optimization

  • Read-heavy (100:1): News feed, product catalog → Denormalize data, add caches
  • Write-heavy (1:1 or more writes): Logging, IoT sensors → Append-only storage, write-behind caching
  • Balanced: Chat applications → Optimize both the read path and the write path
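A minimal sketch of the two optimizations named above, using in-memory stand-ins. `slowStore`, `appendLog`, and the flush threshold of 3 are assumptions made for illustration; in a real system they would be a database, an append-only log, and a tuned batch size.

```javascript
// Read-heavy path: cache-aside lookup in front of a slow store.
const slowStore = new Map([["product:1", { name: "Keyboard" }]]); // stand-in for a database
const cache = new Map();
let storeReads = 0;

function getProduct(key) {
  if (cache.has(key)) return cache.get(key);  // cache hit: no store access
  storeReads += 1;                            // cache miss: pay the slow read once
  const value = slowStore.get(key);
  cache.set(key, value);
  return value;
}

// Write-heavy path: buffer writes in memory and flush them in batches.
const writeBuffer = [];
const appendLog = [];  // stand-in for append-only storage
let flushCount = 0;

function flush() {
  appendLog.push(...writeBuffer.splice(0));  // move every buffered item in one batch
  flushCount += 1;
}

function recordReading(reading) {
  writeBuffer.push(reading);
  if (writeBuffer.length >= 3) flush();  // threshold chosen arbitrarily for the demo
}

for (let i = 0; i < 100; i++) getProduct("product:1");
for (let i = 0; i < 6; i++) recordReading({ sensor: "s1", value: i });

console.log({ storeReads, flushCount, logged: appendLog.length });
```

Each optimization has a price: the cached product can go stale if the store changes, and buffered readings are lost if the process crashes before a flush.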

4. Cost vs. Performance

  • More replicas → better availability, but higher cost
  • More caching → lower latency, but more memory cost
  • More regions → lower global latency, but much higher operational complexity
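The replica bullet can be put into numbers with a rough model. The sketch assumes replicas fail independently (rarely true in practice, since replicas often share a network, region, or deployment), and the 99% availability and $500/month figures are made up for illustration.

```javascript
// With independent failures, the service is down only when ALL replicas are down.
function replicaTradeoff(replicas, singleAvailability, monthlyCostPerReplica) {
  const allDown = Math.pow(1 - singleAvailability, replicas);
  return {
    replicas,
    availability: 1 - allDown,
    monthlyCost: replicas * monthlyCostPerReplica,
  };
}

// Assumed inputs: each replica is up 99% of the time and costs $500/month.
const r1 = replicaTradeoff(1, 0.99, 500);
const r2 = replicaTradeoff(2, 0.99, 500);
const r3 = replicaTradeoff(3, 0.99, 500);

for (const r of [r1, r2, r3]) {
  console.log(`${r.replicas} replica(s): ${(r.availability * 100).toFixed(4)}% available, $${r.monthlyCost}/month`);
}
```

Under this model cost grows linearly while each extra replica buys a smaller absolute gain in availability, which is why "how many nines do we actually need?" is worth asking before adding a third replica.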

5. Simplicity vs. Scalability

  • Monolith is simpler but harder to scale individual components
  • Microservices scale independently but add complexity (network calls, service discovery, distributed debugging)

Making Trade-off Decisions

Use this framework when facing a design decision:

  1. Identify the constraint: What's the bottleneck or limitation?
  2. List the options: What are the possible approaches?
  3. Evaluate the options: Score each one against your non-functional requirements (NFRs) such as latency, consistency, and cost
  4. Choose and justify: Pick one and explain why it fits your requirements
  5. Acknowledge the downside: State what you're giving up
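The five steps above can be sketched as a small weighted decision matrix. The option names, the 1 to 5 ratings, and the weights are all hypothetical values picked for illustration; in practice they come from your own requirements and measurements.

```javascript
// Steps 1 and 3: the constraint is read latency, so it gets the largest weight.
const weights = { latency: 5, consistency: 3, cost: 2 };

// Step 2: candidate options, each rated 1 (poor) to 5 (good) per NFR.
const options = {
  "single primary DB":       { latency: 2, consistency: 5, cost: 4 },
  "primary + read replicas": { latency: 4, consistency: 3, cost: 3 },
  "primary + cache":         { latency: 5, consistency: 2, cost: 3 },
};

function weightedScore(ratings) {
  return Object.keys(weights).reduce((sum, nfr) => sum + weights[nfr] * ratings[nfr], 0);
}

// Step 4: choose the highest-scoring option.
const ranked = Object.entries(options)
  .map(([name, ratings]) => ({ name, score: weightedScore(ratings) }))
  .sort((a, b) => b.score - a.score);
const choice = ranked[0];

// Step 5: name the downside, here the NFR where the chosen option rates lowest.
const [weakestNfr] = Object.entries(options[choice.name]).sort((a, b) => a[1] - b[1])[0];

console.log(`I chose "${choice.name}" (score ${choice.score}), accepting weaker ${weakestNfr}.`);
```

The score is only a conversation aid. Changing the weights changes the winner, which is the point: the "best" option depends on which constraint matters most.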

Interview tip: Saying "I chose X because Y, and I'm accepting the trade-off of Z" is the hallmark of a senior engineer. Interviewers love this.

💻 Code Example

// ============================================
// Trade-offs in Action: Real Design Decisions
// ============================================

// ---------- Trade-off 1: Normalization vs Denormalization ----------

// ❌ NORMALIZED (optimized for writes, slow reads because of the JOINs)
// Tables: users, posts, comments, likes, follows
// Showing a feed takes one query with three JOINs plus a subquery

const normalizedQuery = `
  SELECT p.id, p.content, u.name, u.avatar,
         COUNT(DISTINCT l.id) AS likes,
         COUNT(DISTINCT c.id) AS comments
  FROM posts p
  JOIN users u ON p.user_id = u.id
  LEFT JOIN likes l ON l.post_id = p.id
  LEFT JOIN comments c ON c.post_id = p.id
  WHERE p.user_id IN (SELECT followed_id FROM follows WHERE follower_id = ?)
  -- u.id is grouped too, so selecting u.name and u.avatar is valid in strict SQL modes
  GROUP BY p.id, u.id
  ORDER BY p.created_at DESC
  LIMIT 20;
  -- This query is SLOW at scale (multiple JOINs, subquery)
`;

// ✅ DENORMALIZED (optimized for reads, more work on writes)
// Pre-computed feed table: each row has everything needed to render

const denormalizedFeedEntry = {
  postId: "post_123",
  authorName: "Jane Doe",            // Copied from users table
  authorAvatar: "cdn.com/jane.jpg",  // Copied from users table
  content: "Hello world!",
  likeCount: 42,                     // Pre-computed counter
  commentCount: 7,                   // Pre-computed counter
  topComments: [                     // Up to 3 pre-fetched top comments
    { user: "Bob", text: "Great post!" },
  ],
  createdAt: "2024-01-15T10:30:00Z",
};

// Trade-off: When Jane changes her avatar, we need to update
// every feed entry that references her. Writes get more complex,
// but reads are now a simple single-table lookup.

// ---------- Trade-off 2: Sync vs Async Processing ----------
// Helpers such as validateOrder, chargePayment, and messageQueue are
// assumed to be defined elsewhere; the timings are illustrative.

// ❌ SYNCHRONOUS: User waits for everything
async function handleOrderSync(order) {
  await validateOrder(order);          // 50ms
  await chargePayment(order);          // 200ms
  await updateInventory(order);        // 100ms
  await sendConfirmationEmail(order);  // 300ms
  await notifyWarehouse(order);        // 150ms
  await updateAnalytics(order);        // 100ms
  // Total: ~900ms, and the user waits for ALL of it
  return { status: 'completed' };
}

// ✅ ASYNCHRONOUS: User gets an immediate response, the rest happens in the background
async function handleOrderAsync(order) {
  // Only do what the user MUST wait for
  await validateOrder(order);                        // 50ms
  const paymentResult = await chargePayment(order);  // 200ms

  if (paymentResult.success) {
    // Queue everything else; the user doesn't need to wait for it
    await messageQueue.publish('order.confirmed', {
      orderId: order.id,
      items: order.items,
    });
    // Total user-facing latency: ~250ms plus the queue publish (about 3.6x faster)
    return { status: 'confirmed', orderId: order.id };
  }
  return { status: 'payment_failed' };
}

// Background workers process the queued events:
// Worker 1: updateInventory
// Worker 2: sendConfirmationEmail
// Worker 3: notifyWarehouse
// Worker 4: updateAnalytics

// Trade-off: Async is faster for users but adds complexity
// (message queue, worker management, retry logic, eventual consistency).
// For example, inventory is now decremented after the user sees "confirmed".

// ---------- Trade-off 3: Strong vs Eventual Consistency ----------

// Strong consistency, for banking
async function transferMoney(fromAccount, toAccount, amount) {
  const transaction = await db.beginTransaction();
  try {
    // Both operations in ONE transaction: either both succeed or both fail
    const debit = await transaction.query(
      'UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?',
      [amount, fromAccount, amount]
    );
    // If no row matched, the account is missing or has insufficient funds.
    // Without this check the credit below would still run and create money.
    // Assumption: the driver reports the updated row count as affectedRows.
    if (debit.affectedRows !== 1) {
      throw new Error('Insufficient funds or unknown source account');
    }
    const credit = await transaction.query(
      'UPDATE accounts SET balance = balance + ? WHERE id = ?',
      [amount, toAccount]
    );
    if (credit.affectedRows !== 1) {
      throw new Error('Unknown destination account');
    }
    await transaction.commit();
    // Guarantee: the debit and credit are applied together or not at all
    return { success: true };
  } catch (e) {
    await transaction.rollback();
    return { success: false, error: e.message };
  }
}

// Eventual consistency, for social media like counts
async function likePost(userId, postId) {
  // Write to a fast in-memory store immediately
  await redis.sadd(`post:${postId}:likes`, userId);

  // Asynchronously update the main database.
  // The count might be slightly off for a few seconds, and that's OK
  await messageQueue.publish('post.liked', { userId, postId });

  // Return as soon as the event is queued; the user sees their like right away
  return { liked: true };
}

// Trade-off: Bank transfers MUST be strongly consistent (can't lose money).
// Social media likes can be eventually consistent (it's fine if the count
// shows 41 instead of 42 for a few seconds).

🏋️ Practice Exercise

  1. Trade-off Matrix: For each pair, explain which you'd choose and why: (a) SQL vs NoSQL for an e-commerce catalog, (b) Push vs Pull for a notification system, (c) Monolith vs Microservices for a startup MVP.

  2. Decision Document: You're building a ride-sharing app. Write a one-page decision document comparing synchronous vs asynchronous ride matching. Include pros, cons, and your final recommendation.

  3. Interview Role-Play: Practice explaining this trade-off out loud: "I chose eventual consistency for the news feed because..." — record yourself and check if you clearly state what you gain and what you lose.

  4. Cost-Benefit Analysis: Your team wants to add a Redis cache layer. The cache costs $5K/month but reduces average latency from 200ms to 20ms. Calculate the cost per millisecond saved and argue for or against it.

  5. Failure Mode Analysis: For a payment system using strong consistency, describe what happens during a network partition. Now describe what happens with eventual consistency. Which failure mode is more acceptable for payments?

⚠️ Common Mistakes

  • Saying 'this is the best design' without acknowledging trade-offs — there IS no best design, only designs that are best FOR specific constraints. Always state your trade-offs.

  • Choosing a technology because it's popular rather than because it fits the requirements — 'We'll use Kafka' without explaining why event streaming is needed.

  • Over-engineering early — adding microservices, caching, and message queues to a system with 100 users. Start simple, add complexity when scale demands it.

  • Not quantifying trade-offs — saying 'this is faster' is weak; saying 'this reduces p99 latency from 500ms to 50ms at the cost of 2x storage' is strong.
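To quantify a latency claim you need percentiles, not just an average. The sketch below uses the nearest-rank method on a made-up sample set; the `percentile` helper and the sample values are assumptions for illustration, and production systems usually get these figures from a metrics library or histogram rather than sorting raw samples.

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of
// the samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up data: 98 fast requests (20ms) and 2 slow ones (500ms).
const latenciesMs = [...Array(98).fill(20), ...Array(2).fill(500)];

const mean = latenciesMs.reduce((sum, x) => sum + x, 0) / latenciesMs.length;
const p50 = percentile(latenciesMs, 50);
const p99 = percentile(latenciesMs, 99);

console.log({ mean, p50, p99 });
```

With these samples the mean and median look healthy while p99 exposes the slow tail, which is why a statement like "p99 drops from 500ms to 50ms at the cost of 2x storage" carries more weight than "this is faster".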
