Trade-offs & Design Decisions


📖 Concept

System design is fundamentally about making trade-offs. There is no perfect system — every design decision involves giving up something to gain something else. The best engineers don't claim their design is "the best"; they explain why they chose it given the constraints.

The Big Trade-offs

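1. Consistency vs. Availability (CAP Theorem Preview)

When a network partition cuts replicas off from each other, a system must choose. It can reject some requests so that every read stays up to date (consistency), or it can keep answering with data that may be stale (availability).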

Choose Consistency	Choose Availability
Banking transactions	Social media feeds
Inventory management	DNS resolution
Booking systems	Content delivery
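A toy model makes the choice concrete. The sketch below is hypothetical and simplified: no real database behaves exactly like it, and `mode` is an illustrative switch, not a real setting. Three nodes hold a versioned copy of one value. In CP mode, reads and writes need a majority (two of three nodes), so a node isolated by a partition refuses requests. In AP mode, every node answers from its own copy even if that copy is stale.

```javascript
// Toy model of a replicated register during a network partition.
// `mode` is an illustrative switch, not a real database setting.
class Cluster {
  constructor(mode, size = 3) {
    this.mode = mode; // "CP" or "AP"
    this.nodes = Array.from({ length: size }, () => ({ value: 0, version: 0 }));
    this.majority = Math.floor(size / 2) + 1;
  }

  // `group` = the node handling the request plus every node it can reach.
  hasQuorum(group) {
    return group.length >= this.majority;
  }

  write(group, value) {
    if (this.mode === "CP" && !this.hasQuorum(group)) {
      return { ok: false, error: "no quorum" }; // refuse rather than diverge
    }
    const version = Math.max(...group.map((i) => this.nodes[i].version)) + 1;
    for (const i of group) this.nodes[i] = { value, version };
    return { ok: true };
  }

  read(group) {
    if (this.mode === "CP" && !this.hasQuorum(group)) {
      return { ok: false, error: "no quorum" };
    }
    // Return the newest copy among reachable nodes. With majority quorums,
    // every read group overlaps the last write group, so CP reads are current.
    const newest = group
      .map((i) => this.nodes[i])
      .reduce((a, b) => (b.version > a.version ? b : a));
    return { ok: true, value: newest.value };
  }
}

// Partition: nodes 0 and 1 can talk to each other; node 2 is isolated.
const cp = new Cluster("CP");
console.log(cp.write([0, 1], 42)); // { ok: true }  (majority side keeps working)
console.log(cp.read([2]));         // { ok: false, error: 'no quorum' }  (isolated node is unavailable)

const ap = new Cluster("AP");
ap.write([0, 1], 42);
console.log(ap.read([2]));         // { ok: true, value: 0 }  (isolated node answers, but stale)
```

A booking system wants the CP behavior (better to refuse than double-book a seat); a social feed is usually happier with the AP behavior.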

2. Latency vs. Throughput

Optimize for latency: User-facing APIs (< 100ms), real-time games
Optimize for throughput: Batch data processing, log aggregation, ETL pipelines
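Batching is the classic lever between the two. The sketch assumes a hypothetical cost model: every call to a downstream store pays a fixed 10 ms overhead (round trip, commit) plus 1 ms per item. Larger batches amortize the overhead and raise throughput, but an item may wait for its batch to fill and then for the whole batch to finish, which raises latency.

```javascript
// Hypothetical cost model: each call costs a fixed overhead plus a per-item cost.
const OVERHEAD_MS = 10;
const PER_ITEM_MS = 1;

// Time to process one batch of `size` items in a single call.
function batchTimeMs(size) {
  return OVERHEAD_MS + PER_ITEM_MS * size;
}

// Items processed per second if batches are sent back-to-back.
function throughputPerSec(size) {
  return (size / batchTimeMs(size)) * 1000;
}

// Worst-case latency for one item: wait for the batch to fill (one item
// arrives every `arrivalIntervalMs`), then wait for the batch to finish.
// Ignores any queueing behind earlier batches.
function worstLatencyMs(size, arrivalIntervalMs) {
  return (size - 1) * arrivalIntervalMs + batchTimeMs(size);
}

for (const size of [1, 10, 100]) {
  console.log(size, throughputPerSec(size).toFixed(0), worstLatencyMs(size, 1));
}
// size 1:   ~91 items/s,  11 ms worst-case latency
// size 10:  500 items/s,  29 ms
// size 100: ~909 items/s, 209 ms
```

A user-facing API would pick a small batch (or none); an ETL job would happily take the 209 ms for 10x the throughput.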

3. Read vs. Write Optimization

Read-heavy (100:1): News feed, product catalog → Denormalize data, add caches
Write-heavy (writes match or exceed reads): Logging, IoT sensors → Append-only storage, write-behind caching
Balanced: Chat applications → Both the read and the write path need optimizing
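Write-behind caching, one of the write-heavy techniques listed above, acknowledges writes from memory and persists them to the database later in batches. A minimal sketch, assuming a hypothetical `store` object with an async `bulkWrite(entries)` method standing in for the real database; a production version would also flush on a timer, retry failed flushes, and survive crashes.

```javascript
class WriteBehindCache {
  constructor(store, maxPending = 100) {
    this.store = store;           // hypothetical: { bulkWrite(entries) => Promise }
    this.maxPending = maxPending; // flush once this many keys are dirty
    this.data = new Map();        // current values (serves reads)
    this.dirty = new Map();       // keys written since the last flush
  }

  get(key) {
    return this.data.get(key);
  }

  // Fast path: update memory only; the database write is deferred.
  async set(key, value) {
    this.data.set(key, value);
    this.dirty.set(key, value); // repeated writes to one key coalesce
    if (this.dirty.size >= this.maxPending) await this.flush();
  }

  // Persist all dirty keys in one batched call.
  // Trade-off: anything not yet flushed is lost if the process crashes.
  async flush() {
    if (this.dirty.size === 0) return;
    const batch = [...this.dirty.entries()];
    this.dirty.clear();
    await this.store.bulkWrite(batch);
  }
}
```

Two writes to the same sensor key before a flush cost one database row write, which is where the write-heavy savings come from.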

4. Cost vs. Performance

More replicas = better availability = higher cost
More caching = lower latency = more memory cost
More regions = lower global latency = much higher operational complexity
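The first line can be quantified. Assuming replicas fail independently and each is up a fraction `a` of the time, the service is down only when every replica is down, so availability with `n` replicas is 1 − (1 − a)^n while cost grows linearly with `n`. The $500/month per replica is a made-up figure, and real failures are often correlated (shared region, bad deploy), so treat the result as an upper bound.

```javascript
// Availability of n independent replicas, each up a fraction `a` of the time.
function availability(a, n) {
  return 1 - (1 - a) ** n;
}

// Expected downtime per year, in minutes.
function downtimeMinutesPerYear(avail) {
  return (1 - avail) * 365 * 24 * 60;
}

const PER_REPLICA_COST = 500; // hypothetical $/month per replica

for (let n = 1; n <= 3; n++) {
  const avail = availability(0.99, n);
  console.log(
    `${n} replica(s): ${(avail * 100).toFixed(4)}% up, ` +
      `~${downtimeMinutesPerYear(avail).toFixed(1)} min/yr down, $${n * PER_REPLICA_COST}/month`
  );
}
// 1 replica(s): 99.0000% up, ~5256.0 min/yr down, $500/month
// 2 replica(s): 99.9900% up, ~52.6 min/yr down, $1000/month
// 3 replica(s): 99.9999% up, ~0.5 min/yr down, $1500/month
```

Each replica costs the same, but the second one saves about 5,200 minutes of downtime a year and the third only about 52: diminishing returns at linear cost.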

5. Simplicity vs. Scalability

A monolith is simpler to build, deploy, and debug, but you cannot scale one hot component without scaling the whole application
Microservices scale independently but add complexity (network calls, service discovery, distributed debugging)
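One concrete source of that added complexity: a function call inside a monolith cannot time out, but the same call to a separate service can fail, hang, or succeed on a later attempt, so callers end up wrapping it. A minimal sketch, where `fakeUserService` is a hypothetical stand-in for a remote call and the timeout and backoff values are illustrative. Retrying is only safe for idempotent operations such as reads.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call `fn` (which returns a promise) with a per-attempt timeout and
// exponential backoff between attempts. Only safe for idempotent calls.
async function callWithRetry(fn, { attempts = 3, timeoutMs = 200, backoffMs = 50 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait 50ms, then 100ms, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Hypothetical remote user service that fails twice, then succeeds.
let calls = 0;
async function fakeUserService() {
  calls += 1;
  if (calls < 3) throw new Error("connection reset");
  return { id: 1, name: "Jane Doe" };
}
```

Calling `callWithRetry(fakeUserService)` fails twice, backs off 50 ms and then 100 ms, and resolves on the third attempt. None of this code exists in the monolith version, where the same lookup is a plain function call.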

Making Trade-off Decisions

Use this framework when facing a design decision:

Identify the constraint: What's the bottleneck or limitation?
List the options: What are the possible approaches?
Evaluate each option against your non-functional requirements (NFRs): latency, consistency, cost
Choose and justify: Pick one and explain why it fits your requirements
Acknowledge the downside: State what you're giving up
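The framework can be made explicit with a weighted decision matrix. This is a common technique, but the framework does not prescribe it. In this sketch the options, criteria, weights, and 1-to-5 scores are all hypothetical. The weights encode your NFRs, and the winner's lowest-scoring criterion is the downside to state in the "Acknowledge the downside" step.

```javascript
// Hypothetical weights: how much each NFR matters for this system (sum to 1).
const weights = { latency: 0.5, consistency: 0.3, cost: 0.2 };

// Hypothetical 1-to-5 scores (5 = best) for two ways to speed up product-page reads.
const options = {
  "Redis cache-aside": { latency: 5, consistency: 2, cost: 3 },
  "Read replicas": { latency: 3, consistency: 4, cost: 2 },
};

function decide(options, weights) {
  const ranked = Object.entries(options)
    .map(([name, scores]) => ({
      name,
      // Weighted sum of the option's scores across all criteria
      total: Object.entries(weights).reduce((sum, [c, w]) => sum + w * scores[c], 0),
      // The weakest criterion is the trade-off you are accepting
      downside: Object.keys(weights).reduce((worst, c) => (scores[c] < scores[worst] ? c : worst)),
    }))
    .sort((a, b) => b.total - a.total);
  return ranked[0];
}

const choice = decide(options, weights);
console.log(`I chose ${choice.name} (score ${choice.total.toFixed(1)}), accepting weaker ${choice.downside}.`);
// I chose Redis cache-aside (score 3.7), accepting weaker consistency.
```

The numbers matter less than the habit: changing a weight (say, consistency to 0.6) and watching the winner flip shows exactly which constraint drives the decision.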

Interview tip: Saying "I chose X because Y, and I'm accepting the trade-off of Z" is the hallmark of a senior engineer. Interviewers love this.

💻 Code Example


// ============================================
// Trade-offs in Action: Real Design Decisions
// ============================================
// Note: `db`, `redis`, `messageQueue`, and the order helpers
// (validateOrder, chargePayment, ...) are hypothetical clients and
// functions assumed to be defined elsewhere.

// ---------- Trade-off 1: Normalization vs Denormalization ----------

// ❌ NORMALIZED (simple, consistent writes; slow reads with many JOINs)
// Tables: users, posts, comments, likes, follows
// Rendering a feed needs one query that JOINs 4 tables plus a subquery

const normalizedQuery = `
  SELECT p.id, p.content, u.name, u.avatar,
         COUNT(DISTINCT l.id) AS likes,
         COUNT(DISTINCT c.id) AS comments
  FROM posts p
  JOIN users u ON p.user_id = u.id
  LEFT JOIN likes l ON l.post_id = p.id
  LEFT JOIN comments c ON c.post_id = p.id
  WHERE p.user_id IN (SELECT followed_id FROM follows WHERE follower_id = ?)
  GROUP BY p.id, p.content, p.created_at, u.name, u.avatar
  ORDER BY p.created_at DESC
  LIMIT 20;
  -- This query is SLOW at scale (multiple JOINs, subquery, aggregation)
`;

// ✅ DENORMALIZED (optimized for reads, more work on writes)
// Pre-computed feed table: each row has everything needed to render

const denormalizedFeedEntry = {
  postId: "post_123",
  authorName: "Jane Doe",           // Copied from users table
  authorAvatar: "cdn.com/jane.jpg", // Copied from users table
  content: "Hello world!",
  likeCount: 42,                    // Pre-computed counter
  commentCount: 7,                  // Pre-computed counter
  topComments: [                    // Pre-fetched top 3 comments
    { user: "Bob", text: "Great post!" },
  ],
  createdAt: "2024-01-15T10:30:00Z",
};

// Trade-off: When Jane changes her avatar, we need to update
// every feed entry that references her (more write complexity),
// but reads are now a simple single-table lookup.

// ---------- Trade-off 2: Sync vs Async Processing ----------

// ❌ SYNCHRONOUS: User waits for everything
async function handleOrderSync(order) {
  await validateOrder(order);           // 50ms
  await chargePayment(order);           // 200ms
  await updateInventory(order);         // 100ms
  await sendConfirmationEmail(order);   // 300ms
  await notifyWarehouse(order);         // 150ms
  await updateAnalytics(order);         // 100ms
  // Total: ~900ms, and the user waits for ALL of it!
  return { status: 'completed' };
}

// ✅ ASYNCHRONOUS: User gets a fast response; the rest happens in the background
async function handleOrderAsync(order) {
  // Only do what the user MUST wait for
  await validateOrder(order);                       // 50ms
  const paymentResult = await chargePayment(order); // 200ms

  if (paymentResult.success) {
    // Queue everything else; the user doesn't need to wait
    await messageQueue.publish('order.confirmed', {
      orderId: order.id,
      items: order.items,
    });
    // Total user-facing latency: ~250ms (about 3.6x faster than ~900ms)
    return { status: 'confirmed', orderId: order.id };
  }
  return { status: 'payment_failed' };
}

// Background workers consume the 'order.confirmed' events:
// Worker 1: updateInventory
// Worker 2: sendConfirmationEmail
// Worker 3: notifyWarehouse
// Worker 4: updateAnalytics

// Trade-off: Async is faster for users but adds complexity
// (message queue, worker management, retry logic, eventual consistency).
// For example, inventory is now decremented after the response is sent,
// so the system must handle two buyers both purchasing the last item.

// ---------- Trade-off 3: Strong vs Eventual Consistency ----------

// Strong consistency: for banking
async function transferMoney(fromAccount, toAccount, amount) {
  const transaction = await db.beginTransaction();
  try {
    // Both updates in ONE transaction: either both succeed or both fail
    const debit = await transaction.query(
      'UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?',
      [amount, fromAccount, amount]
    );
    // If no row matched (insufficient funds or unknown account), abort;
    // otherwise the credit below would create money out of thin air.
    // `affectedRows` assumes a MySQL-style result object.
    if (debit.affectedRows !== 1) {
      throw new Error('Insufficient funds or unknown source account');
    }
    const credit = await transaction.query(
      'UPDATE accounts SET balance = balance + ? WHERE id = ?',
      [amount, toAccount]
    );
    if (credit.affectedRows !== 1) {
      throw new Error('Unknown destination account');
    }
    await transaction.commit();
    // Guarantee: the debit and the credit commit together or not at all
    return { success: true };
  } catch (e) {
    await transaction.rollback();
    return { success: false, error: e.message };
  }
}

// Eventual consistency: for social media like counts
async function likePost(userId, postId) {
  // Record the like in a fast in-memory store (a Redis set, so a
  // repeated like from the same user is only counted once)
  await redis.sadd(`post:${postId}:likes`, userId);

  // Update the main database asynchronously.
  // The stored count may lag by a few seconds, and that's OK!
  await messageQueue.publish('post.liked', { userId, postId });

  // Return immediately; the user sees their like instantly
  return { liked: true };
}

// Trade-off: Bank transfers MUST be strongly consistent (can't lose money).
// Social media likes can be eventually consistent (it's fine if the count
// shows 41 instead of 42 for a few seconds).
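The denormalized feed only pays off if something builds those rows at write time. A common approach is fan-out on write: when a user posts, copy a feed entry into each follower's feed. The in-memory sketch below uses plain `Map`s as hypothetical stand-ins for the `follows` relation, the users table, and per-user feed storage. It also shows the write-side cost described in the Trade-off 1 comment, since an avatar change must touch every copied entry.

```javascript
// Hypothetical in-memory stand-ins for the follows table, users table, and feed storage.
const followers = new Map(); // authorId -> Set of follower ids
const feeds = new Map();     // userId -> array of denormalized feed entries (newest first)
const users = new Map();     // userId -> { name, avatar }

function follow(followerId, authorId) {
  if (!followers.has(authorId)) followers.set(authorId, new Set());
  followers.get(authorId).add(followerId);
}

// Write path: one post becomes one feed row per follower (the expensive side).
function publishPost(authorId, postId, content) {
  const author = users.get(authorId);
  const entry = {
    postId,
    authorId,
    authorName: author.name,     // copied, not joined
    authorAvatar: author.avatar, // copied, not joined
    content,
    likeCount: 0,
    commentCount: 0,
    createdAt: new Date().toISOString(),
  };
  for (const followerId of followers.get(authorId) ?? []) {
    if (!feeds.has(followerId)) feeds.set(followerId, []);
    feeds.get(followerId).unshift({ ...entry });
  }
}

// Read path: a single lookup, no JOINs (the cheap side).
function readFeed(userId, limit = 20) {
  return (feeds.get(userId) ?? []).slice(0, limit);
}

// The write-side cost of denormalization: an avatar change touches every copy.
// This sketch scans every feed; a real system would index entries by author
// or fan the update out asynchronously.
function updateAvatar(authorId, avatar) {
  users.get(authorId).avatar = avatar;
  let touched = 0;
  for (const entries of feeds.values()) {
    for (const e of entries) {
      if (e.authorId === authorId) {
        e.authorAvatar = avatar;
        touched += 1;
      }
    }
  }
  return touched;
}
```

With 2 followers and 2 posts, one avatar change rewrites 4 rows; with a million followers it rewrites millions, which is why many systems fan out lazily for very popular accounts.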

🏋️ Practice Exercise

Trade-off Matrix: For each pair, explain which you'd choose and why: (a) SQL vs NoSQL for an e-commerce catalog, (b) Push vs Pull for a notification system, (c) Monolith vs Microservices for a startup MVP.
Decision Document: You're building a ride-sharing app. Write a one-page decision document comparing synchronous vs asynchronous ride matching. Include pros, cons, and your final recommendation.
Interview Role-Play: Practice explaining this trade-off out loud: "I chose eventual consistency for the news feed because..." — record yourself and check if you clearly state what you gain and what you lose.
Cost-Benefit Analysis: Your team wants to add a Redis cache layer. The cache costs $5K/month but reduces average latency from 200ms to 20ms. Calculate the cost per millisecond saved and argue for or against it.
Failure Mode Analysis: For a payment system using strong consistency, describe what happens during a network partition. Now describe what happens with eventual consistency. Which failure mode is more acceptable for payments?
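For the cost-benefit exercise, the arithmetic is easy to script so you can vary the inputs. The helper names and the 100 million requests/month traffic figure are hypothetical; normalizing by request volume gives a second lens for the argument.

```javascript
// Dollars per month for each millisecond of average latency saved.
function costPerMsSaved(monthlyCost, beforeMs, afterMs) {
  const savedMs = beforeMs - afterMs;
  if (savedMs <= 0) throw new Error("optimization must reduce latency");
  return monthlyCost / savedMs;
}

// Dollars per 1,000 requests served.
function costPerThousandRequests(monthlyCost, requestsPerMonth) {
  return (monthlyCost / requestsPerMonth) * 1000;
}

console.log(costPerMsSaved(5000, 200, 20).toFixed(2)); // "27.78"
console.log(costPerThousandRequests(5000, 100_000_000)); // hypothetical traffic
```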

⚠️ Common Mistakes

Saying 'this is the best design' without acknowledging trade-offs — there IS no best design, only designs that are best FOR specific constraints. Always state your trade-offs.
Choosing a technology because it's popular rather than because it fits the requirements — 'We'll use Kafka' without explaining why event streaming is needed.
Over-engineering early — adding microservices, caching, and message queues to a system with 100 users. Start simple, add complexity when scale demands it.
Not quantifying trade-offs — saying 'this is faster' is weak; saying 'this reduces p99 latency from 500ms to 50ms at the cost of 2x storage' is strong.
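Quantifying usually means percentiles rather than averages, because a few slow requests hide inside a mean. A minimal nearest-rank percentile sketch; the latency samples are made up for illustration.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latencies (ms): 98 fast requests and 2 slow ones.
const latencies = [...Array(98).fill(20), 480, 500];
const mean = latencies.reduce((s, x) => s + x, 0) / latencies.length;

console.log(mean);                      // 29.4
console.log(percentile(latencies, 50)); // 20
console.log(percentile(latencies, 99)); // 480
```

The mean (29.4 ms) suggests everything is fast; the p99 (480 ms) shows the tail is 24x slower than the median, which is the number worth putting in a trade-off statement.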
