Auto-Scaling & Capacity Planning


📖 Concept

Auto-scaling automatically adjusts the number of server instances based on real-time demand. It's essential for handling traffic that varies throughout the day, week, or during special events.

Scaling Triggers

| Metric | Scale Up When | Scale Down When |
|---|---|---|
| CPU | > 70% average across instances | < 30% average |
| Memory | > 80% utilization | < 40% utilization |
| Request count | > 1,000 req/sec per instance | < 200 req/sec |
| Queue depth | > 1,000 pending messages | < 100 pending |
| Custom metric | p99 latency > 500 ms | p99 latency < 100 ms |

Scaling Policies

Reactive Scaling

Scale based on current metrics. Simple but always slightly behind.

Predictive Scaling

Use historical patterns to scale BEFORE traffic arrives. E.g., scale up at 8 AM every weekday because traffic increases predictably.
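A minimal sketch of the idea (class and method names here are hypothetical, not from any cloud provider's API): record QPS samples per hour of day, then pre-scale to the capacity the upcoming hour has historically needed.

```javascript
// Predictive scaling sketch: learn average QPS per hour-of-day,
// then compute the instance count that hour will need in advance.
class PredictiveScaler {
  constructor(qpsPerInstance, safetyMargin = 0.3) {
    this.qpsPerInstance = qpsPerInstance;
    this.safetyMargin = safetyMargin;
    this.hourlyQPS = Array.from({ length: 24 }, () => []); // QPS samples per hour
  }

  // Feed in historical observations (e.g., from the last 4 weeks)
  record(hour, qps) {
    this.hourlyQPS[hour].push(qps);
  }

  // Desired instances for a given hour, based on the historical average
  predictInstances(hour) {
    const samples = this.hourlyQPS[hour];
    if (samples.length === 0) return null; // no history yet for this hour
    const avgQPS = samples.reduce((a, b) => a + b, 0) / samples.length;
    return Math.ceil((avgQPS / this.qpsPerInstance) * (1 + this.safetyMargin));
  }
}

// Example: weekday traffic historically ramps up around 8 AM
const predictor = new PredictiveScaler(5000);
[40000, 42000, 38000].forEach((qps) => predictor.record(8, qps));
console.log(predictor.predictInstances(8)); // avg 40K QPS → ceil(8 × 1.3) = 11
```

A production system would segment by day-of-week and decay old samples, but the core loop is the same: predict load, convert to instances with the capacity formula, and scale before the hour starts.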

Scheduled Scaling

Pre-configured scaling for known events: Black Friday, product launches, marketing campaigns.
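This can be as simple as a table of time windows with capacity floors. The sketch below is illustrative (the schedule entries and function names are made up for the example): during a known event, the scheduled floor overrides the default minimum.

```javascript
// Scheduled scaling sketch: pre-configured capacity floors for known events.
const scalingSchedule = [
  { name: 'Black Friday', start: '2024-11-29T00:00:00Z', end: '2024-11-30T00:00:00Z', minInstances: 40 },
  { name: 'Product launch', start: '2024-12-10T14:00:00Z', end: '2024-12-10T20:00:00Z', minInstances: 25 },
];

function scheduledMinInstances(schedule, now, defaultMin = 2) {
  const t = now.getTime();
  const active = schedule.filter(
    (e) => t >= Date.parse(e.start) && t < Date.parse(e.end)
  );
  // If several events overlap, honor the largest floor
  return active.reduce((min, e) => Math.max(min, e.minInstances), defaultMin);
}

console.log(scheduledMinInstances(scalingSchedule, new Date('2024-11-29T12:00:00Z'))); // 40
console.log(scheduledMinInstances(scalingSchedule, new Date('2024-06-01T12:00:00Z'))); // 2
```

Reactive scaling stays active on top of this: the schedule only raises the floor, so unexpected spikes during the event can still push capacity higher.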

Capacity Planning Formula

Required instances = Peak QPS / QPS per instance × (1 + safety margin)

Example: Peak 100K QPS, each server handles 5K → 100K/5K × 1.3 = 26 instances

Key Concepts

  • Cooldown period: After scaling, wait N minutes before scaling again (prevent thrashing)
  • Min/Max instances: Set floor (minimum for availability) and ceiling (cost control)
  • Warm-up time: New instances need time to start the JVM, load caches, and establish connections
  • Graceful shutdown: Drain connections before terminating instances
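The graceful-shutdown step can be sketched as a drain loop (illustrative only; the `server` shape and function name are assumptions for the example): stop accepting new work, wait for in-flight requests to finish within a grace period, then terminate.

```javascript
// Graceful-shutdown sketch: drain in-flight work before terminating.
async function drainAndShutdown(server, { graceMs = 30000, pollMs = 100 } = {}) {
  server.accepting = false; // signal the load balancer to stop routing here
  const deadline = Date.now() + graceMs;
  while (server.inFlight > 0 && Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return server.inFlight === 0 ? 'clean' : 'forced'; // forced if grace expired
}

// Example with a toy server object whose requests finish shortly
const server = { accepting: true, inFlight: 2 };
setTimeout(() => { server.inFlight = 0; }, 200);
drainAndShutdown(server, { graceMs: 1000 }).then((result) => {
  console.log(`Shutdown: ${result}`);
});
```

The grace period is the same trade-off as the 30-second drain mentioned in the code example below: long enough for typical requests to complete, short enough that scale-down is not blocked by one stuck connection.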

Interview tip: Mention auto-scaling in any design where traffic is variable. It shows you understand operational efficiency and cost optimization.

💻 Code Example

```javascript
// ============================================
// Auto-Scaling — Policy Implementation
// ============================================

class AutoScaler {
  constructor(config) {
    this.minInstances = config.min || 2;
    this.maxInstances = config.max || 20;
    this.currentInstances = this.minInstances;
    this.cooldownMs = config.cooldownMs || 300000; // 5 minutes
    this.lastScaleTime = 0;
  }

  evaluate(currentMetrics) {
    const now = Date.now();
    if (now - this.lastScaleTime < this.cooldownMs) {
      console.log('⏳ In cooldown period, skipping evaluation');
      return;
    }

    const avgCPU = currentMetrics.avgCPU;
    const avgLatency = currentMetrics.p99Latency;
    const qps = currentMetrics.requestsPerSecond;

    // Scale UP conditions
    if (avgCPU > 70 || avgLatency > 500 || qps > 5000 * this.currentInstances) {
      const newCount = Math.min(
        this.maxInstances,
        Math.ceil(this.currentInstances * 1.5) // Scale up 50% at a time
      );
      if (newCount > this.currentInstances) {
        this.scaleUp(newCount);
      }
    }

    // Scale DOWN conditions
    if (avgCPU < 30 && avgLatency < 100 && qps < 2000 * this.currentInstances) {
      const newCount = Math.max(
        this.minInstances,
        Math.floor(this.currentInstances * 0.75) // Scale down 25% at a time
      );
      if (newCount < this.currentInstances) {
        this.scaleDown(newCount);
      }
    }
  }

  scaleUp(targetCount) {
    const toAdd = targetCount - this.currentInstances;
    console.log(`📈 Scaling UP: ${this.currentInstances} → ${targetCount} (+${toAdd})`);
    this.currentInstances = targetCount;
    this.lastScaleTime = Date.now();
  }

  scaleDown(targetCount) {
    const toRemove = this.currentInstances - targetCount;
    console.log(`📉 Scaling DOWN: ${this.currentInstances} → ${targetCount} (-${toRemove})`);
    // Drain connections before removing instances
    console.log(`   Draining ${toRemove} instances (30s grace period)`);
    this.currentInstances = targetCount;
    this.lastScaleTime = Date.now();
  }
}

// ---------- Capacity Planning Calculator ----------
function calculateCapacity(requirements) {
  const { peakQPS, qpsPerInstance, safetyMargin = 0.3 } = requirements;
  const baseInstances = Math.ceil(peakQPS / qpsPerInstance);
  const withSafety = Math.ceil(baseInstances * (1 + safetyMargin));

  return {
    baseInstances,
    withSafetyMargin: withSafety,
    totalCost: withSafety * requirements.instanceCostPerHour,
    note: `${peakQPS} QPS / ${qpsPerInstance} per instance × ${1 + safetyMargin} safety = ${withSafety}`,
  };
}

// Demo
const scaler = new AutoScaler({ min: 2, max: 20 });
scaler.evaluate({ avgCPU: 80, p99Latency: 600, requestsPerSecond: 15000 });
scaler.lastScaleTime = 0; // Reset cooldown for demo
scaler.evaluate({ avgCPU: 20, p99Latency: 50, requestsPerSecond: 3000 });

console.log('Capacity plan:', calculateCapacity({
  peakQPS: 100000,
  qpsPerInstance: 5000,
  safetyMargin: 0.3,
  instanceCostPerHour: 0.10,
}));
```

🏋️ Practice Exercise

  1. Auto-Scaling Policy: Design auto-scaling policies for: (a) an API server (CPU-based), (b) a Kafka consumer (queue-depth-based), (c) a WebSocket server (connection-count-based).

  2. Capacity Planning: Your service handles 10K req/sec normally and 50K during peak (3 hours/day). Each instance handles 2K req/sec and costs $0.10/hour. Calculate: minimum instances, peak instances, and daily cost with auto-scaling vs fixed capacity.

  3. Predictive Scaling: Design a predictive scaling system that learns from the last 4 weeks of traffic patterns and pre-scales before predicted peaks.

  4. Thundering Herd: After a deployment, all instances restart simultaneously with cold caches. Design a rolling deployment strategy that prevents this.

⚠️ Common Mistakes

  • Not setting a cooldown period — without cooldown, the auto-scaler can thrash between scaling up and down every minute, wasting resources on instance startup/shutdown.

  • Scaling based on a single metric — CPU might be low but latency high (due to I/O wait). Use multiple metrics and scale on the first one that triggers.

  • Not accounting for startup time — new instances need 30-120 seconds to start, warm caches, and become ready. Scale proactively, not reactively.

  • Setting max instances too low — during a viral event, traffic might be 10x normal. If your max is 5x, the system crashes. Set generous maximums with cost alerts.
