Clustering & Horizontal Scaling

📖 Concept

Node.js executes JavaScript on a single thread, so one process can fully utilize only one CPU core. The built-in cluster module enables running multiple Node.js processes side by side to leverage all CPU cores.

Scaling strategies:

| Strategy | How | Use Case |
| --- | --- | --- |
| Vertical | Bigger server (more CPU/RAM) | Quick fix, limited ceiling |
| Node.js Cluster | Fork worker processes | Multi-core utilization |
| PM2 | Process manager with cluster mode | Production deployment |
| Docker + K8s | Container orchestration | Microservices, cloud-native |
| Load Balancer | nginx / HAProxy / ALB | Multiple servers |

cluster module:

  • Primary (master) process — manages workers, doesn't handle requests
  • Worker processes — handle actual HTTP requests
  • Workers share the same port (OS distributes connections)
  • Workers are independent processes (crash isolation)

PM2 — Production Process Manager:

```bash
pm2 start app.js -i max         # Cluster mode (all cores)
pm2 start app.js -i 4           # 4 worker processes
pm2 reload app.js               # Zero-downtime reload
pm2 monit                       # Real-time monitoring
pm2 logs                        # View logs from all workers
pm2 save && pm2 startup         # Auto-start on server reboot
```

When to scale:

  1. Single Node.js process → PM2 cluster mode (same machine)
  2. PM2 cluster hits limits → multiple machines + load balancer
  3. Multiple machines → Docker + Kubernetes
  4. Global scale → CDN + edge computing + auto-scaling groups

🏠 Real-world analogy: Clustering is like a restaurant with multiple kitchens. Instead of one chef (single thread) handling all orders, you have multiple chefs (worker processes) in separate kitchens (processes), with a host (master/load balancer) assigning customers to the least busy kitchen.

💻 Code Example

```javascript
// Clustering & Horizontal Scaling

const cluster = require("cluster");
const os = require("os");
const express = require("express");

// 1. Built-in cluster module
if (cluster.isPrimary) {
  const numCPUs = os.cpus().length;
  console.log(`Primary ${process.pid} starting ${numCPUs} workers...`);

  // Fork workers
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Handle worker crashes (but don't refork during shutdown)
  let shuttingDown = false;
  cluster.on("exit", (worker, code, signal) => {
    if (shuttingDown) return;
    console.error(`Worker ${worker.process.pid} died (code: ${code}). Restarting...`);
    cluster.fork(); // Auto-restart
  });

  // Graceful shutdown
  process.on("SIGTERM", () => {
    console.log("Primary received SIGTERM. Shutting down workers...");
    shuttingDown = true;
    for (const worker of Object.values(cluster.workers)) {
      worker.process.kill("SIGTERM");
    }
  });
} else {
  // Worker process: each runs its own Express server
  const app = express();

  app.get("/api/health", (req, res) => {
    res.json({
      status: "healthy",
      pid: process.pid,
      worker: cluster.worker.id,
      uptime: process.uptime(),
    });
  });

  app.get("/api/heavy", (req, res) => {
    // CPU-intensive work (only blocks THIS worker)
    let result = 0;
    for (let i = 0; i < 1e7; i++) result += Math.sqrt(i);
    res.json({ result, pid: process.pid });
  });

  app.listen(3000, () => {
    console.log(`Worker ${process.pid} listening on port 3000`);
  });
}

// 2. PM2 ecosystem file (ecosystem.config.js)
const pm2Config = {
  apps: [
    {
      name: "my-api",
      script: "src/server.js",
      instances: "max",       // Use all CPU cores
      exec_mode: "cluster",   // Cluster mode
      max_memory_restart: "500M",
      env: {
        NODE_ENV: "production",
        PORT: 3000,
      },
      env_development: {
        NODE_ENV: "development",
        PORT: 3000,
      },
      // Logging
      log_file: "./logs/combined.log",
      error_file: "./logs/error.log",
      merge_logs: true,
      log_date_format: "YYYY-MM-DD HH:mm:ss",
      // Auto-restart
      watch: false,
      max_restarts: 10,
      restart_delay: 4000,
      // Graceful shutdown
      kill_timeout: 5000,
      listen_timeout: 10000,
    },
  ],
};

// 3. nginx load balancer configuration (reference)
const nginxConfig = `
# /etc/nginx/sites-available/my-api
upstream nodejs_cluster {
  least_conn;              # Least connections algorithm
  server 127.0.0.1:3001;   # Node instance 1
  server 127.0.0.1:3002;   # Node instance 2
  server 127.0.0.1:3003;   # Node instance 3
  server 127.0.0.1:3004;   # Node instance 4
  keepalive 64;            # Connection pooling
}

server {
  listen 80;
  listen 443 ssl;
  server_name api.example.com;

  # SSL
  ssl_certificate /etc/ssl/cert.pem;
  ssl_certificate_key /etc/ssl/key.pem;

  # Proxy to Node.js cluster
  location / {
    proxy_pass http://nodejs_cluster;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_cache_bypass $http_upgrade;

    # Timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
  }

  # Serve static files directly (bypass Node.js)
  location /static/ {
    alias /var/www/static/;
    expires 1y;
    add_header Cache-Control "public, immutable";
  }

  # Gzip compression
  gzip on;
  gzip_types text/plain application/json application/javascript text/css;
}
`;

module.exports = pm2Config;
```

🏋️ Practice Exercise

Exercises:

  1. Implement clustering using the cluster module — fork workers for each CPU core with auto-restart
  2. Set up PM2 with an ecosystem file — configure cluster mode, log files, and memory restart limits
  3. Configure nginx as a reverse proxy / load balancer for multiple Node.js instances
  4. Implement zero-downtime deployment using PM2's reload command
  5. Load test a single-instance server vs. clustered server — compare throughput and response times
  6. Build health check endpoints that report per-worker statistics

⚠️ Common Mistakes

  • Storing session state in process memory with clustering — each worker has its own memory; use Redis or a database for shared state

  • Not implementing graceful shutdown — when restarting workers, allow in-flight requests to complete before killing the process

  • Using cluster.fork() without auto-restart — if a worker crashes without restart logic, your capacity degrades over time

  • Running more workers than CPU cores — this causes context switching overhead; match workers to cores (or use PM2's max setting)

  • Not putting nginx in front of Node.js in production — nginx handles SSL termination, static files, gzip, rate limiting, and DDoS protection more efficiently
