Clustering & Horizontal Scaling
📖 Concept
Node.js runs JavaScript on a single thread, so one process utilizes only one CPU core. The cluster module lets you run multiple Node.js processes side by side to leverage all cores of the machine.
Scaling strategies:
| Strategy | How | Use Case |
|---|---|---|
| Vertical | Bigger server (more CPU/RAM) | Quick fix, limited ceiling |
| Node.js Cluster | Fork worker processes | Multi-core utilization |
| PM2 | Process manager with cluster mode | Production deployment |
| Docker + K8s | Container orchestration | Microservices, cloud-native |
| Load Balancer | nginx / HAProxy / ALB | Multiple servers |
cluster module:
- Primary process (formerly "master") — manages workers, doesn't handle requests itself
- Worker processes — handle the actual HTTP requests
- Workers share the same port (by default the primary distributes incoming connections round-robin, except on Windows)
- Workers are independent processes, so a crash in one doesn't take down the others
PM2 — Production Process Manager:

```bash
pm2 start app.js -i max   # Cluster mode (all cores)
pm2 start app.js -i 4     # 4 worker processes
pm2 reload app.js         # Zero-downtime reload
pm2 monit                 # Real-time monitoring
pm2 logs                  # View logs from all workers
pm2 save && pm2 startup   # Auto-start on server reboot
```
When to scale:
- Single Node.js process → PM2 cluster mode (same machine)
- PM2 cluster hits limits → multiple machines + load balancer
- Multiple machines → Docker + Kubernetes
- Global scale → CDN + edge computing + auto-scaling groups
🏠 Real-world analogy: Clustering is like a restaurant with multiple kitchens. Instead of one chef (single thread) handling all orders, you have multiple chefs (worker processes) in separate kitchens (processes), with a host (master/load balancer) assigning customers to the least busy kitchen.
💻 Code Example
```javascript
// Clustering & Horizontal Scaling

const cluster = require("cluster");
const os = require("os");
const express = require("express");

// 1. Built-in cluster module
if (cluster.isPrimary) {
  const numCPUs = os.cpus().length;
  console.log(`Primary ${process.pid} starting ${numCPUs} workers...`);

  // Fork workers
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Handle worker crashes
  cluster.on("exit", (worker, code, signal) => {
    console.error(`Worker ${worker.process.pid} died (code: ${code}). Restarting...`);
    cluster.fork(); // Auto-restart
  });

  // Graceful shutdown
  process.on("SIGTERM", () => {
    console.log("Primary received SIGTERM. Shutting down workers...");
    for (const worker of Object.values(cluster.workers)) {
      worker.process.kill("SIGTERM");
    }
  });
} else {
  // Worker process — each runs its own Express server
  const app = express();

  app.get("/api/health", (req, res) => {
    res.json({
      status: "healthy",
      pid: process.pid,
      worker: cluster.worker.id,
      uptime: process.uptime(),
    });
  });

  app.get("/api/heavy", (req, res) => {
    // CPU-intensive work (only blocks THIS worker)
    let result = 0;
    for (let i = 0; i < 1e7; i++) result += Math.sqrt(i);
    res.json({ result, pid: process.pid });
  });

  app.listen(3000, () => {
    console.log(`Worker ${process.pid} listening on port 3000`);
  });
}

// 2. PM2 ecosystem file (ecosystem.config.js)
const pm2Config = {
  apps: [
    {
      name: "my-api",
      script: "src/server.js",
      instances: "max",       // Use all CPU cores
      exec_mode: "cluster",   // Cluster mode
      max_memory_restart: "500M",
      env: {
        NODE_ENV: "production",
        PORT: 3000,
      },
      env_development: {
        NODE_ENV: "development",
        PORT: 3000,
      },
      // Logging
      log_file: "./logs/combined.log",
      error_file: "./logs/error.log",
      merge_logs: true,
      log_date_format: "YYYY-MM-DD HH:mm:ss",
      // Auto-restart
      watch: false,
      max_restarts: 10,
      restart_delay: 4000,
      // Graceful shutdown
      kill_timeout: 5000,
      listen_timeout: 10000,
    },
  ],
};

// 3. nginx load balancer configuration (reference)
const nginxConfig = `
# /etc/nginx/sites-available/my-api
upstream nodejs_cluster {
  least_conn;              # Least connections algorithm
  server 127.0.0.1:3001;   # Node instance 1
  server 127.0.0.1:3002;   # Node instance 2
  server 127.0.0.1:3003;   # Node instance 3
  server 127.0.0.1:3004;   # Node instance 4
  keepalive 64;            # Connection pooling
}

server {
  listen 80;
  listen 443 ssl;
  server_name api.example.com;

  # SSL
  ssl_certificate /etc/ssl/cert.pem;
  ssl_certificate_key /etc/ssl/key.pem;

  # Proxy to Node.js cluster
  location / {
    proxy_pass http://nodejs_cluster;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_cache_bypass $http_upgrade;

    # Timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
  }

  # Serve static files directly (bypass Node.js)
  location /static/ {
    alias /var/www/static/;
    expires 1y;
    add_header Cache-Control "public, immutable";
  }

  # Gzip compression
  gzip on;
  gzip_types text/plain application/json application/javascript text/css;
}
`;

module.exports = pm2Config;
```
🏋️ Practice Exercise
Exercises:
- Implement clustering using the `cluster` module — fork workers for each CPU core with auto-restart
- Set up PM2 with an ecosystem file — configure cluster mode, log files, and memory restart limits
- Configure nginx as a reverse proxy / load balancer for multiple Node.js instances
- Implement zero-downtime deployment using PM2's `reload` command
- Load test a single-instance server vs. clustered server — compare throughput and response times
- Build health check endpoints that report per-worker statistics
⚠️ Common Mistakes
- Storing session state in process memory with clustering — each worker has its own memory; use Redis or a database for shared state
- Not implementing graceful shutdown — when restarting workers, allow in-flight requests to complete before killing the process
- Using `cluster.fork()` without auto-restart — if a worker crashes without restart logic, your capacity degrades over time
- Running more workers than CPU cores — this causes context-switching overhead; match workers to cores (or use PM2's `max` setting)
- Not putting nginx in front of Node.js in production — nginx handles SSL termination, static files, gzip, rate limiting, and DDoS protection more efficiently