Distributed Consensus & Leader Election
📖 Concept
Distributed consensus is how multiple nodes agree on a value even when some nodes fail or messages are lost. It's the foundation of reliable distributed systems.
Why Consensus Is Hard
In a distributed system, you can't rely on a single coordinator — it could crash. You need multiple nodes to agree, but they communicate over unreliable networks (messages can be delayed, duplicated, or lost).
Key Algorithms
Raft (Most Practical)
Designed for understandability. Used by etcd, CockroachDB, TiKV.
- Leader election: One node is elected leader; it handles all writes
- Log replication: Leader replicates writes to followers; committed when majority acknowledge
- Safety: At most one leader can exist per term, and a committed entry is never lost. If the current leader fails, a new one is elected within seconds (that part is liveness, not safety)
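The commit rule from the log-replication bullet (an entry is committed once a majority has stored it) reduces to a median: sort the highest replicated index of each node and take the value that at least a majority holds. A minimal sketch — the function name is illustrative, and real Raft additionally requires the entry to belong to the leader's current term before committing:

```javascript
// Given the highest log index each node (leader included) has stored,
// the commit index is the largest index present on a majority of nodes.
function commitIndex(matchIndexes) {
  const sorted = [...matchIndexes].sort((a, b) => b - a); // descending
  const majority = Math.floor(sorted.length / 2) + 1;
  return sorted[majority - 1]; // stored by at least `majority` nodes
}

// 5-node cluster: leader at index 7, followers at 7, 5, 4, 2
console.log(commitIndex([7, 7, 5, 4, 2])); // → 5 (indexes ≤ 5 are on 3 of 5 nodes)
```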
Paxos (Theoretical Foundation)
The original consensus algorithm by Leslie Lamport. Proven correct but notoriously hard to implement. Most systems use Raft instead.
ZAB (ZooKeeper Atomic Broadcast)
Used by Apache ZooKeeper. Similar to Raft but designed specifically for ZooKeeper's use case: configuration management and service discovery.
Leader Election Use Cases
- Database primary selection: When the primary fails, replicas elect a new primary
- Distributed locks: Only the leader can perform a specific action
- Job scheduling: One node is the "scheduler" that assigns work
- Config management: ZooKeeper/etcd provide consistent configuration
Split-Brain Problem
If a network partition splits a cluster into two groups, each group might elect its own leader → two leaders making conflicting decisions. Solution: quorum — a leader must win votes from a majority of the full cluster (> N/2), so at most one partition can ever elect a leader.
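That quorum rule can be sketched in a few lines: each partition knows the total cluster size, so only a group holding a strict majority of it can elect a leader (the function name here is illustrative):

```javascript
// A partition may elect a leader only if it holds a strict majority
// of the FULL cluster size (quorum), not merely of its own group.
function canElectLeader(groupSize, clusterSize) {
  return groupSize > Math.floor(clusterSize / 2);
}

// 5-node cluster split into groups of 3 and 2:
console.log(canElectLeader(3, 5)); // true  — majority side can elect a leader
console.log(canElectLeader(2, 5)); // false — minority side cannot: no split-brain
```

This is why clusters are sized with an odd number of nodes: a 5-node cluster tolerates 2 failures, while a 6-node cluster still tolerates only 2, since quorum rises to 4.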
Interview tip: You rarely implement consensus yourself. Know when to mention ZooKeeper, etcd, or Raft, and understand why they exist.
💻 Code Example
```javascript
// ============================================
// Distributed Consensus — Simplified Raft
// ============================================

class RaftNode {
  constructor(id, peers) {
    this.id = id;
    this.peers = peers;
    this.state = 'FOLLOWER'; // FOLLOWER | CANDIDATE | LEADER
    this.currentTerm = 0;
    this.votedFor = null;
    this.log = [];
    this.commitIndex = 0;
    this.electionTimeout = 150 + Math.random() * 150; // 150-300ms, randomized to avoid ties
    this.lastHeartbeat = Date.now();
  }

  // If no heartbeat received within the timeout, start an election
  checkElectionTimeout() {
    if (this.state === 'LEADER') return;
    if (Date.now() - this.lastHeartbeat > this.electionTimeout) {
      this.startElection();
    }
  }

  startElection() {
    this.state = 'CANDIDATE';
    this.currentTerm++;
    this.votedFor = this.id;
    let votes = 1; // Vote for self
    console.log(`Node ${this.id}: Starting election for term ${this.currentTerm}`);

    for (const peer of this.peers) {
      const voteGranted = peer.requestVote(this.currentTerm, this.id, this.log.length);
      if (voteGranted) votes++;
    }

    // Majority of the full cluster (peers + self)
    const majority = Math.floor((this.peers.length + 1) / 2) + 1;
    if (votes >= majority) {
      this.state = 'LEADER';
      console.log(`Node ${this.id}: Elected LEADER for term ${this.currentTerm}`);
      this.sendHeartbeats();
    } else {
      this.state = 'FOLLOWER';
    }
  }

  // Real Raft also compares log up-to-dateness before granting a vote;
  // lastLogIndex is accepted but not checked here, to keep the sketch short.
  requestVote(term, candidateId, lastLogIndex) {
    if (term > this.currentTerm) {
      // Higher term seen: step down and forget any vote from an older term
      this.currentTerm = term;
      this.state = 'FOLLOWER';
      this.votedFor = null;
    }
    if (term === this.currentTerm && (this.votedFor === null || this.votedFor === candidateId)) {
      this.votedFor = candidateId;
      return true;
    }
    return false;
  }

  appendEntry(term, entry) {
    if (term >= this.currentTerm) {
      this.currentTerm = term;
      this.lastHeartbeat = Date.now();
      this.state = 'FOLLOWER';
      if (entry.type !== 'heartbeat') this.log.push(entry); // heartbeats carry no data
      return true;
    }
    return false;
  }

  // Leader sends heartbeats to prevent new elections
  sendHeartbeats() {
    if (this.state !== 'LEADER') return;
    for (const peer of this.peers) {
      peer.appendEntry(this.currentTerm, { type: 'heartbeat' });
    }
  }
}

// Demo
const node1 = new RaftNode('node-1', []);
const node2 = new RaftNode('node-2', []);
const node3 = new RaftNode('node-3', []);
node1.peers = [node2, node3];
node2.peers = [node1, node3];
node3.peers = [node1, node2];

// Simulate an election timeout on node 1
node1.startElection();
console.log(`Node 1: ${node1.state}, Node 2: ${node2.state}, Node 3: ${node3.state}`);
```
🏋️ Practice Exercise
Raft Simulation: Implement a full Raft simulation with 5 nodes. Test: leader election, log replication, and leader failure recovery.
Split-Brain Scenario: 5-node cluster splits into 2 groups (3 and 2). Walk through what happens with Raft consensus. Does each group elect a leader? Can both accept writes?
ZooKeeper vs etcd: Compare ZooKeeper and etcd for service discovery. When would you choose each?
Leader Election with Redis: Implement a leader election mechanism using Redis SETNX for a job scheduler where only one node should assign jobs.
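As a starting point for the Redis exercise, here is a lease-based sketch of the pattern. A plain Map stands in for Redis so the snippet runs standalone; with a real client you would issue `SET leader <nodeId> NX PX <ttlMs>`, which only succeeds if the key does not already exist (the SETNX pattern). The function name and key are illustrative:

```javascript
// In-memory stand-in for Redis: key -> { value, expiresAt }
const store = new Map();

function tryAcquireLeadership(nodeId, ttlMs, now = Date.now()) {
  const lease = store.get('leader');
  if (lease && lease.expiresAt > now && lease.value !== nodeId) {
    return false; // another node holds an unexpired lease
  }
  // Key absent, expired, or our own lease: take/renew it
  store.set('leader', { value: nodeId, expiresAt: now + ttlMs });
  return true;
}

const t0 = Date.now();
console.log(tryAcquireLeadership('node-A', 5000, t0));        // true  — A becomes leader
console.log(tryAcquireLeadership('node-B', 5000, t0));        // false — lease held by A
console.log(tryAcquireLeadership('node-B', 5000, t0 + 6000)); // true  — A's lease expired
```

The leader must renew its lease well before the TTL expires; a node paused past the TTL (long GC, network stall) silently loses leadership while believing it still holds it — a known caveat of this pattern compared with Raft-based election.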
⚠️ Common Mistakes
Implementing your own consensus algorithm — consensus is extremely hard to get right. Use battle-tested implementations (etcd, ZooKeeper) instead of rolling your own.
Not handling split-brain — without quorum checks, network partitions can lead to two leaders making conflicting decisions. Always require majority votes.
Confusing strong consistency with consensus — consensus is how nodes agree; consistency is the guarantee users see. You can have consensus internally but expose eventual consistency to users.