Distributed Consensus & Leader Election


📖 Concept

Distributed consensus is how multiple nodes agree on a value even when some nodes fail or messages are lost. It's the foundation of reliable distributed systems.

Why Consensus Is Hard

In a distributed system, you can't rely on a single coordinator — it could crash. You need multiple nodes to agree, but they communicate over unreliable networks (messages can be delayed, duplicated, or lost).

Key Algorithms

Raft (Most Practical)

Designed for understandability. Used by etcd, CockroachDB, TiKV.

  • Leader election: One node is elected leader; it handles all writes
  • Log replication: Leader replicates writes to followers; committed when majority acknowledge
  • Safety: At most one leader can win a given term, and a candidate wins only if its log is at least as up-to-date as a majority's — so committed entries are never lost. If the leader fails, a new one is elected within seconds
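The commit rule in the log-replication step can be sketched in a few lines. This is an illustrative helper (not part of any Raft library): an entry counts as committed once the leader plus enough followers hold it to form a majority of the cluster.

```javascript
// Sketch of Raft's commit rule. `ackCount` includes the leader's own copy.
// An entry is committed once more than half the cluster has acknowledged it.
function isCommitted(ackCount, clusterSize) {
  return ackCount > Math.floor(clusterSize / 2);
}

console.log(isCommitted(3, 5)); // true  — 3 of 5 is a majority
console.log(isCommitted(2, 5)); // false — 2 of 5 is not
```

Note that the threshold is strictly more than half: in a 4-node cluster, 3 acknowledgments are needed, not 2.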

Paxos (Theoretical Foundation)

The original consensus algorithm by Leslie Lamport. Proven correct but notoriously hard to implement. Most systems use Raft instead.

ZAB (ZooKeeper Atomic Broadcast)

Used by Apache ZooKeeper. Similar to Raft but designed specifically for ZooKeeper's use case: configuration management and service discovery.

Leader Election Use Cases

  • Database primary selection: When the primary fails, replicas elect a new primary
  • Distributed locks: Only the leader can perform a specific action
  • Job scheduling: One node is the "scheduler" that assigns work
  • Config management: ZooKeeper/etcd provide consistent configuration

Split-Brain Problem

If a network partition splits a cluster into two groups, each group might elect its own leader → two leaders making conflicting decisions. Solution: quorum — a leader must have votes from a majority (> N/2), and at most one partition can hold a majority.
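The quorum rule can be walked through concretely. This illustrative sketch (the numbers are an example scenario, not from any real system) shows a 5-node cluster splitting into groups of 3 and 2 — only the majority side can elect a leader:

```javascript
// Quorum: the minimum number of votes needed — a strict majority of the
// full cluster size, not of the reachable nodes.
const quorum = (clusterSize) => Math.floor(clusterSize / 2) + 1;

const clusterSize = 5;
const partitions = [3, 2]; // cluster splits into a group of 3 and a group of 2

for (const groupSize of partitions) {
  const canElect = groupSize >= quorum(clusterSize);
  console.log(`Group of ${groupSize}: ${canElect ? 'can elect a leader' : 'cannot reach quorum'}`);
}
// Group of 3: can elect a leader
// Group of 2: cannot reach quorum
```

Because quorum is computed against the full cluster size, two disjoint groups can never both reach it — this is what rules out split-brain.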

Interview tip: You rarely implement consensus yourself. Know when to mention ZooKeeper, etcd, or Raft, and understand why they exist.

💻 Code Example

// ============================================
// Distributed Consensus — Simplified Raft
// ============================================

class RaftNode {
  constructor(id, peers) {
    this.id = id;
    this.peers = peers;
    this.state = 'FOLLOWER'; // FOLLOWER | CANDIDATE | LEADER
    this.currentTerm = 0;
    this.votedFor = null;
    this.log = [];
    this.commitIndex = 0;
    this.electionTimeout = 150 + Math.random() * 150; // 150-300ms, randomized to avoid split votes
    this.lastHeartbeat = Date.now();
  }

  // If no heartbeat received within the timeout, start an election
  checkElectionTimeout() {
    if (this.state === 'LEADER') return;
    if (Date.now() - this.lastHeartbeat > this.electionTimeout) {
      this.startElection();
    }
  }

  startElection() {
    this.state = 'CANDIDATE';
    this.currentTerm++;
    this.votedFor = this.id;
    let votes = 1; // Vote for self
    console.log(`Node ${this.id}: Starting election for term ${this.currentTerm}`);

    for (const peer of this.peers) {
      const voteGranted = peer.requestVote(this.currentTerm, this.id, this.log.length);
      if (voteGranted) votes++;
    }

    // Majority of the whole cluster (peers + self)
    const majority = Math.floor((this.peers.length + 1) / 2) + 1;
    if (votes >= majority) {
      this.state = 'LEADER';
      console.log(`Node ${this.id}: Elected LEADER for term ${this.currentTerm}`);
      this.sendHeartbeats();
    } else {
      this.state = 'FOLLOWER';
    }
  }

  requestVote(term, candidateId, lastLogIndex) {
    if (term > this.currentTerm) {
      // A higher term resets our vote and demotes us to follower
      this.currentTerm = term;
      this.votedFor = null;
      this.state = 'FOLLOWER';
    }
    // Grant the vote if we haven't voted this term and the candidate's
    // log is at least as long as ours (simplified up-to-date check)
    if (term === this.currentTerm &&
        (this.votedFor === null || this.votedFor === candidateId) &&
        lastLogIndex >= this.log.length) {
      this.votedFor = candidateId;
      return true;
    }
    return false;
  }

  appendEntry(term, entry) {
    if (term >= this.currentTerm) {
      this.currentTerm = term;
      this.lastHeartbeat = Date.now();
      this.state = 'FOLLOWER';
      if (entry.type !== 'heartbeat') this.log.push(entry); // heartbeats carry no log data
      return true;
    }
    return false;
  }

  // Leader sends heartbeats to prevent new elections
  sendHeartbeats() {
    if (this.state !== 'LEADER') return;
    for (const peer of this.peers) {
      peer.appendEntry(this.currentTerm, { type: 'heartbeat' });
    }
  }
}

// Demo
const node1 = new RaftNode('node-1', []);
const node2 = new RaftNode('node-2', []);
const node3 = new RaftNode('node-3', []);
node1.peers = [node2, node3];
node2.peers = [node1, node3];
node3.peers = [node1, node2];

// Simulate election timeout on node 1
node1.startElection();
console.log(`Node 1: ${node1.state}, Node 2: ${node2.state}, Node 3: ${node3.state}`);

🏋️ Practice Exercise

  1. Raft Simulation: Implement a full Raft simulation with 5 nodes. Test: leader election, log replication, and leader failure recovery.

  2. Split-Brain Scenario: 5-node cluster splits into 2 groups (3 and 2). Walk through what happens with Raft consensus. Does each group elect a leader? Can both accept writes?

  3. ZooKeeper vs etcd: Compare ZooKeeper and etcd for service discovery. When would you choose each?

  4. Leader Election with Redis: Implement a leader election mechanism using Redis SETNX for a job scheduler where only one node should assign jobs.

⚠️ Common Mistakes

  • Implementing your own consensus algorithm — consensus is extremely hard to get right. Use battle-tested implementations (etcd, ZooKeeper) instead of rolling your own.

  • Not handling split-brain — without quorum checks, network partitions can lead to two leaders making conflicting decisions. Always require majority votes.

  • Confusing strong consistency with consensus — consensus is how nodes agree; consistency is the guarantee users see. You can have consensus internally but expose eventual consistency to users.
