Apache Kafka & Event Streaming


📖 Concept

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and real-time data pipelines. It's not just a message queue — it's a distributed commit log.

Kafka vs Traditional Message Queues

| Feature | Kafka | RabbitMQ/SQS |
|---|---|---|
| Model | Distributed log | Message queue |
| Retention | Configurable (days or forever) | Until consumed |
| Replay | Can replay from any offset | Messages deleted after consumption |
| Throughput | Millions of messages/sec | Thousands of messages/sec |
| Consumer groups | Multiple groups read independently | Each message consumed once |
| Ordering | Guaranteed within a partition | Per-queue FIFO, easily broken by retries and competing consumers |

Core Concepts

  • Topic: A named stream of events (like a database table for events)
  • Partition: Topics are split into partitions for parallelism
  • Offset: Sequential ID for each message within a partition
  • Producer: Writes events to topics
  • Consumer Group: A group of consumers that share the work of reading a topic
  • Broker: A Kafka server; cluster has multiple brokers for fault tolerance
  • Replication Factor: How many copies of each partition exist
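
To make the partition/consumer-group relationship concrete, here is a minimal sketch. The `assignPartitions` helper and its round-robin strategy are illustrative only; real Kafka uses pluggable assignors (range, round-robin, sticky), but the core rule is the same:

```javascript
// Each partition is assigned to exactly one consumer within a group;
// consumers beyond the partition count sit idle.
function assignPartitions(partitionCount, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  for (let p = 0; p < partitionCount; p++) {
    const consumer = consumers[p % consumers.length];
    assignment.get(consumer).push(p);
  }
  return assignment;
}

// 6 partitions, 2 consumers: each consumer reads 3 partitions
const balanced = assignPartitions(6, ['c1', 'c2']);
console.log(balanced.get('c1')); // [0, 2, 4]
console.log(balanced.get('c2')); // [1, 3, 5]

// 2 partitions, 3 consumers: one consumer gets nothing
const overProvisioned = assignPartitions(2, ['c1', 'c2', 'c3']);
console.log(overProvisioned.get('c3')); // []
```

This is why partition count caps a group's parallelism: adding a fourth consumer to a 2-partition topic buys nothing.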

When to Use Kafka

  • Event sourcing: Store every event as the source of truth
  • Real-time analytics: Stream processing (counts, aggregations)
  • Log aggregation: Centralize logs from hundreds of services
  • Change Data Capture (CDC): Replicate database changes to other systems
  • Microservice communication: Decouple services with events
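
The event-sourcing idea from the list above fits in a few lines: current state is never stored directly, it is recomputed by folding over the event log. The event types and amounts here are made up for illustration:

```javascript
// Event sourcing: the log is the source of truth;
// state is derived by replaying (folding over) the events.
const accountEvents = [
  { type: 'DEPOSITED', amount: 100 },
  { type: 'WITHDRAWN', amount: 30 },
  { type: 'DEPOSITED', amount: 50 },
];

function replayBalance(events) {
  return events.reduce((balance, e) => {
    switch (e.type) {
      case 'DEPOSITED': return balance + e.amount;
      case 'WITHDRAWN': return balance - e.amount;
      default:          return balance; // ignore unknown event types
    }
  }, 0);
}

console.log(replayBalance(accountEvents)); // 120
```

Because the log is durable, a bug in the fold can be fixed and the state rebuilt from scratch, something a mutable-row database cannot offer.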

Key insight: Kafka retains messages. Even after consumption, messages stay for the configured retention period. This means you can replay events, add new consumers that read historical data, and debug production issues by re-processing events.
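
That retention property can be shown with a toy log: each consumer group tracks its own offset, so consuming deletes nothing, and a group added later still sees the full history. This is a simplified model, not real Kafka client behavior:

```javascript
// Retained log: consuming only advances the group's own offset,
// so a brand-new group replays everything from offset 0.
const log = ['evt-0', 'evt-1', 'evt-2'];
const groupOffsets = new Map(); // groupId -> next offset to read

function poll(groupId) {
  const start = groupOffsets.get(groupId) ?? 0;
  const batch = log.slice(start);
  groupOffsets.set(groupId, log.length); // commit
  return batch;
}

console.log(poll('analytics')); // ['evt-0', 'evt-1', 'evt-2']
console.log(poll('analytics')); // [] (already caught up)
console.log(poll('audit'));     // ['evt-0', 'evt-1', 'evt-2'] (new group replays history)
```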

💻 Code Example

// ============================================
// Kafka Concepts — Simplified Demonstration
// ============================================

// ---------- Kafka-like Event Log ----------
// Single topic for simplicity; real Kafka keeps a partitioned log per topic.
class EventLog {
  constructor(partitionCount = 3) {
    this.partitions = Array.from({ length: partitionCount }, () => []);
    this.consumerGroups = new Map(); // groupId → (partition → committed offset)
  }

  // Produce: write to a partition (determined by key hash)
  produce(topic, key, value) {
    const partitionIndex = this.getPartition(key);
    const offset = this.partitions[partitionIndex].length;
    const event = { offset, key, value, timestamp: Date.now(), topic };
    this.partitions[partitionIndex].push(event);
    console.log(`Produced to partition ${partitionIndex}, offset ${offset}`);
    return { partition: partitionIndex, offset };
  }

  // Consumer group: each group tracks its own offsets, so groups read the
  // same log independently. (Real Kafka also spreads partitions across a
  // group's members; here a single call drains every partition.)
  consume(groupId, handler) {
    if (!this.consumerGroups.has(groupId)) {
      this.consumerGroups.set(groupId, new Map()); // partition → offset
    }
    const offsets = this.consumerGroups.get(groupId);

    for (let p = 0; p < this.partitions.length; p++) {
      const startOffset = offsets.get(p) || 0;
      const partition = this.partitions[p];

      for (let i = startOffset; i < partition.length; i++) {
        handler(partition[i]);
        offsets.set(p, i + 1); // Commit offset
      }
    }
  }

  // Same key always goes to same partition (ordering guarantee)
  getPartition(key) {
    let hash = 0;
    const str = String(key);
    for (let i = 0; i < str.length; i++) {
      hash = (hash * 31 + str.charCodeAt(i)) % this.partitions.length;
    }
    return hash; // already non-negative thanks to the modulo above
  }
}

// ---------- Event-Driven Architecture with Kafka ----------
class OrderService {
  constructor(kafka) { this.kafka = kafka; }

  async createOrder(order) {
    // Write to DB, then publish event
    const savedOrder = await this.saveToDb(order);
    this.kafka.produce('orders', order.userId, {
      type: 'ORDER_CREATED',
      orderId: savedOrder.id,
      items: order.items,
      total: order.total,
      userId: order.userId,
    });
    return savedOrder;
  }
  async saveToDb(order) { return { ...order, id: 'ord_' + Date.now() }; }
}

// Each service consumes independently
class InventoryService {
  constructor(kafka) {
    kafka.consume('inventory-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Inventory] Reserving items for ${event.value.orderId}`);
      }
    });
  }
}

class NotificationService {
  constructor(kafka) {
    kafka.consume('notification-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Notification] Sending email for ${event.value.orderId}`);
      }
    });
  }
}

// Demo: await the async produces before creating the consumers,
// otherwise both services would start reading before any event exists.
(async () => {
  const kafka = new EventLog(3);
  const orderSvc = new OrderService(kafka);
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_a'], total: 99.99 });
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_b'], total: 49.99 });
  new InventoryService(kafka);    // reads both ORDER_CREATED events
  new NotificationService(kafka); // reads the same events independently
})();

🏋️ Practice Exercise

  1. Kafka Architecture: Design a Kafka cluster for an e-commerce platform handling 500K orders/day. How many brokers, partitions per topic, and replication factor?

  2. Event-Driven System: Convert a monolithic order processing system into event-driven using Kafka. Identify all events, topics, producers, and consumer groups.

  3. Kafka vs RabbitMQ: For each scenario, choose Kafka or RabbitMQ: (a) Order processing queue, (b) Real-time clickstream analytics, (c) Email sending, (d) Database change replication.

  4. Partition Strategy: A chat application uses Kafka for messages. What partition key ensures messages within a conversation are ordered? What about messages for a user across all conversations?

⚠️ Common Mistakes

  • Using Kafka for simple task queues — Kafka's operational overhead (brokers, partitions, a ZooKeeper or KRaft metadata quorum) is overkill for simple job queues. Use RabbitMQ or SQS for basic task distribution.

  • Too few partitions — partitions determine consumer parallelism. With 3 partitions, max 3 consumers per group. Plan for future scale: start with 10-50 partitions per topic.

  • Not thinking about partition keys — random partitioning loses message ordering. Use meaningful keys (userId, orderId) to keep related events in the same partition.

  • Treating Kafka as a database — Kafka retains messages but isn't optimized for random access queries. Use it as an event log, not a primary data store.
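
The partition-key point above can be demonstrated with the same hashing idea used in the code example. The `pickPartition` helper is illustrative, not Kafka's actual default (murmur2) partitioner, but the property is identical: equal keys always hash to the same partition, so their relative order survives, while unrelated keys scatter:

```javascript
// Key-based partitioning: identical keys land in the same partition,
// so all events for one key stay ordered relative to each other.
function pickPartition(key, partitionCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.charCodeAt(0)) % partitionCount;
  }
  return hash;
}

const PARTITIONS = 3;
const p1 = pickPartition('user_42', PARTITIONS);
const p2 = pickPartition('user_42', PARTITIONS);
console.log(p1 === p2); // true: same key, same partition, ordering preserved

// Unrelated keys spread across partitions, so there is no
// ordering guarantee between them:
const scattered = new Set(['a', 'b', 'c', 'd'].map((k) => pickPartition(k, PARTITIONS)));
console.log(scattered.size); // 3: these four keys hit all three partitions
```

This is why a chat app would key messages by conversation ID: every message in one conversation shares a key, hence a partition, hence an order.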
