Apache Kafka & Event Streaming


📖 Concept

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and real-time data pipelines. It's not just a message queue — it's a distributed commit log.

Kafka vs Traditional Message Queues

| Feature | Kafka | RabbitMQ/SQS |
|---|---|---|
| Model | Distributed log | Message queue |
| Retention | Configurable (days or forever) | Until consumed |
| Replay | Can replay from any offset | Messages deleted after consumption |
| Throughput | Millions of messages/sec | Thousands of messages/sec |
| Consumer groups | Multiple groups read independently | Each message consumed once |
| Ordering | Guaranteed within a partition | Per-queue FIFO, easily broken by retries and competing consumers |

Core Concepts

  • Topic: A named stream of events (like a database table for events)
  • Partition: Topics are split into partitions for parallelism
  • Offset: Sequential ID for each message within a partition
  • Producer: Writes events to topics
  • Consumer Group: A group of consumers that share the work of reading a topic
  • Broker: A Kafka server; cluster has multiple brokers for fault tolerance
  • Replication Factor: How many copies of each partition exist
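
To make the partition/consumer-group relationship concrete, here is a minimal sketch. The `assignPartitions` helper and its round-robin strategy are illustrative only; real Kafka uses pluggable assignors (range, round-robin, sticky), but the core rule is the same:

```javascript
// Each partition is assigned to exactly one consumer within a group;
// consumers beyond the partition count sit idle.
function assignPartitions(partitionCount, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  for (let p = 0; p < partitionCount; p++) {
    const consumer = consumers[p % consumers.length];
    assignment.get(consumer).push(p);
  }
  return assignment;
}

// 6 partitions, 2 consumers: each consumer reads 3 partitions
const balanced = assignPartitions(6, ['c1', 'c2']);
console.log(balanced.get('c1')); // [0, 2, 4]
console.log(balanced.get('c2')); // [1, 3, 5]

// 2 partitions, 3 consumers: one consumer gets nothing
const overProvisioned = assignPartitions(2, ['c1', 'c2', 'c3']);
console.log(overProvisioned.get('c3')); // []
```

This is why partition count caps a group's parallelism: adding a fourth consumer to a 2-partition topic buys nothing.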

When to Use Kafka

  • Event sourcing: Store every event as the source of truth
  • Real-time analytics: Stream processing (counts, aggregations)
  • Log aggregation: Centralize logs from hundreds of services
  • Change Data Capture (CDC): Replicate database changes to other systems
  • Microservice communication: Decouple services with events
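
The event-sourcing idea from the list above fits in a few lines: current state is never stored directly, it is recomputed by folding over the event log. The event types and amounts here are made up for illustration:

```javascript
// Event sourcing: the log is the source of truth;
// state is derived by replaying (folding over) the events.
const accountEvents = [
  { type: 'DEPOSITED', amount: 100 },
  { type: 'WITHDRAWN', amount: 30 },
  { type: 'DEPOSITED', amount: 50 },
];

function replayBalance(events) {
  return events.reduce((balance, e) => {
    switch (e.type) {
      case 'DEPOSITED': return balance + e.amount;
      case 'WITHDRAWN': return balance - e.amount;
      default:          return balance; // ignore unknown event types
    }
  }, 0);
}

console.log(replayBalance(accountEvents)); // 120
```

Because the log is durable, a bug in the fold can be fixed and the state rebuilt from scratch, something a mutable-row database cannot offer.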

Key insight: Kafka retains messages. Even after consumption, messages stay for the configured retention period. This means you can replay events, add new consumers that read historical data, and debug production issues by re-processing events.
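
That retention property can be shown with a toy log: each consumer group tracks its own offset, so consuming deletes nothing, and a group added later still sees the full history. This is a simplified model, not real Kafka client behavior:

```javascript
// Retained log: consuming only advances the group's own offset,
// so a brand-new group replays everything from offset 0.
const log = ['evt-0', 'evt-1', 'evt-2'];
const groupOffsets = new Map(); // groupId -> next offset to read

function poll(groupId) {
  const start = groupOffsets.get(groupId) ?? 0;
  const batch = log.slice(start);
  groupOffsets.set(groupId, log.length); // commit
  return batch;
}

console.log(poll('analytics')); // ['evt-0', 'evt-1', 'evt-2']
console.log(poll('analytics')); // [] (already caught up)
console.log(poll('audit'));     // ['evt-0', 'evt-1', 'evt-2'] (new group replays history)
```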

💻 Code Example

// ============================================
// Kafka Concepts — Simplified Demonstration
// ============================================

// ---------- Kafka-like Event Log ----------
// Single topic for simplicity; real Kafka keeps a partitioned log per topic.
class EventLog {
  constructor(partitionCount = 3) {
    this.partitions = Array.from({ length: partitionCount }, () => []);
    this.consumerGroups = new Map(); // groupId → (partition → committed offset)
  }

  // Produce: write to a partition (determined by key hash)
  produce(topic, key, value) {
    const partitionIndex = this.getPartition(key);
    const offset = this.partitions[partitionIndex].length;
    const event = { offset, key, value, timestamp: Date.now(), topic };
    this.partitions[partitionIndex].push(event);
    console.log(`Produced to partition ${partitionIndex}, offset ${offset}`);
    return { partition: partitionIndex, offset };
  }

  // Consumer group: each group tracks its own offsets, so groups read the
  // same log independently. (Real Kafka also spreads partitions across a
  // group's members; here a single call drains every partition.)
  consume(groupId, handler) {
    if (!this.consumerGroups.has(groupId)) {
      this.consumerGroups.set(groupId, new Map()); // partition → offset
    }
    const offsets = this.consumerGroups.get(groupId);

    for (let p = 0; p < this.partitions.length; p++) {
      const startOffset = offsets.get(p) || 0;
      const partition = this.partitions[p];

      for (let i = startOffset; i < partition.length; i++) {
        handler(partition[i]);
        offsets.set(p, i + 1); // Commit offset
      }
    }
  }

  // Same key always goes to same partition (ordering guarantee)
  getPartition(key) {
    let hash = 0;
    const str = String(key);
    for (let i = 0; i < str.length; i++) {
      hash = (hash * 31 + str.charCodeAt(i)) % this.partitions.length;
    }
    return hash; // already non-negative thanks to the modulo above
  }
}

// ---------- Event-Driven Architecture with Kafka ----------
class OrderService {
  constructor(kafka) { this.kafka = kafka; }

  async createOrder(order) {
    // Write to DB, then publish event
    const savedOrder = await this.saveToDb(order);
    this.kafka.produce('orders', order.userId, {
      type: 'ORDER_CREATED',
      orderId: savedOrder.id,
      items: order.items,
      total: order.total,
      userId: order.userId,
    });
    return savedOrder;
  }
  async saveToDb(order) { return { ...order, id: 'ord_' + Date.now() }; }
}

// Each service consumes independently
class InventoryService {
  constructor(kafka) {
    kafka.consume('inventory-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Inventory] Reserving items for ${event.value.orderId}`);
      }
    });
  }
}

class NotificationService {
  constructor(kafka) {
    kafka.consume('notification-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Notification] Sending email for ${event.value.orderId}`);
      }
    });
  }
}

// Demo: await the async produces before creating the consumers,
// otherwise both services would start reading before any event exists.
(async () => {
  const kafka = new EventLog(3);
  const orderSvc = new OrderService(kafka);
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_a'], total: 99.99 });
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_b'], total: 49.99 });
  new InventoryService(kafka);    // reads both ORDER_CREATED events
  new NotificationService(kafka); // reads the same events independently
})();

🏋️ Practice Exercise

  1. Kafka Architecture: Design a Kafka cluster for an e-commerce platform handling 500K orders/day. How many brokers, partitions per topic, and replication factor?

  2. Event-Driven System: Convert a monolithic order processing system into event-driven using Kafka. Identify all events, topics, producers, and consumer groups.

  3. Kafka vs RabbitMQ: For each scenario, choose Kafka or RabbitMQ: (a) Order processing queue, (b) Real-time clickstream analytics, (c) Email sending, (d) Database change replication.

  4. Partition Strategy: A chat application uses Kafka for messages. What partition key ensures messages within a conversation are ordered? What about messages for a user across all conversations?

⚠️ Common Mistakes

  • Using Kafka for simple task queues — Kafka's operational overhead (brokers, partitions, a ZooKeeper or KRaft metadata quorum) is overkill for simple job queues. Use RabbitMQ or SQS for basic task distribution.

  • Too few partitions — partitions determine consumer parallelism. With 3 partitions, max 3 consumers per group. Plan for future scale: start with 10-50 partitions per topic.

  • Not thinking about partition keys — random partitioning loses message ordering. Use meaningful keys (userId, orderId) to keep related events in the same partition.

  • Treating Kafka as a database — Kafka retains messages but isn't optimized for random access queries. Use it as an event log, not a primary data store.
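
The partition-key point above can be demonstrated with the same hashing idea used in the code example. The `pickPartition` helper is illustrative, not Kafka's actual default (murmur2) partitioner, but the property is identical: equal keys always hash to the same partition, so their relative order survives, while unrelated keys scatter:

```javascript
// Key-based partitioning: identical keys land in the same partition,
// so all events for one key stay ordered relative to each other.
function pickPartition(key, partitionCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.charCodeAt(0)) % partitionCount;
  }
  return hash;
}

const PARTITIONS = 3;
const p1 = pickPartition('user_42', PARTITIONS);
const p2 = pickPartition('user_42', PARTITIONS);
console.log(p1 === p2); // true: same key, same partition, ordering preserved

// Unrelated keys spread across partitions, so there is no
// ordering guarantee between them:
const scattered = new Set(['a', 'b', 'c', 'd'].map((k) => pickPartition(k, PARTITIONS)));
console.log(scattered.size); // 3: these four keys hit all three partitions
```

This is why a chat app would key messages by conversation ID: every message in one conversation shares a key, hence a partition, hence an order.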
