Apache Kafka & Event Streaming
📖 Concept
Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and real-time data pipelines. It's not just a message queue — it's a distributed commit log.
Kafka vs Traditional Message Queues
| Feature | Kafka | RabbitMQ/SQS |
|---|---|---|
| Model | Distributed log | Message queue |
| Retention | Configurable (days/forever) | Until consumed |
| Replay | Can replay from any offset | Messages deleted after consumption |
| Throughput | Millions of messages/sec | Tens of thousands/sec |
| Consumer groups | Multiple groups read independently | Each message consumed once |
| Ordering | Guaranteed within a partition | FIFO per queue, but broken by redeliveries and competing consumers |
Core Concepts
- Topic: A named stream of events (like a database table for events)
- Partition: Topics are split into partitions for parallelism
- Offset: Sequential ID for each message within a partition
- Producer: Writes events to topics
- Consumer Group: A group of consumers that share the work of reading a topic
- Broker: A Kafka server; cluster has multiple brokers for fault tolerance
- Replication Factor: How many copies of each partition exist
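To make replication factor concrete, here is a hedged sketch of how each partition's replicas could be spread across brokers. Real Kafka uses a rack-aware assignment algorithm; the round-robin placement and the `assignReplicas` helper below are simplifications for illustration only.

```javascript
// Hypothetical helper: spread each partition's replicas across brokers.
// Replica r of partition p lives on broker (p + r) mod brokerCount,
// so a leader and its followers always land on different brokers
// (as long as replicationFactor <= brokerCount).
function assignReplicas(partitionCount, brokerCount, replicationFactor) {
  const assignments = [];
  for (let p = 0; p < partitionCount; p++) {
    const replicas = Array.from(
      { length: replicationFactor },
      (_, r) => (p + r) % brokerCount
    );
    assignments.push({ partition: p, leader: replicas[0], replicas });
  }
  return assignments;
}

// 6 partitions, 3 brokers, replication factor 2:
// every partition has a leader plus one follower on another broker
console.log(assignReplicas(6, 3, 2));
```

If any single broker dies, every partition it led still has a live replica on another broker, which is the whole point of replication factor > 1.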
When to Use Kafka
- Event sourcing: Store every event as the source of truth
- Real-time analytics: Stream processing (counts, aggregations)
- Log aggregation: Centralize logs from hundreds of services
- Change Data Capture (CDC): Replicate database changes to other systems
- Microservice communication: Decouple services with events
Key insight: Kafka retains messages. Even after consumption, messages stay for the configured retention period. This means you can replay events, add new consumers that read historical data, and debug production issues by re-processing events.
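That replay property is easy to sketch: the log is append-only and each consumer group only tracks an offset, so "replaying" is just moving the offset back. The toy `poll` and `seek` functions below are loosely modeled on the real consumer API, but this is an illustration, not Kafka itself:

```javascript
// Minimal retained log: events are never deleted on consumption
const log = [];
function produce(value) { log.push({ offset: log.length, value }); }

// Each consumer group only stores "next offset to read"
const groupOffsets = new Map(); // groupId -> next offset

function poll(groupId) {
  const start = groupOffsets.get(groupId) ?? 0;
  const events = log.slice(start);
  groupOffsets.set(groupId, log.length); // commit
  return events;
}

// Replay = rewind the committed offset; the data was never gone
function seek(groupId, offset) { groupOffsets.set(groupId, offset); }

produce('a'); produce('b'); produce('c');
console.log(poll('analytics').map((e) => e.value)); // ['a', 'b', 'c']
console.log(poll('analytics').map((e) => e.value)); // [] — already consumed
seek('analytics', 0);                               // rewind
console.log(poll('analytics').map((e) => e.value)); // ['a', 'b', 'c'] again
```

A brand-new group (say, `'audit'`) would start at offset 0 and see the full history too, which is why you can bolt new consumers onto a Kafka topic months after the events were produced.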
💻 Code Example
```javascript
// ============================================
// Kafka Concepts — Simplified Demonstration
// ============================================

// ---------- Kafka-like Event Log ----------
class EventLog {
  constructor(partitionCount = 3) {
    this.partitions = Array.from({ length: partitionCount }, () => []);
    this.consumerGroups = new Map();
  }

  // Produce: write to a partition (determined by key hash)
  produce(topic, key, value) {
    const partitionIndex = this.getPartition(key);
    const offset = this.partitions[partitionIndex].length;
    const event = { offset, key, value, timestamp: Date.now(), topic };
    this.partitions[partitionIndex].push(event);
    console.log(`Produced to partition ${partitionIndex}, offset ${offset}`);
    return { partition: partitionIndex, offset };
  }

  // Consumer group: each group tracks its own offsets per partition
  consume(groupId, handler) {
    if (!this.consumerGroups.has(groupId)) {
      this.consumerGroups.set(groupId, new Map()); // partition → offset
    }
    const offsets = this.consumerGroups.get(groupId);

    for (let p = 0; p < this.partitions.length; p++) {
      const startOffset = offsets.get(p) || 0;
      const partition = this.partitions[p];

      for (let i = startOffset; i < partition.length; i++) {
        handler(partition[i]);
        offsets.set(p, i + 1); // Commit offset
      }
    }
  }

  // Same key always goes to same partition (ordering guarantee)
  getPartition(key) {
    let hash = 0;
    const str = String(key);
    for (let i = 0; i < str.length; i++) {
      hash = (hash * 31 + str.charCodeAt(i)) % this.partitions.length;
    }
    return hash; // always in [0, partitionCount)
  }
}

// ---------- Event-Driven Architecture with Kafka ----------
class OrderService {
  constructor(kafka) { this.kafka = kafka; }

  async createOrder(order) {
    // Write to DB, then publish event
    const savedOrder = await this.saveToDb(order);
    this.kafka.produce('orders', order.userId, {
      type: 'ORDER_CREATED',
      orderId: savedOrder.id,
      items: order.items,
      total: order.total,
      userId: order.userId,
    });
    return savedOrder;
  }

  async saveToDb(order) { return { ...order, id: 'ord_' + Date.now() }; }
}

// Each service consumes independently (its own consumer group)
class InventoryService {
  constructor(kafka) {
    kafka.consume('inventory-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Inventory] Reserving items for ${event.value.orderId}`);
      }
    });
  }
}

class NotificationService {
  constructor(kafka) {
    kafka.consume('notification-service', (event) => {
      if (event.value.type === 'ORDER_CREATED') {
        console.log(`[Notification] Sending email for ${event.value.orderId}`);
      }
    });
  }
}

// Demo — await each order so the events exist before the consumers read
(async () => {
  const kafka = new EventLog(3);
  const orderSvc = new OrderService(kafka);
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_a'], total: 99.99 });
  await orderSvc.createOrder({ userId: 'user_1', items: ['item_b'], total: 49.99 });
  new InventoryService(kafka);
  new NotificationService(kafka);
})();
```
🏋️ Practice Exercise
Kafka Architecture: Design a Kafka cluster for an e-commerce platform handling 500K orders/day. How many brokers, partitions per topic, and replication factor?
Event-Driven System: Convert a monolithic order processing system into event-driven using Kafka. Identify all events, topics, producers, and consumer groups.
Kafka vs RabbitMQ: For each scenario, choose Kafka or RabbitMQ: (a) Order processing queue, (b) Real-time clickstream analytics, (c) Email sending, (d) Database change replication.
Partition Strategy: A chat application uses Kafka for messages. What partition key ensures messages within a conversation are ordered? What about messages for a user across all conversations?
⚠️ Common Mistakes
Using Kafka for simple task queues — Kafka's operational overhead (a broker cluster, partition management, ZooKeeper or KRaft metadata) is overkill for basic job queues. Use RabbitMQ or SQS for basic task distribution.
Too few partitions — partitions determine consumer parallelism. With 3 partitions, max 3 consumers per group. Plan for future scale: start with 10-50 partitions per topic.
Not thinking about partition keys — random partitioning loses message ordering. Use meaningful keys (userId, orderId) to keep related events in the same partition.
Treating Kafka as a database — Kafka retains messages but isn't optimized for random access queries. Use it as an event log, not a primary data store.
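The "too few partitions" pitfall above can be seen with a toy version of group assignment. Real Kafka's partition assignors (range, round-robin, sticky) are more sophisticated; this sketch only shows the parallelism cap:

```javascript
// Toy partition assignment: each partition goes to exactly one consumer
// in the group, so consumers beyond the partition count get nothing.
function assignPartitions(partitionCount, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  for (let p = 0; p < partitionCount; p++) {
    assignment.get(consumers[p % consumers.length]).push(p);
  }
  return assignment;
}

// 3 partitions, 5 consumers: c4 and c5 sit idle — scaling the group
// past the partition count buys no extra parallelism
const result = assignPartitions(3, ['c1', 'c2', 'c3', 'c4', 'c5']);
for (const [consumer, partitions] of result) {
  console.log(consumer, partitions);
}
```

This is why the partition count chosen at topic creation matters: it is the hard ceiling on consumer parallelism within a group.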
💼 Interview Questions