Design a Chat System (WhatsApp/Slack)
📖 Concept
Designing a real-time chat system is a popular interview question that tests your understanding of WebSockets, message delivery guarantees, and distributed systems.
Requirements
Functional: 1-on-1 messaging, group chats (up to 500 members), online/offline status, read receipts, media sharing, message history
Non-Functional: 500M DAU, ~50B messages/day, < 100ms message delivery, message ordering within a conversation, eventual delivery (offline queue)
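A quick back-of-envelope calculation turns the daily volume above into per-second figures. This is a rough sketch: the 100-byte average message size and the 3x peak factor are assumptions for illustration, not part of the stated requirements.

```javascript
// Back-of-envelope load estimate from the stated requirements.
const MESSAGES_PER_DAY = 50e9;     // from the requirements: ~50B messages/day
const DAILY_ACTIVE_USERS = 500e6;  // from the requirements: 500M DAU
const SECONDS_PER_DAY = 24 * 60 * 60;

// Assumptions (not in the requirements): payload size and peak-to-average ratio.
const ASSUMED_AVG_MESSAGE_BYTES = 100;
const ASSUMED_PEAK_FACTOR = 3;

function estimateLoad() {
  const avgMessagesPerSecond = MESSAGES_PER_DAY / SECONDS_PER_DAY;
  return {
    avgMessagesPerSecond: Math.round(avgMessagesPerSecond),
    peakMessagesPerSecond: Math.round(avgMessagesPerSecond * ASSUMED_PEAK_FACTOR),
    messagesPerUserPerDay: MESSAGES_PER_DAY / DAILY_ACTIVE_USERS,
    storageTerabytesPerDay: (MESSAGES_PER_DAY * ASSUMED_AVG_MESSAGE_BYTES) / 1e12,
  };
}

console.log(estimateLoad());
```

Under these assumptions the system averages roughly 580K messages per second, about 100 messages per user per day, and around 5 TB of new message data per day before replication. Numbers at this scale are what justify a write-optimized store and horizontal partitioning.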
Key Architecture Decisions
Connection Protocol
WebSocket: Persistent bidirectional connection. Client ↔ Server. Best for real-time messaging.
Long polling: Fallback for environments where WebSocket isn't available.
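The fallback decision can be sketched as a small client-side helper. This is a minimal illustration, not a production client: `chooseTransport`, `longPoll`, and the `fetchMessages(cursor)` callback (assumed to resolve to `{ messages, cursor }`) are hypothetical names introduced here.

```javascript
// Picks a transport: WebSocket when the runtime provides it, long polling otherwise.
// `env` is injected so the choice is easy to test; in a browser you might pass globalThis.
function chooseTransport(env) {
  return typeof env.WebSocket === 'function' ? 'websocket' : 'long-polling';
}

// Long-polling loop: the server is assumed to hold each request open until a
// message arrives or it times out; the client then immediately asks again.
// `fetchMessages(cursor)` is a hypothetical function that resolves to
// { messages: [...], cursor } covering everything after `cursor`.
async function longPoll(fetchMessages, onMessage, shouldContinue) {
  let cursor = 0;
  while (shouldContinue()) {
    const response = await fetchMessages(cursor);
    for (const message of response.messages) onMessage(message);
    cursor = response.cursor;
  }
}
```

Keeping a cursor on the client means a dropped request can simply be retried from the last acknowledged position, which is the main thing long polling has to get right compared with a persistent socket.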
Message Flow
- Sender sends message via WebSocket → Chat Server
- Chat Server stores message in DB (Cassandra — write-heavy)
- Chat Server checks if recipient is online
- Online: Push message via recipient's WebSocket connection
- Offline: Store in offline message queue, deliver when they reconnect
Message Ordering
Use per-conversation sequence numbers. The server assigns a monotonically increasing sequence number to each message in a conversation, and the client orders messages by that number rather than by arrival time or client clock.
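Client-side reordering can be sketched as a small buffer that holds back messages until the gap before them is filled. This is a minimal sketch: `SequenceBuffer` is a hypothetical name, and it assumes sequence numbers start at 1 and increase by exactly 1 within a conversation.

```javascript
// Client-side reorder buffer for a single conversation.
class SequenceBuffer {
  constructor(onDeliver, firstSequenceNum = 1) {
    this.onDeliver = onDeliver;           // called once per message, in order
    this.nextExpected = firstSequenceNum; // next sequence number to display
    this.pending = new Map();             // sequenceNum → message held back
  }

  receive(message) {
    const seq = message.sequenceNum;
    // Ignore duplicates and anything that was already delivered.
    if (seq < this.nextExpected || this.pending.has(seq)) return;
    this.pending.set(seq, message);

    // Release the longest contiguous run starting at nextExpected.
    while (this.pending.has(this.nextExpected)) {
      const next = this.pending.get(this.nextExpected);
      this.pending.delete(this.nextExpected);
      this.nextExpected += 1;
      this.onDeliver(next);
    }
  }
}

const shown = [];
const buffer = new SequenceBuffer((m) => shown.push(m.content));
buffer.receive({ sequenceNum: 2, content: 'world' }); // held back: 1 is missing
buffer.receive({ sequenceNum: 1, content: 'hello' }); // releases 1, then 2
console.log(shown); // [ 'hello', 'world' ]
```

A real client would also need a timeout that asks the server for missing sequence numbers, since a gap that never fills would otherwise block the conversation.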
Group Chat Fan-Out
When a message is sent to a group of 500:
- Push message to all 500 members' connections
- For offline members, queue for later delivery
- Use message queues (Kafka) for fan-out to prevent blocking
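The fan-out steps above can be sketched as two functions: one that splits the member list into jobs, and one that a worker would run per job. This is a sketch under assumptions: the function names and the batch size of 100 are hypothetical, and the jobs array stands in for records published to a queue such as Kafka.

```javascript
// Splits a group's member list into fixed-size batches, one fan-out job each.
// In a real system each job would be published to a queue instead of returned.
function buildFanOutJobs(message, memberIds, batchSize = 100) {
  const recipients = memberIds.filter((id) => id !== message.senderId);
  const jobs = [];
  for (let i = 0; i < recipients.length; i += batchSize) {
    jobs.push({ message, recipients: recipients.slice(i, i + batchSize) });
  }
  return jobs;
}

// Worker side: push to members with a live connection, queue for the rest.
// `connections` maps userId → an object with send(); `offlineQueues` maps
// userId → array of pending messages.
function processFanOutJob(job, connections, offlineQueues) {
  for (const memberId of job.recipients) {
    const ws = connections.get(memberId);
    if (ws) {
      ws.send(JSON.stringify({ type: 'new_message', message: job.message }));
    } else {
      if (!offlineQueues.has(memberId)) offlineQueues.set(memberId, []);
      offlineQueues.get(memberId).push(job.message);
    }
  }
}
```

Because the sender's request only has to enqueue the jobs, its latency no longer depends on group size; the workers absorb the per-recipient cost and can be scaled independently.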
Data Model
messages: {messageId, conversationId, senderId, content, type, sequenceNum, createdAt}
conversations: {conversationId, type (1-on-1/group), memberIds}
- Partition by conversationId for data locality
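To make the data model concrete, here are example records and a toy partition function. All values are made up, and `hashString` / `partitionFor` are hypothetical helpers; a store like Cassandra hashes the partition key internally, so this only illustrates why every message in a conversation lands on the same partition.

```javascript
// Example records matching the field lists above (values are illustrative).
const exampleConversation = {
  conversationId: 'conv-42',
  type: 'group', // '1-on-1' or 'group'
  memberIds: ['alice', 'bob', 'carol'],
};

const exampleMessage = {
  messageId: 'msg-1001',
  conversationId: 'conv-42',
  senderId: 'alice',
  content: 'hello',
  type: 'text',
  sequenceNum: 1,
  createdAt: 1700000000000,
};

// Simple FNV-1a style 32-bit hash over the string's UTF-16 code units.
function hashString(value) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Same conversationId always maps to the same partition, so reading a
// conversation's history touches one partition instead of many.
function partitionFor(conversationId, partitionCount) {
  return hashString(conversationId) % partitionCount;
}

console.log(partitionFor(exampleMessage.conversationId, 16));
```

Within a partition, sorting rows by sequenceNum makes "load the latest N messages" a single contiguous read.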
Interview tip: Always mention WebSockets for real-time chat. Discuss message ordering and delivery guarantees — these are where most candidates struggle.
💻 Code Example
// ============================================
// Chat System: Core Architecture
// ============================================

class ChatServer {
  // db, cache, and messageQueue are injected clients. The cache is assumed to
  // expose Redis-style commands (incr, rpush, lrange, del, set).
  constructor(db, cache, messageQueue) {
    this.db = db;
    this.cache = cache;
    this.queue = messageQueue;
    this.connections = new Map(); // userId → WebSocket
  }

  // Handle new WebSocket connection
  async onConnect(userId, ws) {
    this.connections.set(userId, ws);
    await this.setOnline(userId);
    await this.deliverQueuedMessages(userId);
  }

  async onDisconnect(userId) {
    this.connections.delete(userId);
    await this.setOffline(userId);
  }

  async sendMessage(senderId, conversationId, content) {
    // 1. Generate sequence number (per conversation)
    const seqNum = await this.cache.incr(`seq:${conversationId}`);

    // 2. Create message
    const message = {
      id: this.generateId(),
      conversationId,
      senderId,
      content,
      sequenceNum: seqNum,
      createdAt: Date.now(),
      status: 'sent',
    };

    // 3. Persist to database
    await this.db.insert('messages', message);

    // 4. Deliver to recipients
    const members = await this.getConversationMembers(conversationId);
    for (const memberId of members) {
      if (memberId === senderId) continue;

      const ws = this.connections.get(memberId);
      if (ws) {
        // Online: push via WebSocket
        ws.send(JSON.stringify({ type: 'new_message', message }));
      } else {
        // Offline: queue for later delivery
        await this.cache.rpush(`offline:${memberId}`, JSON.stringify(message));
      }
    }

    return message;
  }

  async deliverQueuedMessages(userId) {
    const ws = this.connections.get(userId);
    if (!ws) return; // user disconnected again before delivery started

    const messages = await this.cache.lrange(`offline:${userId}`, 0, -1);
    if (messages.length > 0) {
      ws.send(JSON.stringify({
        type: 'queued_messages',
        messages: messages.map((raw) => JSON.parse(raw)),
      }));
      // Note: lrange followed by del is not atomic; a message queued between
      // the two calls would be lost. A production version needs an atomic drain.
      await this.cache.del(`offline:${userId}`);
    }
  }

  // Presence key expires after 300 seconds unless refreshed.
  async setOnline(userId) { await this.cache.set(`online:${userId}`, '1', 'EX', 300); }
  async setOffline(userId) { await this.cache.del(`online:${userId}`); }

  async getConversationMembers(convId) {
    // Assumes db.query resolves to an array of row objects.
    const rows = await this.db.query(
      'SELECT member_id FROM conversation_members WHERE conversation_id = $1',
      [convId]
    );
    return rows.map((row) => row.member_id);
  }

  // Simple time-based ID for illustration; not guaranteed unique at scale.
  generateId() { return Date.now().toString(36) + Math.random().toString(36).slice(2); }
}

console.log("Chat system architecture demonstrated.");
🏋️ Practice Exercise
Full Chat Design: Design WhatsApp. Include: 1-on-1 messaging, group chats, media sharing, online presence, read receipts, end-to-end encryption, and message search.
Message Ordering: How do you ensure messages appear in the correct order when: (a) two messages sent simultaneously, (b) network delays reorder packets, (c) messages sent from multiple devices?
Group Scaling: A group has 100K members. When someone sends a message, how do you fan out to 100K recipients efficiently? What's the max group size before the architecture breaks?
Presence System: Design the "online/offline/last seen" feature for 500M users. How do you track this efficiently without overloading the system with heartbeats?
⚠️ Common Mistakes
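Using short HTTP polling instead of WebSockets: every client re-requests on a timer, which creates heavy server load from mostly empty responses and delays each message by up to one polling interval (often several seconds). WebSockets push messages as soon as they arrive with little per-message overhead. Long polling is still acceptable as a fallback where WebSockets aren't available.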
Not handling message reordering — network delays can deliver messages out of order. Use sequence numbers per conversation and reorder on the client.
Storing messages in a relational database — chat messages are write-heavy and time-series. Use Cassandra or a similar write-optimized store, partitioned by conversationId.
Not designing for offline delivery — users go offline frequently on mobile. Queue messages and deliver all of them when the user reconnects.