Design a Chat System (WhatsApp/Slack)

📖 Concept

Designing a real-time chat system is a popular interview question that tests your understanding of WebSockets, message delivery guarantees, and distributed systems.

Requirements

Functional: 1-on-1 messaging, group chats (up to 500 members), online/offline status, read receipts, media sharing, message history

Non-Functional: 500M DAU, ~50B messages/day, < 100 ms message delivery, message ordering within a conversation, eventual delivery (offline queue)
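The non-functional numbers above translate into a rough load estimate. The sketch below is only back-of-envelope arithmetic: the DAU and message counts come from the requirements, while `PEAK_FACTOR` and `AVG_MESSAGE_BYTES` are assumptions chosen for illustration, not figures from any real system.

```javascript
// Back-of-envelope load estimate from the stated requirements.
const DAILY_ACTIVE_USERS = 500e6; // 500M DAU (from the requirements)
const MESSAGES_PER_DAY = 50e9;    // ~50B messages/day (from the requirements)
const SECONDS_PER_DAY = 24 * 60 * 60;

// ASSUMPTIONS (not from the requirements): peak multiplier and average stored size.
const PEAK_FACTOR = 3;
const AVG_MESSAGE_BYTES = 200;

const messagesPerUserPerDay = MESSAGES_PER_DAY / DAILY_ACTIVE_USERS;               // 100
const avgMessagesPerSecond = Math.round(MESSAGES_PER_DAY / SECONDS_PER_DAY);       // 578,704
const peakMessagesPerSecond = avgMessagesPerSecond * PEAK_FACTOR;                  // 1,736,112
const storageTBPerDay = (MESSAGES_PER_DAY * AVG_MESSAGE_BYTES) / 1e12;             // 10

console.log({
  messagesPerUserPerDay,
  avgMessagesPerSecond,
  peakMessagesPerSecond,
  storageTBPerDay,
});
```

An average of roughly 580K writes per second is why the design leans on a write-optimized store and many chat servers rather than a single node.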

Key Architecture Decisions

Connection Protocol

WebSocket: Persistent bidirectional connection between client and server. Best for real-time messaging.

Long polling: Fallback for environments where WebSocket isn't available.

Message Flow

  1. Sender sends message via WebSocket → Chat Server
  2. Chat Server stores message in DB (Cassandra — write-heavy)
  3. Chat Server checks if recipient is online
  4. Online: Push message via recipient's WebSocket connection
  5. Offline: Store in offline message queue, deliver when they reconnect

Message Ordering

Use per-conversation sequence numbers. Server assigns monotonically increasing sequence IDs per conversation. Client reorders by sequence ID.
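One way a client might reorder by sequence ID is a small per-conversation buffer that holds back messages until the gap before them is filled. This is a minimal sketch: `ConversationBuffer` is a hypothetical name, and it assumes sequence numbers start at 1 and increase by exactly 1 per message.

```javascript
// Client-side reorder buffer for one conversation.
// ASSUMPTION: the server assigns consecutive sequence numbers starting at firstSeq.
class ConversationBuffer {
  constructor(firstSeq = 1) {
    this.nextSeq = firstSeq;   // next sequence number that may be rendered
    this.pending = new Map();  // sequenceNum -> message that arrived early
  }

  // Accept a message in any arrival order and return the messages
  // that are now ready to render, in sequence order.
  receive(message) {
    const seq = message.sequenceNum;
    if (seq < this.nextSeq || this.pending.has(seq)) {
      return []; // duplicate delivery: already rendered or already buffered
    }
    this.pending.set(seq, message);

    const ready = [];
    while (this.pending.has(this.nextSeq)) {
      ready.push(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
    return ready;
  }
}

// Usage: messages arrive out of order, with one duplicate.
const buffer = new ConversationBuffer();
const rendered = [];
for (const seq of [2, 1, 1, 4, 3]) {
  rendered.push(...buffer.receive({ sequenceNum: seq, content: `msg ${seq}` }));
}
console.log(rendered.map((m) => m.sequenceNum)); // [ 1, 2, 3, 4 ]
```

A production client would probably also need a timeout that requests missing messages from the server when a gap persists; that part is left out here.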

Group Chat Fan-Out

When a message is sent to a group of 500:

  • Push message to all 500 members' connections
  • For offline members, queue for later delivery
  • Use message queues (Kafka) for fan-out to prevent blocking
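The fan-out steps above can be sketched with an in-memory array standing in for the queue. Everything here is illustrative: `fanOutQueue`, `enqueueFanOut`, `processFanOutTask`, and the batch size of 100 are hypothetical, and a real deployment would publish tasks to a durable queue such as Kafka and run workers in separate processes.

```javascript
// In-memory stand-in for a durable message queue (ASSUMPTION: replaces Kafka here).
const fanOutQueue = [];

// Called on the send path: split recipients into batches and enqueue them,
// so the sender's request returns without waiting for every delivery.
function enqueueFanOut(message, memberIds, batchSize = 100) {
  const recipients = memberIds.filter((id) => id !== message.senderId);
  for (let i = 0; i < recipients.length; i += batchSize) {
    fanOutQueue.push({ message, recipients: recipients.slice(i, i + batchSize) });
  }
  return fanOutQueue.length;
}

// Called by a worker: push to online members, queue for offline members.
function processFanOutTask(task, connections, offlineQueues) {
  let pushed = 0;
  for (const userId of task.recipients) {
    const socket = connections.get(userId);
    if (socket) {
      socket.send(JSON.stringify({ type: 'new_message', message: task.message }));
      pushed += 1;
    } else {
      if (!offlineQueues.has(userId)) offlineQueues.set(userId, []);
      offlineQueues.get(userId).push(task.message);
    }
  }
  return pushed;
}

// Usage: a 500-member group where only the first 200 members are connected.
const members = Array.from({ length: 500 }, (_, i) => `user${i}`);
const delivered = [];
const connections = new Map(
  members.slice(0, 200).map((id) => [id, { send: () => delivered.push(id) }])
);
const offlineQueues = new Map();

const taskCount = enqueueFanOut({ id: 'm1', senderId: 'user0', content: 'hi' }, members);
while (fanOutQueue.length > 0) {
  processFanOutTask(fanOutQueue.shift(), connections, offlineQueues);
}
console.log(taskCount, delivered.length, offlineQueues.size); // 5 199 300
```

Batching keeps each task small, so a slow or failed batch can be retried without redoing the whole group.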

Data Model

  • messages: {messageId, conversationId, senderId, content, type, sequenceNum, createdAt}
  • conversations: {conversationId, type (1-on-1/group), memberIds}
  • Partition by conversationId for data locality
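To illustrate why partitioning by conversationId gives data locality, here is a conceptual sketch that hashes the conversation ID to pick a partition, so one conversation's history lives in one place. The helper names, the FNV-1a hash, and the partition count of 8 are assumptions for illustration; a store such as Cassandra performs its own hashing of the partition key internally.

```javascript
// FNV-1a 32-bit hash over UTF-16 code units (fine for ASCII IDs).
// Chosen only because it is short and deterministic.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to unsigned 32-bit
}

function partitionFor(conversationId, partitionCount) {
  return fnv1a(conversationId) % partitionCount;
}

const PARTITION_COUNT = 8; // hypothetical cluster size
const partitions = Array.from({ length: PARTITION_COUNT }, () => []);

// Every message of a conversation lands in the same partition.
function insertMessage(message) {
  partitions[partitionFor(message.conversationId, PARTITION_COUNT)].push(message);
}

// Loading history touches exactly one partition, then orders by sequenceNum.
function loadHistory(conversationId) {
  return partitions[partitionFor(conversationId, PARTITION_COUNT)]
    .filter((m) => m.conversationId === conversationId)
    .sort((a, b) => a.sequenceNum - b.sequenceNum);
}

insertMessage({ messageId: 'm2', conversationId: 'conv-42', senderId: 'bob', content: 'hello', sequenceNum: 2 });
insertMessage({ messageId: 'm1', conversationId: 'conv-42', senderId: 'alice', content: 'hi', sequenceNum: 1 });
insertMessage({ messageId: 'm3', conversationId: 'conv-7', senderId: 'carol', content: 'hey', sequenceNum: 1 });

const history = loadHistory('conv-42');
console.log(history.map((m) => m.messageId)); // [ 'm1', 'm2' ]
```

The trade-off is that a very active conversation concentrates load on one partition, which is worth raising when discussing large groups.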

Interview tip: Always mention WebSockets for real-time chat. Discuss message ordering and delivery guarantees — these are where most candidates struggle.

💻 Code Example

// ============================================
// Chat System: Core Architecture
// ============================================

class ChatServer {
  // db, cache, and messageQueue are injected clients. The cache is assumed to
  // expose Redis-style commands (incr, rpush, lrange, ltrim, set, del).
  constructor(db, cache, messageQueue) {
    this.db = db;
    this.cache = cache;
    this.queue = messageQueue;
    this.connections = new Map(); // userId → WebSocket (connections on this server only)
  }

  // Handle a new WebSocket connection
  async onConnect(userId, ws) {
    this.connections.set(userId, ws);
    await this.setOnline(userId);
    await this.deliverQueuedMessages(userId);
  }

  async onDisconnect(userId) {
    this.connections.delete(userId);
    await this.setOffline(userId);
  }

  async sendMessage(senderId, conversationId, content) {
    // 1. Generate sequence number (per conversation).
    //    An atomic increment gives every message in the conversation a unique,
    //    increasing number even when several senders write at once.
    const seqNum = await this.cache.incr(`seq:${conversationId}`);

    // 2. Create message
    const message = {
      id: this.generateId(),
      conversationId,
      senderId,
      content,
      sequenceNum: seqNum,
      createdAt: Date.now(),
      status: 'sent',
    };

    // 3. Persist to database before delivering, so a crash cannot lose it
    await this.db.insert('messages', message);

    // 4. Deliver to recipients
    const members = await this.getConversationMembers(conversationId);
    for (const memberId of members) {
      if (memberId === senderId) continue;

      const ws = this.connections.get(memberId);
      if (ws) {
        // Online: push via WebSocket
        ws.send(JSON.stringify({ type: 'new_message', message }));
      } else {
        // Offline: queue for later delivery
        await this.cache.rpush(`offline:${memberId}`, JSON.stringify(message));
      }
    }

    return message;
  }

  async deliverQueuedMessages(userId) {
    const ws = this.connections.get(userId);
    if (!ws) return; // user disconnected again before delivery started

    const key = `offline:${userId}`;
    const messages = await this.cache.lrange(key, 0, -1);
    if (messages.length === 0) return;

    ws.send(JSON.stringify({
      type: 'queued_messages',
      messages: messages.map((raw) => JSON.parse(raw)),
    }));
    // Remove only what was sent, so messages queued during delivery are kept.
    await this.cache.ltrim(key, messages.length, -1);
  }

  // Presence key expires after 300 s; a real client would refresh it with heartbeats.
  async setOnline(userId) { await this.cache.set(`online:${userId}`, '1', 'EX', 300); }
  async setOffline(userId) { await this.cache.del(`online:${userId}`); }

  async getConversationMembers(convId) {
    // Assumes db.query resolves to an array of row objects ({ member_id }).
    const rows = await this.db.query(
      'SELECT member_id FROM conversation_members WHERE conversation_id = $1',
      [convId]
    );
    return rows.map((row) => row.member_id);
  }

  // Demo-only ID: timestamp plus random suffix, not guaranteed unique at scale.
  generateId() { return Date.now().toString(36) + Math.random().toString(36).slice(2); }
}

console.log("Chat system architecture demonstrated.");

🏋️ Practice Exercise

  1. Full Chat Design: Design WhatsApp. Include: 1-on-1 messaging, group chats, media sharing, online presence, read receipts, end-to-end encryption, and message search.

  2. Message Ordering: How do you ensure messages appear in the correct order when: (a) two messages are sent simultaneously, (b) network delays reorder packets, (c) messages are sent from multiple devices?

  3. Group Scaling: A group has 100K members. When someone sends a message, how do you fan out to 100K recipients efficiently? What's the max group size before the architecture breaks?

  4. Presence System: Design the "online/offline/last seen" feature for 500M users. How do you track this efficiently without overloading the system with heartbeats?

⚠️ Common Mistakes

  • Using HTTP polling instead of WebSockets: repeated polling generates many empty requests and delays each message by up to the polling interval (often several seconds). WebSockets push messages as soon as they arrive, with little per-message overhead.

  • Not handling message reordering: network delays can deliver messages out of order. Use sequence numbers per conversation and reorder on the client.

  • Storing messages in a relational database: chat messages are write-heavy and time-series. Use Cassandra or a similar write-optimized store, partitioned by conversationId.

  • Not designing for offline delivery: users go offline frequently on mobile. Queue messages and deliver all of them when the user reconnects.
