Data Modeling Patterns

0/6 in this phase0/45 across the roadmap

📖 Concept

Data modeling is the art of structuring your data to match your application's access patterns. A well-designed data model makes queries fast and simple; a poorly designed one leads to complex queries, slow performance, and scaling headaches.

Normalization vs Denormalization

Normalization (Eliminating Redundancy)

Goal: Each piece of data is stored in one place only. Related data is connected through foreign keys and JOINs.

Normal Forms (simplified):

  • 1NF: No repeating groups, atomic values
  • 2NF: 1NF + no partial dependencies
  • 3NF: 2NF + no transitive dependencies

Denormalization (Strategic Redundancy)

Goal: Duplicate data to avoid expensive JOINs at read time.

Pattern Normalized Denormalized
User's post count SELECT COUNT(*) FROM posts WHERE user_id = ? users.post_count (pre-computed column)
Post with author JOIN posts + users Store author_name directly in posts table
Order total SUM across order_items Store total in orders table

Common Data Modeling Patterns

1. Star Schema (Analytics)

Central fact table (events, transactions) surrounded by dimension tables (users, products, time). Optimized for OLAP (analytical queries).

2. Adjacency List (Hierarchies)

Each row has a parent_id pointing to its parent. Simple but recursive queries are slow.

3. Materialized Path (Hierarchies)

Store the full path: /root/parent/child. Fast reads but updating a node's position requires updating all descendants.

4. Event Sourcing

Instead of storing current state, store a log of all events. Current state is computed by replaying events. Used in financial systems and CQRS architectures.

5. CQRS (Command Query Responsibility Segregation)

Separate the write model (optimized for commands/updates) from the read model (optimized for queries). Different databases can serve each model.

Pro tip: In interviews, explain your data model choices — "I denormalized the post author name because our read:write ratio is 100:1 and this eliminates a JOIN on every feed query."

💻 Code Example

codeTap to expand ⛶
1// ============================================
2// Data Modeling Patterns — Practical Examples
3// ============================================
4
5// ---------- Pattern 1: Denormalization for Performance ----------
6
7// ❌ NORMALIZED: Post with author info (requires JOIN every read)
8const normalizedSchema = `
9 CREATE TABLE users (
10 id SERIAL PRIMARY KEY,
11 name VARCHAR(100),
12 avatar VARCHAR(255)
13 );
14
15 CREATE TABLE posts (
16 id SERIAL PRIMARY KEY,
17 user_id INTEGER REFERENCES users(id),
18 content TEXT,
19 created_at TIMESTAMP
20 );
21
22 -- Every feed query needs a JOIN:
23 SELECT p.*, u.name, u.avatar
24 FROM posts p
25 JOIN users u ON u.id = p.user_id
26 ORDER BY p.created_at DESC LIMIT 20;
27`;
28
29// ✅ DENORMALIZED: Author info embedded in posts table
30const denormalizedSchema = `
31 CREATE TABLE posts (
32 id SERIAL PRIMARY KEY,
33 user_id INTEGER REFERENCES users(id),
34 author_name VARCHAR(100), -- Denormalized from users
35 author_avatar VARCHAR(255), -- Denormalized from users
36 content TEXT,
37 like_count INTEGER DEFAULT 0, -- Pre-computed counter
38 comment_count INTEGER DEFAULT 0,
39 created_at TIMESTAMP
40 );
41
42 -- Feed query: NO JOIN needed!
43 SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
44`;
45
46// The cost: when a user changes their name/avatar, you must
47// update ALL their posts. Acceptable if updates are rare
48// and reads are frequent (social media: 100:1 read:write)
49
50// ---------- Pattern 2: CQRS (Separate Read/Write Models) ----------
51
52class CQRSExample {
53 constructor() {
54 this.writeDB = null; // PostgreSQL (normalized, ACID)
55 this.readDB = null; // Elasticsearch or Redis (denormalized, fast)
56 }
57
58 // WRITE path: Normalized, ACID, consistent
59 async createOrder(orderData) {
60 const order = await this.writeDB.transaction(async (tx) => {
61 const order = await tx.insert('orders', {
62 user_id: orderData.userId,
63 status: 'pending',
64 total: orderData.total,
65 });
66
67 for (const item of orderData.items) {
68 await tx.insert('order_items', {
69 order_id: order.id,
70 product_id: item.productId,
71 quantity: item.quantity,
72 price: item.price,
73 });
74
75 // Update inventory
76 await tx.query(
77 'UPDATE products SET stock = stock - $1 WHERE id = $2 AND stock >= $1',
78 [item.quantity, item.productId]
79 );
80 }
81
82 return order;
83 });
84
85 // Publish event for read model to update itself
86 await this.publishEvent('order.created', {
87 orderId: order.id,
88 userId: orderData.userId,
89 items: orderData.items,
90 total: orderData.total,
91 });
92
93 return order;
94 }
95
96 // READ path: Denormalized, fast, eventual consistency
97 async searchOrders(filters) {
98 // Query the read-optimized store (Elasticsearch)
99 return await this.readDB.search({
100 index: 'orders',
101 body: {
102 query: {
103 bool: {
104 must: [
105 { match: { user_id: filters.userId } },
106 { range: { created_at: { gte: filters.startDate } } },
107 ],
108 },
109 },
110 sort: [{ created_at: 'desc' }],
111 },
112 });
113 }
114
115 async publishEvent(type, data) {
116 // In production: Kafka, RabbitMQ, or AWS SNS
117 console.log(`Published event: \${type}`, data);
118 }
119}
120
121// ---------- Pattern 3: Event Sourcing ----------
122
123class EventSourcedAccount {
124 constructor(accountId) {
125 this.accountId = accountId;
126 this.events = []; // Event log
127 this.balance = 0; // Current state (derived from events)
128 }
129
130 // Instead of updating state, we APPEND events
131 deposit(amount) {
132 const event = {
133 type: 'DEPOSITED',
134 accountId: this.accountId,
135 amount,
136 timestamp: Date.now(),
137 };
138 this.events.push(event);
139 this.applyEvent(event);
140 return event;
141 }
142
143 withdraw(amount) {
144 if (this.balance < amount) {
145 throw new Error('Insufficient funds');
146 }
147 const event = {
148 type: 'WITHDRAWN',
149 accountId: this.accountId,
150 amount,
151 timestamp: Date.now(),
152 };
153 this.events.push(event);
154 this.applyEvent(event);
155 return event;
156 }
157
158 // Apply a single event to update state
159 applyEvent(event) {
160 switch (event.type) {
161 case 'DEPOSITED':
162 this.balance += event.amount;
163 break;
164 case 'WITHDRAWN':
165 this.balance -= event.amount;
166 break;
167 }
168 }
169
170 // Rebuild state from event history
171 static fromEvents(accountId, events) {
172 const account = new EventSourcedAccount(accountId);
173 for (const event of events) {
174 account.applyEvent(event);
175 account.events.push(event);
176 }
177 return account;
178 }
179
180 // Get state at any point in time!
181 getBalanceAt(timestamp) {
182 let balance = 0;
183 for (const event of this.events) {
184 if (event.timestamp > timestamp) break;
185 if (event.type === 'DEPOSITED') balance += event.amount;
186 if (event.type === 'WITHDRAWN') balance -= event.amount;
187 }
188 return balance;
189 }
190}
191
192// Demo
193const account = new EventSourcedAccount('acc_123');
194account.deposit(1000);
195account.withdraw(250);
196account.deposit(500);
197console.log('Current balance:', account.balance); // 1250
198console.log('Event history:', account.events.length); // 3 events
199console.log('Full audit trail:', account.events);

🏋️ Practice Exercise

  1. Normalize vs Denormalize: Given an e-commerce database, design both a normalized (3NF) and denormalized schema for the product listing page. Compare query complexity and performance for "Get product with category, reviews summary, and seller info."

  2. CQRS Implementation: Design a CQRS architecture for a social media platform. Define the write model (PostgreSQL), read model (Elasticsearch), and the event pipeline that keeps them in sync.

  3. Event Sourcing Design: Design an event-sourced shopping cart. Define all event types (ItemAdded, ItemRemoved, QuantityChanged, CartCleared, CartCheckedOut). Implement the state reconstruction logic.

  4. Schema Evolution: Your users table needs a new preferences field with complex nested data (notification settings, theme, language). Compare: (a) adding a JSONB column, (b) creating a separate preferences table, (c) using a key-value pattern. Which would you choose?

  5. Time-Series Model: Design the data model for a real-time analytics dashboard tracking page views, clicks, and conversions. Include: raw events table, pre-aggregated summary tables, and the aggregation pipeline.

⚠️ Common Mistakes

  • Over-normalizing in a read-heavy system — forcing JOINs on every read when data could be safely denormalized. If data rarely changes but is read constantly, denormalize it.

  • Denormalizing without a sync strategy — if you duplicate data (user name in posts table), you MUST have a mechanism to update all copies when the source changes. Without it, data becomes permanently inconsistent.

  • Using event sourcing for simple CRUD — event sourcing adds significant complexity (event replay, snapshotting, eventual consistency). Use it only when you need an audit trail, time-travel queries, or complex domain logic.

  • Ignoring the write amplification of denormalization — if a celebrity updates their profile picture and has 10M posts, you now have a massive batch update to run. Design around this with lazy updates or background jobs.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Data Modeling Patterns