Data Modeling Patterns

0/6 in this phase0/45 across the roadmap

📖 Concept

Data modeling is the art of structuring your data to match your application's access patterns. A well-designed data model makes queries fast and simple; a poorly designed one leads to complex queries, slow performance, and scaling headaches.

Normalization vs Denormalization

Normalization (Eliminating Redundancy)

Goal: Each piece of data is stored in one place only. Related data is connected through foreign keys and JOINs.

Normal Forms (simplified):

1NF: No repeating groups, atomic values
2NF: 1NF + no partial dependencies
3NF: 2NF + no transitive dependencies

Denormalization (Strategic Redundancy)

Goal: Duplicate data to avoid expensive JOINs at read time.

Pattern	Normalized	Denormalized
User's post count	`SELECT COUNT(*) FROM posts WHERE user_id = ?`	`users.post_count` (pre-computed column)
Post with author	JOIN posts + users	Store `author_name` directly in posts table
Order total	SUM across order_items	Store `total` in orders table

Common Data Modeling Patterns

1. Star Schema (Analytics)

Central fact table (events, transactions) surrounded by dimension tables (users, products, time). Optimized for OLAP (analytical queries).

2. Adjacency List (Hierarchies)

Each row has a parent_id pointing to its parent. Simple but recursive queries are slow.

3. Materialized Path (Hierarchies)

Store the full path: /root/parent/child. Fast reads but updating a node's position requires updating all descendants.

4. Event Sourcing

Instead of storing current state, store a log of all events. Current state is computed by replaying events. Used in financial systems and CQRS architectures.

5. CQRS (Command Query Responsibility Segregation)

Separate the write model (optimized for commands/updates) from the read model (optimized for queries). Different databases can serve each model.

Pro tip: In interviews, explain your data model choices — "I denormalized the post author name because our read:write ratio is 100:1 and this eliminates a JOIN on every feed query."

💻 Code Example

codeTap to expand ⛶

1// ============================================
2// Data Modeling Patterns — Practical Examples
3// ============================================
4
5// ---------- Pattern 1: Denormalization for Performance ----------
6
7// ❌ NORMALIZED: Post with author info (requires JOIN every read)
8const normalizedSchema = `
9  CREATE TABLE users (
10    id SERIAL PRIMARY KEY,
11    name VARCHAR(100),
12    avatar VARCHAR(255)
13  );
14
15  CREATE TABLE posts (
16    id SERIAL PRIMARY KEY,
17    user_id INTEGER REFERENCES users(id),
18    content TEXT,
19    created_at TIMESTAMP
20  );
21
22  -- Every feed query needs a JOIN:
23  SELECT p.*, u.name, u.avatar
24  FROM posts p
25  JOIN users u ON u.id = p.user_id
26  ORDER BY p.created_at DESC LIMIT 20;
27`;
28
29// ✅ DENORMALIZED: Author info embedded in posts table
30const denormalizedSchema = `
31  CREATE TABLE posts (
32    id SERIAL PRIMARY KEY,
33    user_id INTEGER REFERENCES users(id),
34    author_name VARCHAR(100),    -- Denormalized from users
35    author_avatar VARCHAR(255),  -- Denormalized from users
36    content TEXT,
37    like_count INTEGER DEFAULT 0, -- Pre-computed counter
38    comment_count INTEGER DEFAULT 0,
39    created_at TIMESTAMP
40  );
41
42  -- Feed query: NO JOIN needed!
43  SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
44`;
45
46// The cost: when a user changes their name/avatar, you must
47// update ALL their posts. Acceptable if updates are rare
48// and reads are frequent (social media: 100:1 read:write)
49
50// ---------- Pattern 2: CQRS (Separate Read/Write Models) ----------
51
52class CQRSExample {
53  constructor() {
54    this.writeDB = null;  // PostgreSQL (normalized, ACID)
55    this.readDB = null;   // Elasticsearch or Redis (denormalized, fast)
56  }
57
58  // WRITE path: Normalized, ACID, consistent
59  async createOrder(orderData) {
60    const order = await this.writeDB.transaction(async (tx) => {
61      const order = await tx.insert('orders', {
62        user_id: orderData.userId,
63        status: 'pending',
64        total: orderData.total,
65      });
66
67      for (const item of orderData.items) {
68        await tx.insert('order_items', {
69          order_id: order.id,
70          product_id: item.productId,
71          quantity: item.quantity,
72          price: item.price,
73        });
74
75        // Update inventory
76        await tx.query(
77          'UPDATE products SET stock = stock - $1 WHERE id = $2 AND stock >= $1',
78          [item.quantity, item.productId]
79        );
80      }
81
82      return order;
83    });
84
85    // Publish event for read model to update itself
86    await this.publishEvent('order.created', {
87      orderId: order.id,
88      userId: orderData.userId,
89      items: orderData.items,
90      total: orderData.total,
91    });
92
93    return order;
94  }
95
96  // READ path: Denormalized, fast, eventual consistency
97  async searchOrders(filters) {
98    // Query the read-optimized store (Elasticsearch)
99    return await this.readDB.search({
100      index: 'orders',
101      body: {
102        query: {
103          bool: {
104            must: [
105              { match: { user_id: filters.userId } },
106              { range: { created_at: { gte: filters.startDate } } },
107            ],
108          },
109        },
110        sort: [{ created_at: 'desc' }],
111      },
112    });
113  }
114
115  async publishEvent(type, data) {
116    // In production: Kafka, RabbitMQ, or AWS SNS
117    console.log(`Published event: \${type}`, data);
118  }
119}
120
121// ---------- Pattern 3: Event Sourcing ----------
122
123class EventSourcedAccount {
124  constructor(accountId) {
125    this.accountId = accountId;
126    this.events = []; // Event log
127    this.balance = 0; // Current state (derived from events)
128  }
129
130  // Instead of updating state, we APPEND events
131  deposit(amount) {
132    const event = {
133      type: 'DEPOSITED',
134      accountId: this.accountId,
135      amount,
136      timestamp: Date.now(),
137    };
138    this.events.push(event);
139    this.applyEvent(event);
140    return event;
141  }
142
143  withdraw(amount) {
144    if (this.balance < amount) {
145      throw new Error('Insufficient funds');
146    }
147    const event = {
148      type: 'WITHDRAWN',
149      accountId: this.accountId,
150      amount,
151      timestamp: Date.now(),
152    };
153    this.events.push(event);
154    this.applyEvent(event);
155    return event;
156  }
157
158  // Apply a single event to update state
159  applyEvent(event) {
160    switch (event.type) {
161      case 'DEPOSITED':
162        this.balance += event.amount;
163        break;
164      case 'WITHDRAWN':
165        this.balance -= event.amount;
166        break;
167    }
168  }
169
170  // Rebuild state from event history
171  static fromEvents(accountId, events) {
172    const account = new EventSourcedAccount(accountId);
173    for (const event of events) {
174      account.applyEvent(event);
175      account.events.push(event);
176    }
177    return account;
178  }
179
180  // Get state at any point in time!
181  getBalanceAt(timestamp) {
182    let balance = 0;
183    for (const event of this.events) {
184      if (event.timestamp > timestamp) break;
185      if (event.type === 'DEPOSITED') balance += event.amount;
186      if (event.type === 'WITHDRAWN') balance -= event.amount;
187    }
188    return balance;
189  }
190}
191
192// Demo
193const account = new EventSourcedAccount('acc_123');
194account.deposit(1000);
195account.withdraw(250);
196account.deposit(500);
197console.log('Current balance:', account.balance); // 1250
198console.log('Event history:', account.events.length); // 3 events
199console.log('Full audit trail:', account.events);

🏋️ Practice Exercise

Normalize vs Denormalize: Given an e-commerce database, design both a normalized (3NF) and denormalized schema for the product listing page. Compare query complexity and performance for "Get product with category, reviews summary, and seller info."
CQRS Implementation: Design a CQRS architecture for a social media platform. Define the write model (PostgreSQL), read model (Elasticsearch), and the event pipeline that keeps them in sync.
Event Sourcing Design: Design an event-sourced shopping cart. Define all event types (ItemAdded, ItemRemoved, QuantityChanged, CartCleared, CartCheckedOut). Implement the state reconstruction logic.
Schema Evolution: Your users table needs a new preferences field with complex nested data (notification settings, theme, language). Compare: (a) adding a JSONB column, (b) creating a separate preferences table, (c) using a key-value pattern. Which would you choose?
Time-Series Model: Design the data model for a real-time analytics dashboard tracking page views, clicks, and conversions. Include: raw events table, pre-aggregated summary tables, and the aggregation pipeline.

⚠️ Common Mistakes

Over-normalizing in a read-heavy system — forcing JOINs on every read when data could be safely denormalized. If data rarely changes but is read constantly, denormalize it.
Denormalizing without a sync strategy — if you duplicate data (user name in posts table), you MUST have a mechanism to update all copies when the source changes. Without it, data becomes permanently inconsistent.
Using event sourcing for simple CRUD — event sourcing adds significant complexity (event replay, snapshotting, eventual consistency). Use it only when you need an audit trail, time-travel queries, or complex domain logic.
Ignoring the write amplification of denormalization — if a celebrity updates their profile picture and has 10M posts, you now have a massive batch update to run. Design around this with lazy updates or background jobs.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Data Modeling Patterns

Was this topic helpful?

← PreviousDatabase Replication Next →Object Storage & Blob Storage