Data Modeling Patterns
📖 Concept
Data modeling is the art of structuring your data to match your application's access patterns. A well-designed data model makes queries fast and simple; a poorly designed one leads to complex queries, slow performance, and scaling headaches.
Normalization vs Denormalization
Normalization (Eliminating Redundancy)
Goal: Each piece of data is stored in one place only. Related data is connected through foreign keys and JOINs.
Normal Forms (simplified):
- 1NF: No repeating groups, atomic values
- 2NF: 1NF + no partial dependencies
- 3NF: 2NF + no transitive dependencies
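To make 1NF concrete, here is a minimal in-memory sketch (the rows and field names are invented for illustration): a comma-packed phone field is a repeating group, and 1NF splits it into one atomic value per row.

```javascript
// Hypothetical rows violating 1NF: the phones field packs a
// repeating group into a single non-atomic value.
const unnormalized = [
  { userId: 1, phones: '555-0100,555-0101' },
  { userId: 2, phones: '555-0200' },
];

// 1NF fix: one atomic value per row (a user_phones table).
const userPhones = unnormalized.flatMap(({ userId, phones }) =>
  phones.split(',').map((phone) => ({ userId, phone }))
);

console.log(userPhones.length); // 3 rows, each with exactly one phone
```

The same splitting applies in SQL: the packed column becomes a child table with one row per value.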
Denormalization (Strategic Redundancy)
Goal: Duplicate data to avoid expensive JOINs at read time.
| Pattern | Normalized | Denormalized |
|---|---|---|
| User's post count | SELECT COUNT(*) FROM posts WHERE user_id = ? | users.post_count (pre-computed column) |
| Post with author | JOIN posts + users | Store author_name directly in posts table |
| Order total | SUM across order_items | Store total in orders table |
Common Data Modeling Patterns
1. Star Schema (Analytics)
Central fact table (events, transactions) surrounded by dimension tables (users, products, time). Optimized for OLAP (analytical queries).
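A hedged sketch of what that looks like, in the same schema-as-string style used below (all table and column names here are invented): a central fact table holding foreign keys and measures, with small dimension tables around it.

```javascript
// Illustrative star schema: one fact table, several dimensions.
const starSchema = `
  CREATE TABLE dim_users    (user_id INT PRIMARY KEY, country VARCHAR(2));
  CREATE TABLE dim_products (product_id INT PRIMARY KEY, category VARCHAR(50));
  CREATE TABLE dim_date     (date_id INT PRIMARY KEY, day DATE, month INT, year INT);

  -- Fact table: one row per sale, foreign keys into each dimension
  CREATE TABLE fact_sales (
    user_id    INT REFERENCES dim_users(user_id),
    product_id INT REFERENCES dim_products(product_id),
    date_id    INT REFERENCES dim_date(date_id),
    amount     NUMERIC(10, 2)
  );

  -- Typical OLAP query: revenue by category per month
  SELECT p.category, d.month, SUM(f.amount)
  FROM fact_sales f
  JOIN dim_products p ON p.product_id = f.product_id
  JOIN dim_date d     ON d.date_id = f.date_id
  GROUP BY p.category, d.month;
`;
```

The key property: analytical queries only ever join the fact table to a handful of small dimensions, which columnar warehouses optimize heavily.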
2. Adjacency List (Hierarchies)
Each row has a parent_id pointing to its parent. Simple but recursive queries are slow.
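An in-memory sketch of the trade-off (the category rows are invented): reading a whole subtree forces one recursive step per level, which in SQL becomes a recursive CTE (`WITH RECURSIVE`).

```javascript
// Adjacency list: each row points at its parent via parentId.
const categories = [
  { id: 1, parentId: null, name: 'Electronics' },
  { id: 2, parentId: 1, name: 'Phones' },
  { id: 3, parentId: 1, name: 'Laptops' },
  { id: 4, parentId: 2, name: 'Smartphones' },
];

// Fetching a subtree requires recursion — one pass per level.
function descendants(rows, rootId) {
  const children = rows.filter((r) => r.parentId === rootId);
  return children.flatMap((c) => [c, ...descendants(rows, c.id)]);
}

console.log(descendants(categories, 1).map((c) => c.name));
// ['Phones', 'Smartphones', 'Laptops']
```

Writes are cheap (moving a node updates one parentId), but deep trees pay a query cost proportional to their depth.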
3. Materialized Path (Hierarchies)
Store the full path: /root/parent/child. Fast reads but updating a node's position requires updating all descendants.
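A sketch of the opposite trade-off, using the same invented hierarchy: subtree reads collapse to a prefix match (in SQL, `WHERE path LIKE '/electronics/phones%'`), but moving a node rewrites every descendant's path.

```javascript
// Materialized path: each row stores its full ancestry as a string.
const nodes = [
  { path: '/electronics', name: 'Electronics' },
  { path: '/electronics/phones', name: 'Phones' },
  { path: '/electronics/phones/smartphones', name: 'Smartphones' },
  { path: '/electronics/laptops', name: 'Laptops' },
];

// Fast read: the whole subtree is a single prefix match.
const subtree = nodes.filter((n) => n.path.startsWith('/electronics/phones'));

// Slow write: moving a node rewrites the path of every descendant.
function moveSubtree(rows, oldPrefix, newPrefix) {
  return rows.map((n) =>
    n.path.startsWith(oldPrefix)
      ? { ...n, path: newPrefix + n.path.slice(oldPrefix.length) }
      : n
  );
}

const moved = moveSubtree(nodes, '/electronics/phones', '/mobile');
console.log(moved.map((n) => n.path));
// ['/electronics', '/mobile', '/mobile/smartphones', '/electronics/laptops']
```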
4. Event Sourcing
Instead of storing current state, store a log of all events. Current state is computed by replaying events. Used in financial systems and CQRS architectures.
5. CQRS (Command Query Responsibility Segregation)
Separate the write model (optimized for commands/updates) from the read model (optimized for queries). Different databases can serve each model.
Pro tip: In interviews, explain your data model choices — "I denormalized the post author name because our read:write ratio is 100:1 and this eliminates a JOIN on every feed query."
💻 Code Example
```javascript
// ============================================
// Data Modeling Patterns — Practical Examples
// ============================================

// ---------- Pattern 1: Denormalization for Performance ----------

// ❌ NORMALIZED: Post with author info (requires JOIN every read)
const normalizedSchema = `
  CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    avatar VARCHAR(255)
  );

  CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    content TEXT,
    created_at TIMESTAMP
  );

  -- Every feed query needs a JOIN:
  SELECT p.*, u.name, u.avatar
  FROM posts p
  JOIN users u ON u.id = p.user_id
  ORDER BY p.created_at DESC LIMIT 20;
`;

// ✅ DENORMALIZED: Author info embedded in posts table
const denormalizedSchema = `
  CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    author_name VARCHAR(100),       -- Denormalized from users
    author_avatar VARCHAR(255),     -- Denormalized from users
    content TEXT,
    like_count INTEGER DEFAULT 0,   -- Pre-computed counter
    comment_count INTEGER DEFAULT 0,
    created_at TIMESTAMP
  );

  -- Feed query: NO JOIN needed!
  SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
`;

// The cost: when a user changes their name/avatar, you must
// update ALL their posts. Acceptable if updates are rare
// and reads are frequent (social media: 100:1 read:write)

// ---------- Pattern 2: CQRS (Separate Read/Write Models) ----------

class CQRSExample {
  constructor() {
    this.writeDB = null; // PostgreSQL (normalized, ACID)
    this.readDB = null;  // Elasticsearch or Redis (denormalized, fast)
  }

  // WRITE path: Normalized, ACID, consistent
  async createOrder(orderData) {
    const order = await this.writeDB.transaction(async (tx) => {
      const order = await tx.insert('orders', {
        user_id: orderData.userId,
        status: 'pending',
        total: orderData.total,
      });

      for (const item of orderData.items) {
        await tx.insert('order_items', {
          order_id: order.id,
          product_id: item.productId,
          quantity: item.quantity,
          price: item.price,
        });

        // Update inventory
        await tx.query(
          'UPDATE products SET stock = stock - $1 WHERE id = $2 AND stock >= $1',
          [item.quantity, item.productId]
        );
      }

      return order;
    });

    // Publish event for read model to update itself
    await this.publishEvent('order.created', {
      orderId: order.id,
      userId: orderData.userId,
      items: orderData.items,
      total: orderData.total,
    });

    return order;
  }

  // READ path: Denormalized, fast, eventual consistency
  async searchOrders(filters) {
    // Query the read-optimized store (Elasticsearch)
    return await this.readDB.search({
      index: 'orders',
      body: {
        query: {
          bool: {
            must: [
              { match: { user_id: filters.userId } },
              { range: { created_at: { gte: filters.startDate } } },
            ],
          },
        },
        sort: [{ created_at: 'desc' }],
      },
    });
  }

  async publishEvent(type, data) {
    // In production: Kafka, RabbitMQ, or AWS SNS
    console.log(`Published event: ${type}`, data);
  }
}

// ---------- Pattern 3: Event Sourcing ----------

class EventSourcedAccount {
  constructor(accountId) {
    this.accountId = accountId;
    this.events = [];  // Event log
    this.balance = 0;  // Current state (derived from events)
  }

  // Instead of updating state, we APPEND events
  deposit(amount) {
    const event = {
      type: 'DEPOSITED',
      accountId: this.accountId,
      amount,
      timestamp: Date.now(),
    };
    this.events.push(event);
    this.applyEvent(event);
    return event;
  }

  withdraw(amount) {
    if (this.balance < amount) {
      throw new Error('Insufficient funds');
    }
    const event = {
      type: 'WITHDRAWN',
      accountId: this.accountId,
      amount,
      timestamp: Date.now(),
    };
    this.events.push(event);
    this.applyEvent(event);
    return event;
  }

  // Apply a single event to update state
  applyEvent(event) {
    switch (event.type) {
      case 'DEPOSITED':
        this.balance += event.amount;
        break;
      case 'WITHDRAWN':
        this.balance -= event.amount;
        break;
    }
  }

  // Rebuild state from event history
  static fromEvents(accountId, events) {
    const account = new EventSourcedAccount(accountId);
    for (const event of events) {
      account.applyEvent(event);
      account.events.push(event);
    }
    return account;
  }

  // Get state at any point in time!
  getBalanceAt(timestamp) {
    let balance = 0;
    for (const event of this.events) {
      if (event.timestamp > timestamp) break;
      if (event.type === 'DEPOSITED') balance += event.amount;
      if (event.type === 'WITHDRAWN') balance -= event.amount;
    }
    return balance;
  }
}

// Demo
const account = new EventSourcedAccount('acc_123');
account.deposit(1000);
account.withdraw(250);
account.deposit(500);
console.log('Current balance:', account.balance); // 1250
console.log('Event history:', account.events.length); // 3 events
console.log('Full audit trail:', account.events);
```
🏋️ Practice Exercise
Normalize vs Denormalize: Given an e-commerce database, design both a normalized (3NF) and denormalized schema for the product listing page. Compare query complexity and performance for "Get product with category, reviews summary, and seller info."
CQRS Implementation: Design a CQRS architecture for a social media platform. Define the write model (PostgreSQL), read model (Elasticsearch), and the event pipeline that keeps them in sync.
Event Sourcing Design: Design an event-sourced shopping cart. Define all event types (ItemAdded, ItemRemoved, QuantityChanged, CartCleared, CartCheckedOut). Implement the state reconstruction logic.
Schema Evolution: Your users table needs a new preferences field with complex nested data (notification settings, theme, language). Compare: (a) adding a JSONB column, (b) creating a separate preferences table, (c) using a key-value pattern. Which would you choose?
Time-Series Model: Design the data model for a real-time analytics dashboard tracking page views, clicks, and conversions. Include: raw events table, pre-aggregated summary tables, and the aggregation pipeline.
⚠️ Common Mistakes
Over-normalizing in a read-heavy system — forcing JOINs on every read when data could be safely denormalized. If data rarely changes but is read constantly, denormalize it.
Denormalizing without a sync strategy — if you duplicate data (user name in posts table), you MUST have a mechanism to update all copies when the source changes. Without it, data becomes permanently inconsistent.
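One minimal sync strategy, sketched in memory (the tables and the renameUser helper are hypothetical stand-ins): every write to the source record also fans out to the denormalized copies, either in the same transaction or via an async job fed by a change event.

```javascript
// In-memory stand-ins for the users and posts tables (illustrative).
const users = [{ id: 1, name: 'Ada' }];
const posts = [
  { id: 10, userId: 1, authorName: 'Ada', content: 'hello' },
  { id: 11, userId: 1, authorName: 'Ada', content: 'world' },
];

// Hypothetical sync helper: renaming a user must also update every
// denormalized authorName copy. In SQL this is a second UPDATE in the
// same transaction, or a background job consuming a change event.
function renameUser(userId, newName) {
  const user = users.find((u) => u.id === userId);
  user.name = newName;
  for (const post of posts) {
    if (post.userId === userId) post.authorName = newName;
  }
}

renameUser(1, 'Ada Lovelace');
console.log(posts.every((p) => p.authorName === 'Ada Lovelace')); // true
```

If the fan-out step is missing, the copies drift from the source and there is no record of which one is authoritative.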
Using event sourcing for simple CRUD — event sourcing adds significant complexity (event replay, snapshotting, eventual consistency). Use it only when you need an audit trail, time-travel queries, or complex domain logic.
Ignoring the write amplification of denormalization — if a celebrity updates their profile picture and has 10M posts, you now have a massive batch update to run. Design around this with lazy updates or background jobs.