Real-World DSA: Search Engines & Recommendation Systems
📖 Concept
Search engines and recommendation systems are among the most popular system design interview topics. They rely heavily on data structures like inverted indices, tries, graphs, and hash maps.
Search Engine Core:
1. Inverted Index — The backbone of search:
- Maps each word → list of document IDs containing it
- "apple" → [doc1, doc5, doc42]
- "pie" → [doc5, doc33, doc42]
- Search "apple pie" → intersection of [doc1,doc5,doc42] ∩ [doc5,doc33,doc42] = [doc5, doc42]
2. TF-IDF (Term Frequency - Inverse Document Frequency):
- TF = word frequency in document / total words in document
- IDF = log(total documents / documents containing word)
- TF-IDF = TF × IDF — higher score = more relevant
3. Autocomplete — Trie-based:
- Trie stores all searchable terms
- Each node may store a frequency/weight
- Prefix search returns top-k suggestions by weight
Recommendation System Approaches:
- Collaborative Filtering: Users who liked X also liked Y (graph-based)
- Content-Based: Recommend similar items (feature vectors, cosine similarity)
- Hybrid: Combine both approaches
Data structures used:
- Inverted index: Hash Map of word → sorted document list
- Autocomplete: Trie with frequency weights
- Similarity: Min-heap for top-K most similar
- User-item mapping: Sparse matrix / hash map
🏠 Real-world analogy: Google's index is like a book's index at the back — you look up a word and find which pages mention it. Autocomplete is like a friend who finishes your sentences based on what's popular.
💻 Code Example
1// INVERTED INDEX — Search engine core2class InvertedIndex {3 constructor() { this.index = new Map(); }45 addDocument(docId, text) {6 const words = text.toLowerCase().split(/\W+/);7 for (const word of words) {8 if (!this.index.has(word)) this.index.set(word, new Set());9 this.index.get(word).add(docId);10 }11 }1213 search(query) {14 const words = query.toLowerCase().split(/\W+/);15 let result = null;16 for (const word of words) {17 const docs = this.index.get(word) || new Set();18 result = result ? new Set([...result].filter(d => docs.has(d))) : new Set(docs);19 }20 return [...(result || [])];21 }22}2324// TF-IDF RANKING25class TFIDFSearch {26 constructor() { this.docs = new Map(); this.wordDocCount = new Map(); }2728 addDocument(docId, text) {29 const words = text.toLowerCase().split(/\W+/);30 const freq = new Map();31 for (const w of words) freq.set(w, (freq.get(w) || 0) + 1);32 this.docs.set(docId, { words, freq, totalWords: words.length });33 for (const w of new Set(words)) {34 this.wordDocCount.set(w, (this.wordDocCount.get(w) || 0) + 1);35 }36 }3738 search(query) {39 const queryWords = query.toLowerCase().split(/\W+/);40 const scores = [];41 for (const [docId, doc] of this.docs) {42 let score = 0;43 for (const word of queryWords) {44 const tf = (doc.freq.get(word) || 0) / doc.totalWords;45 const idf = Math.log((this.docs.size + 1) / (1 + (this.wordDocCount.get(word) || 0)));46 score += tf * idf;47 }48 if (score > 0) scores.push({ docId, score });49 }50 return scores.sort((a, b) => b.score - a.score);51 }52}5354// AUTOCOMPLETE — Trie with frequency55class AutocompleteTrie {56 constructor() { this.root = {}; }5758 insert(word, weight = 1) {59 let node = this.root;60 for (const c of word) { node[c] = node[c] || {}; node = node[c]; }61 node.weight = (node.weight || 0) + weight;62 node.isEnd = true;63 }6465 suggest(prefix, limit = 5) {66 let node = this.root;67 for (const c of prefix) { if (!node[c]) return []; node = node[c]; }68 const results = [];69 this._dfs(node, prefix, results);70 return results.sort((a, b) => b.weight - a.weight).slice(0, limit).map(r => r.word);71 }7273 _dfs(node, path, results) {74 if (node.isEnd) results.push({ word: path, weight: node.weight });75 for (const [char, child] of Object.entries(node)) {76 if (char !== 'isEnd' && char !== 'weight') this._dfs(child, path + char, results);77 }78 }79}8081// COSINE SIMILARITY — For content-based recommendations82function cosineSimilarity(vecA, vecB) {83 let dot = 0, magA = 0, magB = 0;84 for (let i = 0; i < vecA.length; i++) {85 dot += vecA[i] * vecB[i];86 magA += vecA[i] ** 2;87 magB += vecB[i] ** 2;88 }89 return dot / (Math.sqrt(magA) * Math.sqrt(magB));90}
🏋️ Practice Exercise
Practice Problems & Design Questions:
- Build a simple inverted index from a collection of documents
- Implement a basic search engine with TF-IDF ranking
- Design an autocomplete system with frequency-based suggestions
- Implement cosine similarity for document comparison
- Design a movie recommendation system using collaborative filtering
- Build a "Did you mean?" spell checker (edit distance + trie)
- Design a real-time trending search feature
- How would you scale an inverted index to billions of documents?
- Implement a simple PageRank algorithm
- Design a product search system for an e-commerce site
⚠️ Common Mistakes
Not tokenizing/normalizing text — 'Apple', 'apple', 'APPLE' should map to same token
Not considering relevance ranking — returning results without TF-IDF or similar scoring
Autocomplete without frequency weighting — popular queries should rank higher
Not handling stop words — 'the', 'a', 'is' add noise to search results
Ignoring scalability — inverted index must be sharded for large-scale systems
💼 Interview Questions
🎤 Mock Interview
Practice a live interview for Real-World DSA: Search Engines & Recommendation Systems