Real-World DSA: Search Engines & Recommendation Systems

0/7 in this phase0/57 across the roadmap

📖 Concept

Search engines and recommendation systems are among the most popular system design interview topics. They rely heavily on data structures like inverted indices, tries, graphs, and hash maps.

Search Engine Core:

1. Inverted Index — The backbone of search:

  • Maps each word → list of document IDs containing it
  • "apple" → [doc1, doc5, doc42]
  • "pie" → [doc5, doc33, doc42]
  • Search "apple pie" → intersection of [doc1,doc5,doc42] ∩ [doc5,doc33,doc42] = [doc5, doc42]

2. TF-IDF (Term Frequency - Inverse Document Frequency):

  • TF = word frequency in document / total words in document
  • IDF = log(total documents / documents containing word)
  • TF-IDF = TF × IDF — higher score = more relevant

3. Autocomplete — Trie-based:

  • Trie stores all searchable terms
  • Each node may store a frequency/weight
  • Prefix search returns top-k suggestions by weight

Recommendation System Approaches:

  1. Collaborative Filtering: Users who liked X also liked Y (graph-based)
  2. Content-Based: Recommend similar items (feature vectors, cosine similarity)
  3. Hybrid: Combine both approaches

Data structures used:

  • Inverted index: Hash Map of word → sorted document list
  • Autocomplete: Trie with frequency weights
  • Similarity: Min-heap for top-K most similar
  • User-item mapping: Sparse matrix / hash map

🏠 Real-world analogy: Google's index is like a book's index at the back — you look up a word and find which pages mention it. Autocomplete is like a friend who finishes your sentences based on what's popular.

💻 Code Example

codeTap to expand ⛶
1// INVERTED INDEX — Search engine core
2class InvertedIndex {
3 constructor() { this.index = new Map(); }
4
5 addDocument(docId, text) {
6 const words = text.toLowerCase().split(/\W+/);
7 for (const word of words) {
8 if (!this.index.has(word)) this.index.set(word, new Set());
9 this.index.get(word).add(docId);
10 }
11 }
12
13 search(query) {
14 const words = query.toLowerCase().split(/\W+/);
15 let result = null;
16 for (const word of words) {
17 const docs = this.index.get(word) || new Set();
18 result = result ? new Set([...result].filter(d => docs.has(d))) : new Set(docs);
19 }
20 return [...(result || [])];
21 }
22}
23
24// TF-IDF RANKING
25class TFIDFSearch {
26 constructor() { this.docs = new Map(); this.wordDocCount = new Map(); }
27
28 addDocument(docId, text) {
29 const words = text.toLowerCase().split(/\W+/);
30 const freq = new Map();
31 for (const w of words) freq.set(w, (freq.get(w) || 0) + 1);
32 this.docs.set(docId, { words, freq, totalWords: words.length });
33 for (const w of new Set(words)) {
34 this.wordDocCount.set(w, (this.wordDocCount.get(w) || 0) + 1);
35 }
36 }
37
38 search(query) {
39 const queryWords = query.toLowerCase().split(/\W+/);
40 const scores = [];
41 for (const [docId, doc] of this.docs) {
42 let score = 0;
43 for (const word of queryWords) {
44 const tf = (doc.freq.get(word) || 0) / doc.totalWords;
45 const idf = Math.log((this.docs.size + 1) / (1 + (this.wordDocCount.get(word) || 0)));
46 score += tf * idf;
47 }
48 if (score > 0) scores.push({ docId, score });
49 }
50 return scores.sort((a, b) => b.score - a.score);
51 }
52}
53
54// AUTOCOMPLETE — Trie with frequency
55class AutocompleteTrie {
56 constructor() { this.root = {}; }
57
58 insert(word, weight = 1) {
59 let node = this.root;
60 for (const c of word) { node[c] = node[c] || {}; node = node[c]; }
61 node.weight = (node.weight || 0) + weight;
62 node.isEnd = true;
63 }
64
65 suggest(prefix, limit = 5) {
66 let node = this.root;
67 for (const c of prefix) { if (!node[c]) return []; node = node[c]; }
68 const results = [];
69 this._dfs(node, prefix, results);
70 return results.sort((a, b) => b.weight - a.weight).slice(0, limit).map(r => r.word);
71 }
72
73 _dfs(node, path, results) {
74 if (node.isEnd) results.push({ word: path, weight: node.weight });
75 for (const [char, child] of Object.entries(node)) {
76 if (char !== 'isEnd' && char !== 'weight') this._dfs(child, path + char, results);
77 }
78 }
79}
80
81// COSINE SIMILARITY — For content-based recommendations
82function cosineSimilarity(vecA, vecB) {
83 let dot = 0, magA = 0, magB = 0;
84 for (let i = 0; i < vecA.length; i++) {
85 dot += vecA[i] * vecB[i];
86 magA += vecA[i] ** 2;
87 magB += vecB[i] ** 2;
88 }
89 return dot / (Math.sqrt(magA) * Math.sqrt(magB));
90}

🏋️ Practice Exercise

Practice Problems & Design Questions:

  1. Build a simple inverted index from a collection of documents
  2. Implement a basic search engine with TF-IDF ranking
  3. Design an autocomplete system with frequency-based suggestions
  4. Implement cosine similarity for document comparison
  5. Design a movie recommendation system using collaborative filtering
  6. Build a "Did you mean?" spell checker (edit distance + trie)
  7. Design a real-time trending search feature
  8. How would you scale an inverted index to billions of documents?
  9. Implement a simple PageRank algorithm
  10. Design a product search system for an e-commerce site

⚠️ Common Mistakes

  • Not tokenizing/normalizing text — 'Apple', 'apple', 'APPLE' should map to same token

  • Not considering relevance ranking — returning results without TF-IDF or similar scoring

  • Autocomplete without frequency weighting — popular queries should rank higher

  • Not handling stop words — 'the', 'a', 'is' add noise to search results

  • Ignoring scalability — inverted index must be sharded for large-scale systems

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Real-World DSA: Search Engines & Recommendation Systems