Real-World DSA: Search Engines & Recommendation Systems

0/7 in this phase0/57 across the roadmap

📖 Concept

Search engines and recommendation systems are among the most popular system design interview topics. They rely heavily on data structures like inverted indices, tries, graphs, and hash maps.

Search Engine Core:

1. Inverted Index — The backbone of search:

Maps each word → list of document IDs containing it
"apple" → [doc1, doc5, doc42]
"pie" → [doc5, doc33, doc42]
Search "apple pie" → intersection of [doc1,doc5,doc42] ∩ [doc5,doc33,doc42] = [doc5, doc42]

2. TF-IDF (Term Frequency - Inverse Document Frequency):

TF = word frequency in document / total words in document
IDF = log(total documents / documents containing word)
TF-IDF = TF × IDF — higher score = more relevant

3. Autocomplete — Trie-based:

Trie stores all searchable terms
Each node may store a frequency/weight
Prefix search returns top-k suggestions by weight

Recommendation System Approaches:

Collaborative Filtering: Users who liked X also liked Y (graph-based)
Content-Based: Recommend similar items (feature vectors, cosine similarity)
Hybrid: Combine both approaches

Data structures used:

Inverted index: Hash Map of word → sorted document list
Autocomplete: Trie with frequency weights
Similarity: Min-heap for top-K most similar
User-item mapping: Sparse matrix / hash map

🏠 Real-world analogy: Google's index is like a book's index at the back — you look up a word and find which pages mention it. Autocomplete is like a friend who finishes your sentences based on what's popular.

💻 Code Example

codeTap to expand ⛶

1// INVERTED INDEX — Search engine core
2class InvertedIndex {
3  constructor() { this.index = new Map(); }
4
5  addDocument(docId, text) {
6    const words = text.toLowerCase().split(/\W+/);
7    for (const word of words) {
8      if (!this.index.has(word)) this.index.set(word, new Set());
9      this.index.get(word).add(docId);
10    }
11  }
12
13  search(query) {
14    const words = query.toLowerCase().split(/\W+/);
15    let result = null;
16    for (const word of words) {
17      const docs = this.index.get(word) || new Set();
18      result = result ? new Set([...result].filter(d => docs.has(d))) : new Set(docs);
19    }
20    return [...(result || [])];
21  }
22}
23
24// TF-IDF RANKING
25class TFIDFSearch {
26  constructor() { this.docs = new Map(); this.wordDocCount = new Map(); }
27
28  addDocument(docId, text) {
29    const words = text.toLowerCase().split(/\W+/);
30    const freq = new Map();
31    for (const w of words) freq.set(w, (freq.get(w) || 0) + 1);
32    this.docs.set(docId, { words, freq, totalWords: words.length });
33    for (const w of new Set(words)) {
34      this.wordDocCount.set(w, (this.wordDocCount.get(w) || 0) + 1);
35    }
36  }
37
38  search(query) {
39    const queryWords = query.toLowerCase().split(/\W+/);
40    const scores = [];
41    for (const [docId, doc] of this.docs) {
42      let score = 0;
43      for (const word of queryWords) {
44        const tf = (doc.freq.get(word) || 0) / doc.totalWords;
45        const idf = Math.log((this.docs.size + 1) / (1 + (this.wordDocCount.get(word) || 0)));
46        score += tf * idf;
47      }
48      if (score > 0) scores.push({ docId, score });
49    }
50    return scores.sort((a, b) => b.score - a.score);
51  }
52}
53
54// AUTOCOMPLETE — Trie with frequency
55class AutocompleteTrie {
56  constructor() { this.root = {}; }
57
58  insert(word, weight = 1) {
59    let node = this.root;
60    for (const c of word) { node[c] = node[c] || {}; node = node[c]; }
61    node.weight = (node.weight || 0) + weight;
62    node.isEnd = true;
63  }
64
65  suggest(prefix, limit = 5) {
66    let node = this.root;
67    for (const c of prefix) { if (!node[c]) return []; node = node[c]; }
68    const results = [];
69    this._dfs(node, prefix, results);
70    return results.sort((a, b) => b.weight - a.weight).slice(0, limit).map(r => r.word);
71  }
72
73  _dfs(node, path, results) {
74    if (node.isEnd) results.push({ word: path, weight: node.weight });
75    for (const [char, child] of Object.entries(node)) {
76      if (char !== 'isEnd' && char !== 'weight') this._dfs(child, path + char, results);
77    }
78  }
79}
80
81// COSINE SIMILARITY — For content-based recommendations
82function cosineSimilarity(vecA, vecB) {
83  let dot = 0, magA = 0, magB = 0;
84  for (let i = 0; i < vecA.length; i++) {
85    dot += vecA[i] * vecB[i];
86    magA += vecA[i] ** 2;
87    magB += vecB[i] ** 2;
88  }
89  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
90}

🏋️ Practice Exercise

Practice Problems & Design Questions:

Build a simple inverted index from a collection of documents
Implement a basic search engine with TF-IDF ranking
Design an autocomplete system with frequency-based suggestions
Implement cosine similarity for document comparison
Design a movie recommendation system using collaborative filtering
Build a "Did you mean?" spell checker (edit distance + trie)
Design a real-time trending search feature
How would you scale an inverted index to billions of documents?
Implement a simple PageRank algorithm
Design a product search system for an e-commerce site

⚠️ Common Mistakes

Not tokenizing/normalizing text — 'Apple', 'apple', 'APPLE' should map to same token
Not considering relevance ranking — returning results without TF-IDF or similar scoring
Autocomplete without frequency weighting — popular queries should rank higher
Not handling stop words — 'the', 'a', 'is' add noise to search results
Ignoring scalability — inverted index must be sharded for large-scale systems

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Real-World DSA: Search Engines & Recommendation Systems

Was this topic helpful?

← PreviousReal-World DSA: Rate Limiting & Scheduling Next →Interview Pattern Mastery — The Complete Guide