Capstone Project & Debugging Exercises

0/7 in this phase0/57 across the roadmap

📖 Concept

The capstone ties everything together. Implement a real-world system that uses MULTIPLE data structures and algorithms working together.

CAPSTONE PROJECT: Build a Mini Search Engine

Requirements:

Build an inverted index from a set of documents (text files or strings)
Support multi-word search with AND/OR operations
Rank results by TF-IDF relevance score
Provide autocomplete suggestions as user types
Cache recent searches with LRU cache
Handle edge cases: empty queries, special characters, case insensitivity

Data structures used:

Hash Map: inverted index (word → document list)
Trie: autocomplete suggestions
LRU Cache: recent search results
Min-Heap: top-K results by relevance
Array: document storage
Set: deduplication

Debugging Exercises (test your understanding): These exercises contain INTENTIONAL bugs. Find and fix them.

Off-by-one binary search — infinite loop on certain inputs
Stack overflow in tree traversal — missing base case
LRU cache memory leak — not deleting from linked list
Incorrect topological sort — wrong cycle detection
Hash collision causing wrong answers — using object as key

🏠 Real-world analogy: The capstone is like a final exam where you use EVERYTHING you've learned — not one concept in isolation, but combining data structures and algorithms to build a working system.

💻 Code Example

codeTap to expand ⛶

1// CAPSTONE: MINI SEARCH ENGINE
2class MiniSearchEngine {
3  constructor() {
4    this.documents = new Map();          // docId → text
5    this.invertedIndex = new Map();       // word → Set<docId>
6    this.autocompleteTrie = { children: {} };
7    this.searchCache = new LRUSearchCache(100);
8    this.docCount = 0;
9  }
10
11  // Add a document to the engine
12  addDocument(text) {
13    const docId = this.docCount++;
14    this.documents.set(docId, text);
15
16    const words = this.tokenize(text);
17    const wordSet = new Set(words);
18
19    for (const word of wordSet) {
20      if (!this.invertedIndex.has(word)) this.invertedIndex.set(word, new Set());
21      this.invertedIndex.get(word).add(docId);
22      this.insertTrie(word);
23    }
24  }
25
26  // Search for documents matching query
27  search(query) {
28    // Check cache first
29    const cached = this.searchCache.get(query);
30    if (cached) return cached;
31
32    const words = this.tokenize(query);
33    if (words.length === 0) return [];
34
35    // AND search — intersect document sets
36    let resultDocs = null;
37    for (const word of words) {
38      const docs = this.invertedIndex.get(word) || new Set();
39      resultDocs = resultDocs
40        ? new Set([...resultDocs].filter(d => docs.has(d)))
41        : new Set(docs);
42    }
43
44    // Rank by TF-IDF
45    const ranked = this.rankResults([...resultDocs], words);
46
47    // Cache result
48    this.searchCache.put(query, ranked);
49    return ranked;
50  }
51
52  // Autocomplete suggestions
53  suggest(prefix) {
54    let node = this.autocompleteTrie;
55    for (const c of prefix.toLowerCase()) {
56      if (!node.children[c]) return [];
57      node = node.children[c];
58    }
59    const results = [];
60    this.dfs(node, prefix.toLowerCase(), results);
61    return results.slice(0, 5);
62  }
63
64  // Internal helpers
65  tokenize(text) {
66    return text.toLowerCase().replace(/[^a-z0-9 ]/g, '').split(/\s+/).filter(w => w.length > 0);
67  }
68
69  insertTrie(word) {
70    let node = this.autocompleteTrie;
71    for (const c of word) {
72      if (!node.children[c]) node.children[c] = { children: {}, count: 0 };
73      node = node.children[c];
74    }
75    node.isEnd = true;
76    node.count = (node.count || 0) + 1;
77  }
78
79  dfs(node, path, results) {
80    if (node.isEnd) results.push(path);
81    for (const [c, child] of Object.entries(node.children)) {
82      if (c !== 'isEnd' && c !== 'count') this.dfs(child, path + c, results);
83    }
84  }
85
86  rankResults(docIds, queryWords) {
87    return docIds.map(docId => ({
88      docId,
89      score: this.computeTFIDF(docId, queryWords),
90      preview: this.documents.get(docId).substring(0, 100)
91    })).sort((a, b) => b.score - a.score);
92  }
93
94  computeTFIDF(docId, queryWords) {
95    const text = this.documents.get(docId);
96    const words = this.tokenize(text);
97    let score = 0;
98    for (const qw of queryWords) {
99      const tf = words.filter(w => w === qw).length / words.length;
100      const df = (this.invertedIndex.get(qw)?.size || 0);
101      const idf = Math.log((this.docCount + 1) / (df + 1));
102      score += tf * idf;
103    }
104    return score;
105  }
106}
107
108class LRUSearchCache {
109  constructor(cap) { this.cap = cap; this.map = new Map(); }
110  get(key) { if (!this.map.has(key)) return null; const v = this.map.get(key); this.map.delete(key); this.map.set(key, v); return v; }
111  put(key, val) { if (this.map.has(key)) this.map.delete(key); this.map.set(key, val); if (this.map.size > this.cap) this.map.delete(this.map.keys().next().value); }
112}
113
114// DEBUGGING EXERCISE #1: Find the bug
115function binarySearchBuggy(arr, target) {
116  let lo = 0, hi = arr.length; // BUG: should be arr.length - 1
117  while (lo <= hi) {
118    const mid = Math.floor((lo + hi) / 2);
119    if (arr[mid] === target) return mid;
120    if (arr[mid] < target) lo = mid + 1;
121    else hi = mid; // BUG: should be mid - 1 (causes infinite loop)
122  }
123  return -1;
124}

🏋️ Practice Exercise

Capstone Project Steps:

Implement the MiniSearchEngine class
Add 10+ documents (paragraphs of text)
Test search with single and multi-word queries
Test autocomplete with various prefixes
Verify LRU cache is working (search twice, second should be cached)
Add support for OR search (union of document sets)
Add stop word filtering (skip 'the', 'a', 'is', etc.)
Measure performance: how fast is search with 1000 documents?
Add phrase search: "exact phrase" finds documents with exact sequence
Write unit tests for each component

Debugging Challenges:

Fix the binary search bug above — what are the two issues?
This DFS has a bug: if (!root) return; visit(root); dfs(root.left); — what's missing?
This LRU cache doesn't evict properly — why?
This topological sort gives wrong order — the graph has a cycle but returns a result
This hash map uses objects as keys — why does lookup fail?

⚠️ Common Mistakes

Building the capstone in one giant function — modularize into separate classes
Not testing each component individually before integration
Skipping the debugging exercises — finding bugs is a crucial interview skill
Not measuring performance — understanding real-world speed is important
Over-engineering the capstone — start simple, add features incrementally

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Capstone Project & Debugging Exercises

Was this topic helpful?

← Previous20 System-Thinking DSA Problems