Capstone Project & Debugging Exercises

0/7 in this phase0/57 across the roadmap

📖 Concept

The capstone ties everything together. Implement a real-world system that uses MULTIPLE data structures and algorithms working together.

CAPSTONE PROJECT: Build a Mini Search Engine

Requirements:

  1. Build an inverted index from a set of documents (text files or strings)
  2. Support multi-word search with AND/OR operations
  3. Rank results by TF-IDF relevance score
  4. Provide autocomplete suggestions as user types
  5. Cache recent searches with LRU cache
  6. Handle edge cases: empty queries, special characters, case insensitivity

Data structures used:

  • Hash Map: inverted index (word → document list)
  • Trie: autocomplete suggestions
  • LRU Cache: recent search results
  • Min-Heap: top-K results by relevance
  • Array: document storage
  • Set: deduplication

Debugging Exercises (test your understanding): These exercises contain INTENTIONAL bugs. Find and fix them.

  1. Off-by-one binary search — infinite loop on certain inputs
  2. Stack overflow in tree traversal — missing base case
  3. LRU cache memory leak — not deleting from linked list
  4. Incorrect topological sort — wrong cycle detection
  5. Hash collision causing wrong answers — using object as key

🏠 Real-world analogy: The capstone is like a final exam where you use EVERYTHING you've learned — not one concept in isolation, but combining data structures and algorithms to build a working system.

💻 Code Example

codeTap to expand ⛶
1// CAPSTONE: MINI SEARCH ENGINE
2class MiniSearchEngine {
3 constructor() {
4 this.documents = new Map(); // docId → text
5 this.invertedIndex = new Map(); // word → Set<docId>
6 this.autocompleteTrie = { children: {} };
7 this.searchCache = new LRUSearchCache(100);
8 this.docCount = 0;
9 }
10
11 // Add a document to the engine
12 addDocument(text) {
13 const docId = this.docCount++;
14 this.documents.set(docId, text);
15
16 const words = this.tokenize(text);
17 const wordSet = new Set(words);
18
19 for (const word of wordSet) {
20 if (!this.invertedIndex.has(word)) this.invertedIndex.set(word, new Set());
21 this.invertedIndex.get(word).add(docId);
22 this.insertTrie(word);
23 }
24 }
25
26 // Search for documents matching query
27 search(query) {
28 // Check cache first
29 const cached = this.searchCache.get(query);
30 if (cached) return cached;
31
32 const words = this.tokenize(query);
33 if (words.length === 0) return [];
34
35 // AND search — intersect document sets
36 let resultDocs = null;
37 for (const word of words) {
38 const docs = this.invertedIndex.get(word) || new Set();
39 resultDocs = resultDocs
40 ? new Set([...resultDocs].filter(d => docs.has(d)))
41 : new Set(docs);
42 }
43
44 // Rank by TF-IDF
45 const ranked = this.rankResults([...resultDocs], words);
46
47 // Cache result
48 this.searchCache.put(query, ranked);
49 return ranked;
50 }
51
52 // Autocomplete suggestions
53 suggest(prefix) {
54 let node = this.autocompleteTrie;
55 for (const c of prefix.toLowerCase()) {
56 if (!node.children[c]) return [];
57 node = node.children[c];
58 }
59 const results = [];
60 this.dfs(node, prefix.toLowerCase(), results);
61 return results.slice(0, 5);
62 }
63
64 // Internal helpers
65 tokenize(text) {
66 return text.toLowerCase().replace(/[^a-z0-9 ]/g, '').split(/\s+/).filter(w => w.length > 0);
67 }
68
69 insertTrie(word) {
70 let node = this.autocompleteTrie;
71 for (const c of word) {
72 if (!node.children[c]) node.children[c] = { children: {}, count: 0 };
73 node = node.children[c];
74 }
75 node.isEnd = true;
76 node.count = (node.count || 0) + 1;
77 }
78
79 dfs(node, path, results) {
80 if (node.isEnd) results.push(path);
81 for (const [c, child] of Object.entries(node.children)) {
82 if (c !== 'isEnd' && c !== 'count') this.dfs(child, path + c, results);
83 }
84 }
85
86 rankResults(docIds, queryWords) {
87 return docIds.map(docId => ({
88 docId,
89 score: this.computeTFIDF(docId, queryWords),
90 preview: this.documents.get(docId).substring(0, 100)
91 })).sort((a, b) => b.score - a.score);
92 }
93
94 computeTFIDF(docId, queryWords) {
95 const text = this.documents.get(docId);
96 const words = this.tokenize(text);
97 let score = 0;
98 for (const qw of queryWords) {
99 const tf = words.filter(w => w === qw).length / words.length;
100 const df = (this.invertedIndex.get(qw)?.size || 0);
101 const idf = Math.log((this.docCount + 1) / (df + 1));
102 score += tf * idf;
103 }
104 return score;
105 }
106}
107
108class LRUSearchCache {
109 constructor(cap) { this.cap = cap; this.map = new Map(); }
110 get(key) { if (!this.map.has(key)) return null; const v = this.map.get(key); this.map.delete(key); this.map.set(key, v); return v; }
111 put(key, val) { if (this.map.has(key)) this.map.delete(key); this.map.set(key, val); if (this.map.size > this.cap) this.map.delete(this.map.keys().next().value); }
112}
113
114// DEBUGGING EXERCISE #1: Find the bug
115function binarySearchBuggy(arr, target) {
116 let lo = 0, hi = arr.length; // BUG: should be arr.length - 1
117 while (lo <= hi) {
118 const mid = Math.floor((lo + hi) / 2);
119 if (arr[mid] === target) return mid;
120 if (arr[mid] < target) lo = mid + 1;
121 else hi = mid; // BUG: should be mid - 1 (causes infinite loop)
122 }
123 return -1;
124}

🏋️ Practice Exercise

Capstone Project Steps:

  1. Implement the MiniSearchEngine class
  2. Add 10+ documents (paragraphs of text)
  3. Test search with single and multi-word queries
  4. Test autocomplete with various prefixes
  5. Verify LRU cache is working (search twice, second should be cached)
  6. Add support for OR search (union of document sets)
  7. Add stop word filtering (skip 'the', 'a', 'is', etc.)
  8. Measure performance: how fast is search with 1000 documents?
  9. Add phrase search: "exact phrase" finds documents with exact sequence
  10. Write unit tests for each component

Debugging Challenges:

  1. Fix the binary search bug above — what are the two issues?
  2. This DFS has a bug: if (!root) return; visit(root); dfs(root.left); — what's missing?
  3. This LRU cache doesn't evict properly — why?
  4. This topological sort gives wrong order — the graph has a cycle but returns a result
  5. This hash map uses objects as keys — why does lookup fail?

⚠️ Common Mistakes

  • Building the capstone in one giant function — modularize into separate classes

  • Not testing each component individually before integration

  • Skipping the debugging exercises — finding bugs is a crucial interview skill

  • Not measuring performance — understanding real-world speed is important

  • Over-engineering the capstone — start simple, add features incrementally

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Capstone Project & Debugging Exercises