Bloom Filters & Probabilistic Data Structures

0/12 in this phase0/57 across the roadmap

📖 Concept

A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is possibly in a set or definitely not in a set.

Properties:

No false negatives — if it says "not present", it's 100% certain
Possible false positives — if it says "present", it MIGHT be wrong
Cannot delete elements (standard version)
Extremely space-efficient — uses a fraction of the memory vs hash set

How it works:

A bit array of size m, initialized to all 0s
k independent hash functions
Insert: compute k hashes, set those k bit positions to 1
Query: compute k hashes, check if ALL k positions are 1
- All 1 → "probably yes" (may be false positive)
- Any 0 → "definitely no"

Trade-offs:

More bits (m) → lower false positive rate
More hash functions (k) → sweet spot exists (too many = too many bits set)
Optimal k = (m/n) × ln(2) where n = expected elements

Real-world uses:

Chrome: checks URLs against malware database (millions of URLs, tiny memory)
Medium: recommends articles you haven't read
Cassandra/HBase: avoids disk reads for non-existent rows
CDN caching: quickly check if content is cached
Spell checkers: quick "might be misspelled" check

🏠 Real-world analogy: A bouncer with a fuzzy memory. If they say "you're NOT on the list," you're definitely not on the list. If they say "you might be on the list," you could be, but they might be confused with someone else.

💻 Code Example

codeTap to expand ⛶

1// BLOOM FILTER — Implementation
2class BloomFilter {
3  constructor(size = 1000, numHashes = 3) {
4    this.size = size;
5    this.numHashes = numHashes;
6    this.bits = new Array(size).fill(false);
7  }
8
9  // Simple hash functions using different seeds
10  _hash(value, seed) {
11    let hash = 0;
12    const str = String(value);
13    for (let i = 0; i < str.length; i++) {
14      hash = (hash * seed + str.charCodeAt(i)) % this.size;
15    }
16    return Math.abs(hash);
17  }
18
19  _getPositions(value) {
20    const positions = [];
21    const seeds = [31, 37, 41, 43, 47, 53, 59];
22    for (let i = 0; i < this.numHashes; i++) {
23      positions.push(this._hash(value, seeds[i]));
24    }
25    return positions;
26  }
27
28  // Add element
29  add(value) {
30    for (const pos of this._getPositions(value)) {
31      this.bits[pos] = true;
32    }
33  }
34
35  // Check if element might exist
36  mightContain(value) {
37    return this._getPositions(value).every(pos => this.bits[pos]);
38  }
39
40  // False positive rate ≈ (1 - e^(-kn/m))^k
41  estimateFPRate(numElements) {
42    const k = this.numHashes;
43    const m = this.size;
44    return Math.pow(1 - Math.exp(-k * numElements / m), k);
45  }
46}
47
48// USAGE
49const bloom = new BloomFilter(10000, 5);
50bloom.add("apple");
51bloom.add("banana");
52bloom.add("cherry");
53
54console.log(bloom.mightContain("apple"));   // true (definitely added)
55console.log(bloom.mightContain("banana"));  // true
56console.log(bloom.mightContain("grape"));   // false (definitely NOT added)
57console.log(bloom.mightContain("mango"));   // false OR true (false positive possible)
58
59// COUNTING BLOOM FILTER — Supports deletion
60class CountingBloomFilter {
61  constructor(size = 1000, numHashes = 3) {
62    this.size = size;
63    this.numHashes = numHashes;
64    this.counters = new Array(size).fill(0); // Counters instead of bits
65  }
66
67  _getPositions(value) {
68    const positions = [];
69    const seeds = [31, 37, 41];
70    for (let i = 0; i < this.numHashes; i++) {
71      let hash = 0;
72      const str = String(value);
73      for (let j = 0; j < str.length; j++)
74        hash = (hash * seeds[i] + str.charCodeAt(j)) % this.size;
75      positions.push(Math.abs(hash));
76    }
77    return positions;
78  }
79
80  add(value) { for (const p of this._getPositions(value)) this.counters[p]++; }
81  remove(value) { for (const p of this._getPositions(value)) this.counters[p]--; }
82  mightContain(value) { return this._getPositions(value).every(p => this.counters[p] > 0); }
83}

🏋️ Practice Exercise

Practice Problems:

Implement a basic Bloom filter with add and mightContain
Calculate the optimal number of hash functions for given m and n
Implement a counting Bloom filter that supports deletion
Estimate the false positive rate for a given configuration
Design a URL shortener that uses a Bloom filter to avoid collisions
Use a Bloom filter to find elements in one set but not another
Compare memory usage: Bloom filter vs HashSet for 1 million URLs
Implement HyperLogLog for approximate distinct count (conceptual)
Design a cache system that uses a Bloom filter to avoid cache misses
Explain when a Bloom filter is NOT appropriate

⚠️ Common Mistakes

Assuming Bloom filters have no errors — they DO have false positives (just no false negatives)
Using too few hash functions — increases false positive rate
Using too many hash functions — fills the bit array too quickly, also increases false positives
Trying to delete from a standard Bloom filter — use counting Bloom filter for deletion
Using Bloom filters when exact results are needed — they're probabilistic by design

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Bloom Filters & Probabilistic Data Structures

Was this topic helpful?

← PreviousBalanced BSTs (AVL, Red-Black — Conceptual)Next →Skip Lists — Probabilistic Balanced Structure