collections Module


📖 Concept

The collections module provides specialized container datatypes that extend Python's built-in dict, list, set, and tuple with additional functionality. These are not exotic data structures — they are production workhorses that solve common problems more efficiently and expressively than rolling your own.

namedtuple creates lightweight, immutable tuple subclasses with named fields. Instances use the same memory as regular tuples (no per-instance __dict__), support indexing and unpacking, and are perfect for representing records, coordinates, database rows, or any fixed-structure data. For new code, typing.NamedTuple provides the same functionality with type annotations.

deque (double-ended queue) supports thread-safe, O(1) appends and pops from both ends. Lists are O(n) for insert(0, x) and pop(0) because they shift every remaining element. A deque with maxlen automatically evicts the oldest element when full — perfect for sliding windows, bounded buffers, and recent-history tracking.

defaultdict eliminates the "check if key exists, initialize if not" pattern. You provide a factory function (int, list, set, or a custom callable) that creates default values for missing keys. Nested defaultdicts enable auto-vivifying tree structures.

OrderedDict was essential before Python 3.7, when regular dicts did not guarantee ordering. Since 3.7, regular dicts maintain insertion order, but OrderedDict still offers unique features: move_to_end(key), equality that considers order, and popitem(last=False) for FIFO (or last=True for LIFO) behavior. It's the backbone of LRU cache implementations.

ChainMap groups multiple dicts into a single view. Lookups search each dict in order. It's ideal for layered configuration (defaults < config file < environment < CLI args), template variable scoping, and any scenario where you need to overlay multiple namespaces without copying.

Container     When to Use                            Key Advantage
namedtuple    Fixed-structure records                Memory-efficient, immutable
deque         FIFO queues, sliding windows           O(1) both-end operations
defaultdict   Grouping, counting, accumulating       Auto-initialization
OrderedDict   Order-sensitive equality, LRU caches   move_to_end, ordered equality
ChainMap      Layered lookups, scoped configs        No-copy dict overlay
Counter       Frequency counting, multisets          Arithmetic on counts

💻 Code Example

# ============================================================
# namedtuple: lightweight immutable records
# ============================================================
from collections import namedtuple, deque, defaultdict, OrderedDict, ChainMap, Counter
from typing import NamedTuple, Optional
import time
import json


# Classic namedtuple
HttpResponse = namedtuple("HttpResponse", ["status", "headers", "body"])

resp = HttpResponse(status=200, headers={"Content-Type": "application/json"}, body="{}")
print(resp.status)  # 200
print(resp[0])      # 200 (tuple indexing still works)
status, headers, body = resp  # Unpacking works

# Convert to dict for serialization
print(resp._asdict())  # {'status': 200, 'headers': {...}, 'body': '{}'}

# Create a modified copy (namedtuples are immutable)
error_resp = resp._replace(status=404, body='{"error": "not found"}')
print(error_resp.status)  # 404
print(resp.status)        # 200 (original unchanged)


# typing.NamedTuple — modern approach with type hints and defaults
class DatabaseConfig(NamedTuple):
    host: str
    port: int = 5432
    database: str = "myapp"
    pool_min: int = 2
    pool_max: int = 10

    @property
    def connection_string(self) -> str:
        return f"postgresql://{self.host}:{self.port}/{self.database}"


config = DatabaseConfig(host="prod-db-1", pool_max=20)
print(config.connection_string)  # postgresql://prod-db-1:5432/myapp
print(config.pool_max)           # 20
44
# ============================================================
# deque: high-performance double-ended queue
# ============================================================

# Bounded deque as a sliding window / recent history
class RecentActivityTracker:
    """Track the N most recent user actions."""

    def __init__(self, max_size: int = 100):
        self._actions: deque = deque(maxlen=max_size)

    def record(self, action: str) -> None:
        self._actions.append({"action": action, "timestamp": time.time()})

    def recent(self, n: int = 10) -> list[dict]:
        """Get the n most recent actions (newest first)."""
        return list(reversed(self._actions))[:n]

    @property
    def count(self) -> int:
        return len(self._actions)


tracker = RecentActivityTracker(max_size=5)
for action in ["login", "view_page", "edit_profile", "upload", "logout", "login"]:
    tracker.record(action)
print(tracker.count)  # 5 (oldest "login" was evicted)


# Deque as an efficient FIFO queue
task_queue: deque = deque()
task_queue.append("task_1")  # Add to right (enqueue)
task_queue.append("task_2")
task_queue.append("task_3")
next_task = task_queue.popleft()  # Remove from left (dequeue) — O(1)!
print(next_task)  # "task_1"

# Rotate elements
d = deque([1, 2, 3, 4, 5])
d.rotate(2)     # Rotate right by 2
print(list(d))  # [4, 5, 1, 2, 3]
d.rotate(-2)    # Rotate left by 2
print(list(d))  # [1, 2, 3, 4, 5]
88
89
# ============================================================
# defaultdict: automatic initialization patterns
# ============================================================

# Group items by category
products = [
    ("electronics", "laptop"),
    ("clothing", "shirt"),
    ("electronics", "phone"),
    ("clothing", "jacket"),
    ("electronics", "tablet"),
    ("food", "apple"),
]

by_category = defaultdict(list)
for category, product in products:
    by_category[category].append(product)

print(dict(by_category))
# {'electronics': ['laptop', 'phone', 'tablet'],
#  'clothing': ['shirt', 'jacket'],
#  'food': ['apple']}


# Count occurrences
word_counts = defaultdict(int)
words = "the cat sat on the mat the cat".split()
for word in words:
    word_counts[word] += 1
print(dict(word_counts))  # {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}


# Nested defaultdict: auto-vivifying tree
def tree():
    """Create an auto-vivifying tree (infinite nesting)."""
    return defaultdict(tree)


config_tree = tree()
config_tree["database"]["primary"]["host"] = "db-1.prod"
config_tree["database"]["primary"]["port"] = 5432
config_tree["database"]["replica"]["host"] = "db-2.prod"
config_tree["cache"]["redis"]["host"] = "redis.prod"


# Convert to a regular dict for serialization
def tree_to_dict(t):
    if isinstance(t, defaultdict):
        return {k: tree_to_dict(v) for k, v in t.items()}
    return t


print(json.dumps(tree_to_dict(config_tree), indent=2))


# defaultdict with set for unique value tracking
user_logins = defaultdict(set)
login_events = [
    ("alice", "192.168.1.1"),
    ("bob", "10.0.0.1"),
    ("alice", "192.168.1.2"),
    ("alice", "192.168.1.1"),  # Duplicate IP for alice
    ("bob", "10.0.0.1"),       # Duplicate IP for bob
]

for user, ip in login_events:
    user_logins[user].add(ip)

print(dict(user_logins))
# {'alice': {'192.168.1.1', '192.168.1.2'}, 'bob': {'10.0.0.1'}}
158
159
# ============================================================
# Counter: frequency analysis and multiset operations
# ============================================================

# Word frequency analysis
text = "to be or not to be that is the question"
word_freq = Counter(text.split())
print(word_freq.most_common(3))  # [('to', 2), ('be', 2), ('or', 1)]

# Arithmetic on Counters (multiset operations)
inventory_a = Counter(apples=5, oranges=3, bananas=2)
inventory_b = Counter(apples=2, oranges=5, grapes=4)

combined = inventory_a + inventory_b
print(combined)  # Counter({'oranges': 8, 'apples': 7, 'grapes': 4, 'bananas': 2})

difference = inventory_a - inventory_b  # Only positive counts are kept
print(difference)  # Counter({'apples': 3, 'bananas': 2})

# Intersection (minimum) and union (maximum)
common = inventory_a & inventory_b
print(common)  # Counter({'oranges': 3, 'apples': 2})

total = inventory_a | inventory_b
print(total)  # Counter({'apples': 5, 'oranges': 5, 'grapes': 4, 'bananas': 2})
185
186
# ============================================================
# OrderedDict: order-aware dictionary operations
# ============================================================

class LRUCache:
    """Simple LRU cache using OrderedDict."""

    def __init__(self, capacity: int):
        self._cache: OrderedDict = OrderedDict()
        self._capacity = capacity

    def get(self, key: str) -> Optional[str]:
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)  # Mark as recently used
        return self._cache[key]

    def put(self, key: str, value: str) -> None:
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # Remove oldest (least recently used)

    def __repr__(self) -> str:
        items = ", ".join(f"{k}={v}" for k, v in self._cache.items())
        return f"LRUCache([{items}])"


cache = LRUCache(capacity=3)
cache.put("a", "1")
cache.put("b", "2")
cache.put("c", "3")
print(cache)  # LRUCache([a=1, b=2, c=3])

cache.get("a")       # Access "a" — moves it to the end
cache.put("d", "4")  # Evicts "b" (least recently used)
print(cache)  # LRUCache([c=3, a=1, d=4])


# OrderedDict equality considers order
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
print(od1 == od2)  # False (different order)

d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2)  # True (regular dicts ignore order in equality)
235
236
# ============================================================
# ChainMap: layered configuration
# ============================================================

# Layered configuration: CLI > env > config file > defaults
defaults = {"debug": False, "log_level": "INFO", "port": 8080, "host": "0.0.0.0"}
config_file = {"log_level": "WARNING", "port": 9090}
env_vars = {"debug": True}
cli_args = {"port": 3000}

# Highest priority first
settings = ChainMap(cli_args, env_vars, config_file, defaults)

print(settings["port"])       # 3000 (from cli_args)
print(settings["debug"])      # True (from env_vars)
print(settings["log_level"])  # WARNING (from config_file)
print(settings["host"])       # 0.0.0.0 (from defaults)

# Mutations only affect the first mapping
settings["new_key"] = "value"
print(cli_args)  # {'port': 3000, 'new_key': 'value'}

# Create a child scope (useful for template engines, variable scoping)
local_scope = settings.new_child({"port": 5000, "local_var": True})
print(local_scope["port"])       # 5000 (local override)
print(local_scope["debug"])      # True (inherited from parent chain)
print(local_scope["local_var"])  # True
264
265
# ============================================================
# Practical: combining collections for a metrics aggregator
# ============================================================

class MetricsAggregator:
    """Aggregate application metrics using collections."""

    def __init__(self, window_size: int = 1000):
        self._recent_latencies: deque = deque(maxlen=window_size)
        self._error_counts: Counter = Counter()
        self._endpoint_hits: defaultdict = defaultdict(int)
        self._status_by_endpoint: defaultdict = defaultdict(Counter)

    def record(self, endpoint: str, status: int, latency_ms: float) -> None:
        self._recent_latencies.append(latency_ms)
        self._endpoint_hits[endpoint] += 1
        self._status_by_endpoint[endpoint][status] += 1
        if status >= 400:
            self._error_counts[f"{endpoint}:{status}"] += 1

    def avg_latency(self) -> float:
        if not self._recent_latencies:
            return 0.0
        return sum(self._recent_latencies) / len(self._recent_latencies)

    def top_errors(self, n: int = 5) -> list:
        return self._error_counts.most_common(n)

    def summary(self) -> dict:
        return {
            "avg_latency_ms": round(self.avg_latency(), 2),
            "total_requests": sum(self._endpoint_hits.values()),
            "top_errors": self.top_errors(),
        }


metrics = MetricsAggregator(window_size=100)
metrics.record("/api/users", 200, 45.2)
metrics.record("/api/users", 200, 52.1)
metrics.record("/api/orders", 500, 1200.5)
metrics.record("/api/orders", 500, 980.3)
metrics.record("/api/auth", 401, 12.0)
print(metrics.summary())

🏋️ Practice Exercise

Exercises:

  1. Build a namedtuple-based LogEntry with fields timestamp, level, service, message. Write a parser that reads log lines into LogEntry instances, then use Counter to find the top 5 most common error messages and defaultdict to group entries by service.

  2. Implement a bounded task queue using deque(maxlen=N) that supports enqueue, dequeue, peek, and is_full. When the queue is full and a new item is enqueued, the oldest item should be silently dropped (not raise an error). Add a drain() method that yields all items.

  3. Create a ChainMap-based template variable resolver that supports nested scopes. Implement enter_scope(vars) (pushes a new scope), exit_scope() (pops the current scope), and resolve(name) (looks up a variable through all scopes). Test with three levels of nesting.

  4. Use defaultdict(lambda: defaultdict(list)) to build an inverted index: given a list of documents (strings), create a mapping from each word to the list of (doc_id, position) pairs where it appears. Then implement a search(word) function that returns matching documents with highlighted positions.

  5. Implement a complete LRU cache using OrderedDict that supports get, put, delete, resize (change capacity), and stats (hits, misses, evictions). Write unit tests that verify eviction order and hit/miss counts.

  6. Build a Counter-based shopping cart that supports adding items, removing items, merging two carts (+), finding common items between carts (&), and computing total price given a price lookup dict. Demonstrate all multiset operations.

⚠️ Common Mistakes

  • Using defaultdict when a regular dict with .setdefault() or .get(key, default) would suffice. defaultdict creates the default value on any missing-key subscript access (d[key]), including accidental reads, which can silently populate the dict with unintended keys. Use dict.get(key, default) when you only want to read without side effects.
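The read side effect is easy to reproduce. A minimal sketch (names here are illustrative only):

```python
from collections import defaultdict

counts = defaultdict(int)
counts["seen"] += 1

# A mere *read* of a missing key inserts it with the default value
if counts["typo_key"] > 0:  # This lookup creates 'typo_key' -> 0
    pass
print(dict(counts))  # {'seen': 1, 'typo_key': 0}

# Side-effect-free alternative: plain dict with .get()
plain = {"seen": 1}
print(plain.get("typo_key", 0))  # 0, and 'typo_key' is NOT inserted
print(plain)                     # {'seen': 1}
```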

  • Forgetting that deque.maxlen silently discards elements from the opposite end when the deque is full. If you append to the right on a full deque, the leftmost element is dropped without warning. This is by design for sliding windows but can cause data loss if you expect an error.
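The silent eviction works from both ends — appending on one side drops from the other. A quick demonstration:

```python
from collections import deque

window = deque([1, 2, 3], maxlen=3)
window.append(4)      # Full: leftmost element (1) is silently dropped
print(list(window))   # [2, 3, 4]
window.appendleft(0)  # Full: rightmost element (4) is silently dropped
print(list(window))   # [0, 2, 3]
```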

  • Assuming OrderedDict is needed in Python 3.7+. Regular dicts maintain insertion order since 3.7. Only use OrderedDict when you need order-sensitive equality (OrderedDict([('a',1),('b',2)]) != OrderedDict([('b',2),('a',1)])), move_to_end(), or popitem(last=False) for FIFO behavior.

  • Mutating a namedtuple field directly. Namedtuples are immutable — point.x = 10 raises AttributeError. Use point._replace(x=10) to create a modified copy. If you need mutability, use a dataclass instead.
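The AttributeError and the _replace workaround, side by side (Point is an illustrative example type):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

try:
    p.x = 10  # Direct assignment is forbidden on a namedtuple
except AttributeError as exc:
    print(f"Immutable: {exc}")

moved = p._replace(x=10)  # Build a modified copy instead
print(moved)  # Point(x=10, y=2)
print(p)      # Point(x=1, y=2) — original unchanged
```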

  • Not converting defaultdict back to a regular dict before serialization. json.dumps(defaultdict(...)) works, but the output type info is lost. More importantly, passing a defaultdict to code that doesn't expect auto-vivification can cause subtle bugs. Always dict(my_defaultdict) at boundaries.
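A sketch of the boundary conversion: serialize a plain-dict snapshot so downstream code gets normal KeyError semantics instead of auto-vivification.

```python
import json
from collections import defaultdict

groups = defaultdict(list)
for key, value in [("a", 1), ("a", 2), ("b", 3)]:
    groups[key].append(value)

# Convert at the boundary: callers receive a plain dict
snapshot = dict(groups)
print(json.dumps(snapshot))  # {"a": [1, 2], "b": [3]}

# The plain dict raises on missing keys instead of silently growing
try:
    snapshot["missing"]
except KeyError:
    print("plain dict raises KeyError as expected")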
