collections Module
📖 Concept
The collections module provides specialized container datatypes that extend Python's built-in dict, list, set, and tuple with additional functionality. These are not exotic data structures — they are production workhorses that solve common problems more efficiently and expressively than rolling your own.
namedtuple creates lightweight, immutable, tuple subclasses with named fields. They use the same memory as regular tuples (no per-instance __dict__), support indexing and unpacking, and are perfect for representing records, coordinates, database rows, or any fixed-structure data. For new code, typing.NamedTuple provides the same functionality with type annotations.
deque (double-ended queue) is a thread-safe, O(1) append/pop from both ends. Lists are O(n) for insert(0, x) and pop(0) because they shift all elements. Deque with maxlen automatically evicts the oldest element when full — perfect for sliding windows, bounded buffers, and recent-history tracking.
defaultdict eliminates the "check if key exists, initialize if not" pattern. You provide a factory function (int, list, set, or a custom callable) that creates default values for missing keys. Nested defaultdicts enable auto-vivifying tree structures.
OrderedDict was essential before Python 3.7 when dicts were unordered. Since 3.7, regular dicts maintain insertion order, but OrderedDict still offers unique features: move_to_end(key), equality that considers order, and popitem(last=True/False) for FIFO/LIFO behavior. It's the backbone of LRU cache implementations.
ChainMap groups multiple dicts into a single view. Lookups search each dict in order. It's ideal for layered configuration (defaults < config file < environment < CLI args), template variable scoping, and any scenario where you need to overlay multiple namespaces without copying.
| Container | When to Use | Key Advantage |
|---|---|---|
namedtuple |
Fixed-structure records | Memory-efficient, immutable |
deque |
FIFO queues, sliding windows | O(1) both-end operations |
defaultdict |
Grouping, counting, accumulating | Auto-initialization |
OrderedDict |
Order-sensitive equality, LRU caches | move_to_end, ordered equality |
ChainMap |
Layered lookups, scoped configs | No-copy dict overlay |
Counter |
Frequency counting, multisets | Arithmetic on counts |
💻 Code Example
1# ============================================================2# namedtuple: lightweight immutable records3# ============================================================4from collections import namedtuple, deque, defaultdict, OrderedDict, ChainMap, Counter5from typing import NamedTuple, Optional6import time7import json8910# Classic namedtuple11HttpResponse = namedtuple("HttpResponse", ["status", "headers", "body"])1213resp = HttpResponse(status=200, headers={"Content-Type": "application/json"}, body='{}')14print(resp.status) # 20015print(resp[0]) # 200 (tuple indexing still works)16status, headers, body = resp # Unpacking works1718# Convert to dict for serialization19print(resp._asdict()) # {'status': 200, 'headers': {...}, 'body': '{}'}2021# Create modified copy (namedtuples are immutable)22error_resp = resp._replace(status=404, body='{"error": "not found"}')23print(error_resp.status) # 40424print(resp.status) # 200 (original unchanged)252627# typing.NamedTuple — modern approach with type hints and defaults28class DatabaseConfig(NamedTuple):29 host: str30 port: int = 543231 database: str = "myapp"32 pool_min: int = 233 pool_max: int = 103435 @property36 def connection_string(self) -> str:37 return f"postgresql://{self.host}:{self.port}/{self.database}"383940config = DatabaseConfig(host="prod-db-1", pool_max=20)41print(config.connection_string) # postgresql://prod-db-1:5432/myapp42print(config.pool_max) # 20434445# ============================================================46# deque: high-performance double-ended queue47# ============================================================4849# Bounded deque as a sliding window / recent history50class RecentActivityTracker:51 """Track the N most recent user actions."""5253 def __init__(self, max_size: int = 100):54 self._actions: deque = deque(maxlen=max_size)5556 def record(self, action: str) -> None:57 self._actions.append({"action": action, "timestamp": time.time()})5859 def recent(self, n: int = 10) -> list[dict]:60 """Get n most recent actions (newest first)."""61 return list(reversed(self._actions))[:n]6263 @property64 def count(self) -> int:65 return len(self._actions)666768tracker = RecentActivityTracker(max_size=5)69for action in ["login", "view_page", "edit_profile", "upload", "logout", "login"]:70 tracker.record(action)71print(tracker.count) # 5 (oldest "login" was evicted)727374# Deque as efficient FIFO queue75task_queue: deque = deque()76task_queue.append("task_1") # Add to right (enqueue)77task_queue.append("task_2")78task_queue.append("task_3")79next_task = task_queue.popleft() # Remove from left (dequeue) — O(1)!80print(next_task) # "task_1"8182# Rotate elements83d = deque([1, 2, 3, 4, 5])84d.rotate(2) # Rotate right by 285print(list(d)) # [4, 5, 1, 2, 3]86d.rotate(-2) # Rotate left by 287print(list(d)) # [1, 2, 3, 4, 5]888990# ============================================================91# defaultdict: automatic initialization patterns92# ============================================================9394# Group items by category95products = [96 ("electronics", "laptop"),97 ("clothing", "shirt"),98 ("electronics", "phone"),99 ("clothing", "jacket"),100 ("electronics", "tablet"),101 ("food", "apple"),102]103104by_category = defaultdict(list)105for category, product in products:106 by_category[category].append(product)107108print(dict(by_category))109# {'electronics': ['laptop', 'phone', 'tablet'],110# 'clothing': ['shirt', 'jacket'],111# 'food': ['apple']}112113114# Count occurrences115word_counts = defaultdict(int)116words = "the cat sat on the mat the cat".split()117for word in words:118 word_counts[word] += 1119print(dict(word_counts)) # {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}120121122# Nested defaultdict: auto-vivifying tree123def tree():124 """Create an auto-vivifying tree (infinite nesting)."""125 return defaultdict(tree)126127128config_tree = tree()129config_tree["database"]["primary"]["host"] = "db-1.prod"130config_tree["database"]["primary"]["port"] = 5432131config_tree["database"]["replica"]["host"] = "db-2.prod"132config_tree["cache"]["redis"]["host"] = "redis.prod"133134# Convert to regular dict for serialization135def tree_to_dict(t):136 if isinstance(t, defaultdict):137 return {k: tree_to_dict(v) for k, v in t.items()}138 return t139140print(json.dumps(tree_to_dict(config_tree), indent=2))141142143# defaultdict with set for unique value tracking144user_logins = defaultdict(set)145login_events = [146 ("alice", "192.168.1.1"),147 ("bob", "10.0.0.1"),148 ("alice", "192.168.1.2"),149 ("alice", "192.168.1.1"), # Duplicate IP for alice150 ("bob", "10.0.0.1"), # Duplicate IP for bob151]152153for user, ip in login_events:154 user_logins[user].add(ip)155156print(dict(user_logins))157# {'alice': {'192.168.1.1', '192.168.1.2'}, 'bob': {'10.0.0.1'}}158159160# ============================================================161# Counter: frequency analysis and multiset operations162# ============================================================163164# Word frequency analysis165text = "to be or not to be that is the question"166word_freq = Counter(text.split())167print(word_freq.most_common(3)) # [('to', 2), ('be', 2), ('or', 1)]168169# Arithmetic on Counters (multiset operations)170inventory_a = Counter(apples=5, oranges=3, bananas=2)171inventory_b = Counter(apples=2, oranges=5, grapes=4)172173combined = inventory_a + inventory_b174print(combined) # Counter({'oranges': 8, 'apples': 7, 'grapes': 4, 'bananas': 2})175176difference = inventory_a - inventory_b # Only positive counts kept177print(difference) # Counter({'bananas': 2, 'apples': 3})178179# Intersection (minimum) and union (maximum)180common = inventory_a & inventory_b181print(common) # Counter({'oranges': 3, 'apples': 2})182183total = inventory_a | inventory_b184print(total) # Counter({'oranges': 5, 'apples': 5, 'grapes': 4, 'bananas': 2})185186187# ============================================================188# OrderedDict: order-aware dictionary operations189# ============================================================190191class LRUCache:192 """Simple LRU cache using OrderedDict."""193194 def __init__(self, capacity: int):195 self._cache: OrderedDict = OrderedDict()196 self._capacity = capacity197198 def get(self, key: str) -> Optional[str]:199 if key not in self._cache:200 return None201 self._cache.move_to_end(key) # Mark as recently used202 return self._cache[key]203204 def put(self, key: str, value: str) -> None:205 if key in self._cache:206 self._cache.move_to_end(key)207 self._cache[key] = value208 if len(self._cache) > self._capacity:209 self._cache.popitem(last=False) # Remove oldest (FIFO end)210211 def __repr__(self) -> str:212 items = ", ".join(f"{k}={v}" for k, v in self._cache.items())213 return f"LRUCache([{items}])"214215216cache = LRUCache(capacity=3)217cache.put("a", "1")218cache.put("b", "2")219cache.put("c", "3")220print(cache) # LRUCache([a=1, b=2, c=3])221222cache.get("a") # Access "a" — moves to end223cache.put("d", "4") # Evicts "b" (least recently used)224print(cache) # LRUCache([c=3, a=1, d=4])225226227# OrderedDict equality considers order228od1 = OrderedDict([("a", 1), ("b", 2)])229od2 = OrderedDict([("b", 2), ("a", 1)])230print(od1 == od2) # False (different order)231232d1 = {"a": 1, "b": 2}233d2 = {"b": 2, "a": 1}234print(d1 == d2) # True (regular dicts ignore order in equality)235236237# ============================================================238# ChainMap: layered configuration239# ============================================================240241# Layered configuration: CLI > env > config file > defaults242defaults = {"debug": False, "log_level": "INFO", "port": 8080, "host": "0.0.0.0"}243config_file = {"log_level": "WARNING", "port": 9090}244env_vars = {"debug": True}245cli_args = {"port": 3000}246247# Highest priority first248settings = ChainMap(cli_args, env_vars, config_file, defaults)249250print(settings["port"]) # 3000 (from cli_args)251print(settings["debug"]) # True (from env_vars)252print(settings["log_level"]) # WARNING (from config_file)253print(settings["host"]) # 0.0.0.0 (from defaults)254255# Mutations only affect the first mapping256settings["new_key"] = "value"257print(cli_args) # {'port': 3000, 'new_key': 'value'}258259# Create a child scope (useful for template engines, variable scoping)260local_scope = settings.new_child({"port": 5000, "local_var": True})261print(local_scope["port"]) # 5000 (local override)262print(local_scope["debug"]) # True (inherited from parent chain)263print(local_scope["local_var"]) # True264265266# ============================================================267# Practical: combining collections for a metrics aggregator268# ============================================================269270class MetricsAggregator:271 """Aggregate application metrics using collections."""272273 def __init__(self, window_size: int = 1000):274 self._recent_latencies: deque = deque(maxlen=window_size)275 self._error_counts: Counter = Counter()276 self._endpoint_hits: defaultdict = defaultdict(int)277 self._status_by_endpoint: defaultdict = defaultdict(Counter)278279 def record(self, endpoint: str, status: int, latency_ms: float) -> None:280 self._recent_latencies.append(latency_ms)281 self._endpoint_hits[endpoint] += 1282 self._status_by_endpoint[endpoint][status] += 1283 if status >= 400:284 self._error_counts[f"{endpoint}:{status}"] += 1285286 def avg_latency(self) -> float:287 if not self._recent_latencies:288 return 0.0289 return sum(self._recent_latencies) / len(self._recent_latencies)290291 def top_errors(self, n: int = 5) -> list:292 return self._error_counts.most_common(n)293294 def summary(self) -> dict:295 return {296 "avg_latency_ms": round(self.avg_latency(), 2),297 "total_requests": sum(self._endpoint_hits.values()),298 "top_errors": self.top_errors(),299 }300301302metrics = MetricsAggregator(window_size=100)303metrics.record("/api/users", 200, 45.2)304metrics.record("/api/users", 200, 52.1)305metrics.record("/api/orders", 500, 1200.5)306metrics.record("/api/orders", 500, 980.3)307metrics.record("/api/auth", 401, 12.0)308print(metrics.summary())
🏋️ Practice Exercise
Exercises:
Build a
namedtuple-basedLogEntrywith fieldstimestamp,level,service,message. Write a parser that reads log lines intoLogEntryinstances, then useCounterto find the top 5 most common error messages anddefaultdictto group entries by service.Implement a bounded task queue using
deque(maxlen=N)that supportsenqueue,dequeue,peek, andis_full. When the queue is full and a new item is enqueued, the oldest item should be silently dropped (not raise an error). Add adrain()method that yields all items.Create a
ChainMap-based template variable resolver that supports nested scopes. Implemententer_scope(vars)(pushes a new scope),exit_scope()(pops the current scope), andresolve(name)(looks up a variable through all scopes). Test with three levels of nesting.Use
defaultdict(lambda: defaultdict(list))to build an inverted index: given a list of documents (strings), create a mapping from each word to the list of (doc_id, position) pairs where it appears. Then implement asearch(word)function that returns matching documents with highlighted positions.Implement a complete LRU cache using
OrderedDictthat supportsget,put,delete,resize(change capacity), andstats(hits, misses, evictions). Write unit tests that verify eviction order and hit/miss counts.Build a
Counter-based shopping cart that supports adding items, removing items, merging two carts (+), finding common items between carts (&), and computing total price given a price lookup dict. Demonstrate all multiset operations.
⚠️ Common Mistakes
Using
defaultdictwhen a regular dict with.setdefault()or.get(default)would suffice.defaultdictcreates the default value on any missing key access, including accidental reads, which can silently populate the dict with unintended keys. Usedict.get(key, default)when you only want to read without side effects.Forgetting that
deque.maxlensilently discards elements from the opposite end when the deque is full. If you append to the right on a full deque, the leftmost element is dropped without warning. This is by design for sliding windows but can cause data loss if you expect an error.Assuming
OrderedDictis needed in Python 3.7+. Regular dicts maintain insertion order since 3.7. Only useOrderedDictwhen you need order-sensitive equality (OrderedDict([('a',1),('b',2)]) != OrderedDict([('b',2),('a',1)])),move_to_end(), orpopitem(last=False)for FIFO behavior.Mutating a
namedtuplefield directly. Namedtuples are immutable —point.x = 10raisesAttributeError. Usepoint._replace(x=10)to create a modified copy. If you need mutability, use adataclassinstead.Not converting
defaultdictback to a regulardictbefore serialization.json.dumps(defaultdict(...))works, but the output type info is lost. More importantly, passing adefaultdictto code that doesn't expect auto-vivification can cause subtle bugs. Alwaysdict(my_defaultdict)at boundaries.
💼 Interview Questions
🎤 Mock Interview
Practice a live interview for collections Module