Generators & Iterators


📖 Concept

Generators and iterators are at the heart of Python's approach to working with sequences of data. They enable lazy evaluation — producing values one at a time, on demand, rather than computing and storing an entire sequence in memory.

The Iterator Protocol: Any object that implements __iter__() (returns the iterator) and __next__() (returns the next value or raises StopIteration) is an iterator. Every for loop in Python uses this protocol internally.

# What a for loop actually does:
iterator = iter(collection)    # calls collection.__iter__()
while True:
    try:
        item = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break
    # ... the loop body runs here with item ...

Generators are the easy way to create iterators. A function with yield becomes a generator function — calling it returns a generator object that implements the iterator protocol automatically.

Feature       List                        Generator
Memory        Stores all items            Stores one item at a time
Access        Random access (lst[i])      Sequential only
Reusability   Iterate multiple times      Single-pass (exhausted after one iteration)
Creation      [x for x in range(n)]       (x for x in range(n))

yield vs return:

  • return terminates the function and sends a value back
  • yield suspends the function, saves its state, and produces a value. The function resumes from where it left off on the next next() call

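A minimal sketch of the difference: a generator's return value is never yielded; it travels inside the StopIteration exception that signals exhaustion (the function name here is made up for illustration).

```python
def count_to_two():
    yield 1
    yield 2
    return "done"  # not yielded -- becomes StopIteration.value

gen = count_to_two()
print(next(gen))   # 1
print(next(gen))   # 2
try:
    next(gen)
except StopIteration as exc:
    print(exc.value)  # 'done'
```

A for loop swallows the StopIteration, so the return value is invisible to ordinary iteration; it mainly matters for `yield from`, which evaluates to that value.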
Generator expressions are the generator equivalent of list comprehensions — use parentheses instead of brackets: (x*x for x in range(10)). They're ideal for feeding into functions that consume iterables: sum(x*x for x in range(10)).

itertools is Python's standard library module for composing efficient iterators. Key functions: chain, islice, groupby, product, combinations, count, cycle, repeat, and tee.

💻 Code Example

# ============================================================
# Basic generator function
# ============================================================
def countdown(n: int):
    """Yield numbers from n down to 1."""
    print(f"Starting countdown from {n}")
    while n > 0:
        yield n                   # Suspend here, resume on next()
        n -= 1
    print("Countdown complete!")  # Runs after final next()

# Calling a generator function returns a generator object (NOT a value)
gen = countdown(5)
print(type(gen))   # <class 'generator'>

print(next(gen))   # "Starting countdown from 5" then 5
print(next(gen))   # 4 (resumes after the yield)
print(next(gen))   # 3

# Exhaust the rest with a for loop
for val in gen:
    print(val)     # 2, 1, then "Countdown complete!"


# ============================================================
# Generator for memory-efficient data processing
# ============================================================
def read_large_file(file_path: str, chunk_size: int = 8192):
    """Read a large file in chunks without loading it all into memory."""
    with open(file_path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def grep_lines(lines, pattern: str):
    """Filter lines matching a pattern (generator pipeline)."""
    for line in lines:
        if pattern in line:
            yield line


def line_reader(file_path: str):
    """Yield individual lines from a file."""
    with open(file_path, "r") as f:
        for line in f:
            yield line.rstrip("\n")


# Pipeline: read -> filter -> process (all lazy, constant memory)
# matching = grep_lines(line_reader("server.log"), "ERROR")
# for line in matching:
#     process(line)


# ============================================================
# Generator expressions vs list comprehensions
# ============================================================
# List comprehension: builds the ENTIRE list in memory
squares_list = [x * x for x in range(1_000_000)]  # ~8MB in memory

# Generator expression: produces values one at a time
squares_gen = (x * x for x in range(1_000_000))   # ~120 bytes!

# Use generators when you only need to iterate once
total = sum(x * x for x in range(1_000_000))      # No extra memory


# ============================================================
# Custom iterator class
# ============================================================
class FibonacciIterator:
    """Infinite Fibonacci sequence iterator."""

    def __init__(self):
        self._a = 0
        self._b = 1

    def __iter__(self):
        return self  # Iterator returns itself

    def __next__(self):
        value = self._a
        self._a, self._b = self._b, self._a + self._b
        return value


# Take first 10 Fibonacci numbers
from itertools import islice

fib = FibonacciIterator()
first_10 = list(islice(fib, 10))
print(first_10)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


# ============================================================
# yield from (delegation to sub-generators)
# ============================================================
def flatten(nested):
    """Recursively flatten nested iterables."""
    for item in nested:
        if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
            yield from flatten(item)  # Delegate to sub-generator
        else:
            yield item

data = [1, [2, 3], [4, [5, 6]], [[7, 8], 9]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]


# ============================================================
# Two-way communication with send()
# ============================================================
def accumulator():
    """Generator that accumulates sent values."""
    total = 0
    while True:
        value = yield total  # Receive value via send(), yield current total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)            # Prime the generator (advance to first yield)
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(5))   # 35


# ============================================================
# Practical itertools usage
# ============================================================
import itertools

# chain: concatenate iterables
combined = list(itertools.chain([1, 2], [3, 4], [5]))
print(combined)  # [1, 2, 3, 4, 5]

# groupby: group consecutive elements by key
data = [
    {"dept": "eng", "name": "Alice"},
    {"dept": "eng", "name": "Bob"},
    {"dept": "sales", "name": "Charlie"},
    {"dept": "sales", "name": "Diana"},
]
# Data MUST be sorted by the key first!
for dept, members in itertools.groupby(data, key=lambda x: x["dept"]):
    print(f"{dept}: {[m['name'] for m in members]}")
# eng: ['Alice', 'Bob']
# sales: ['Charlie', 'Diana']

# product: cartesian product
sizes = ["S", "M", "L"]
colors = ["red", "blue"]
variants = list(itertools.product(sizes, colors))
# [('S','red'), ('S','blue'), ('M','red'), ('M','blue'), ('L','red'), ('L','blue')]

# islice: slice an infinite iterator
evens = (x for x in itertools.count(0, 2))  # 0, 2, 4, 6, ...
first_five_evens = list(itertools.islice(evens, 5))
print(first_five_evens)  # [0, 2, 4, 6, 8]

# tee: duplicate an iterator
original = iter(range(5))
copy1, copy2 = itertools.tee(original, 2)
print(list(copy1))  # [0, 1, 2, 3, 4]
print(list(copy2))  # [0, 1, 2, 3, 4]

🏋️ Practice Exercise

Exercises:

  1. Write a generator chunked(iterable, size) that yields successive chunks (as lists) of size elements from any iterable. The last chunk may be shorter. Test it with both lists and other generators.

  2. Implement a custom Range class (not using built-in range) that supports __iter__, __next__, __len__, __contains__, and __reversed__. It should handle start, stop, step (including negative step).

  3. Build a generator pipeline for log analysis: read_logs(path) -> parse_entries(lines) (yield dicts) -> filter_errors(entries) -> aggregate_by_hour(errors). Each stage should be a separate generator that feeds into the next.

  4. Create a generator interleave(*iterables) that yields one element from each iterable in round-robin fashion, stopping when all are exhausted. Handle iterables of different lengths gracefully.

  5. Implement a @coroutine decorator that automatically primes a generator (calls next() on it). Then write a generator-based coroutine that receives strings via send(), accumulates them, and yields the running concatenation.

  6. Use itertools to solve: given a list of numbers, find all unique pairs that sum to a target value. Compare the generator approach vs. a set-based approach in terms of memory and time complexity.

⚠️ Common Mistakes

  • Trying to iterate over a generator twice. Generators are single-pass — once exhausted, calling next() raises StopIteration forever. To iterate multiple times, either recreate the generator or use itertools.tee() (but be aware tee stores elements in memory).
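A two-line demonstration of the single-pass behavior:

```python
squares = (x * x for x in range(3))
print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] -- exhausted; a second pass yields nothing
```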

  • Using return value in a generator function and expecting it as output. In generators, return value raises StopIteration(value) — the value is stored in the exception's .value attribute, not yielded. Use yield value to produce output.

  • Forgetting to prime a generator-based coroutine before calling send(). The first call must be next(gen) or gen.send(None) to advance to the first yield. Sending a non-None value to a just-started generator raises TypeError.

  • Using itertools.groupby() on unsorted data and expecting it to group all matching elements. groupby only groups consecutive elements with the same key. Sort the data by the key first, or use collections.defaultdict for non-consecutive grouping.
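The pitfall is easy to see with a small unsorted input: groupby starts a new group every time the key changes, so equal keys that are not adjacent end up in separate groups.

```python
import itertools

data = ["a", "b", "a", "a", "b"]

# Unsorted: a new group begins at every key change
groups = [(k, list(g)) for k, g in itertools.groupby(data)]
print(groups)  # [('a', ['a']), ('b', ['b']), ('a', ['a', 'a']), ('b', ['b'])]

# Sorted first: one group per distinct key
groups = [(k, list(g)) for k, g in itertools.groupby(sorted(data))]
print(groups)  # [('a', ['a', 'a', 'a']), ('b', ['b', 'b'])]
```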

  • Consuming an iterator inside a function that's supposed to pass it along. Operations like list(), len() (if supported), or even if iterator exhaust the iterator. Use itertools.tee() if you need to inspect and pass along.
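One way to inspect without consuming, sketched with itertools.tee (the `peek` helper name is made up for illustration; after tee, only the returned copies should be used, not the original iterator):

```python
import itertools

def peek(iterator):
    """Return (first_item_or_None, an_unconsumed_equivalent_iterator)."""
    inspect_copy, passthrough = itertools.tee(iterator, 2)
    try:
        first = next(inspect_copy)  # consumes only our private copy
    except StopIteration:
        return None, passthrough
    return first, passthrough

first, rest = peek(iter([10, 20, 30]))
print(first)       # 10
print(list(rest))  # [10, 20, 30] -- nothing lost from the caller's view
```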
