Generators & Iterators
📖 Concept
Generators and iterators are at the heart of Python's approach to working with sequences of data. They enable lazy evaluation — producing values one at a time, on demand, rather than computing and storing an entire sequence in memory.
The Iterator Protocol:
Any object that implements __iter__() (which, for an iterator, returns the object itself) and __next__() (which returns the next value, or raises StopIteration when exhausted) is an iterator. Every for loop in Python uses this protocol internally.
```python
# What a for loop actually does:
iterator = iter(collection)        # calls collection.__iter__()
while True:
    try:
        item = next(iterator)      # calls iterator.__next__()
    except StopIteration:
        break
    # ...the loop body runs here with item...
```
Generators are the easy way to create iterators. A function with yield becomes a generator function — calling it returns a generator object that implements the iterator protocol automatically.
| Feature | List | Generator |
|---|---|---|
| Memory | Stores all items | Stores one item at a time |
| Access | Random access (`lst[i]`) | Sequential only |
| Reusability | Iterate multiple times | Single-pass (exhausted after one iteration) |
| Creation | `[x for x in range(n)]` | `(x for x in range(n))` |
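The memory and reusability rows in the table can be checked directly with `sys.getsizeof` (a minimal sketch; the exact byte counts vary by Python version, so the comments give only rough magnitudes):

```python
import sys

# A list comprehension materializes all one million results up front
squares_list = [x * x for x in range(1_000_000)]

# A generator expression stores only its suspended execution state
squares_gen = (x * x for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # megabytes
print(sys.getsizeof(squares_gen))   # a couple hundred bytes

# Single-pass behavior: a generator is empty after one full iteration
gen = (x for x in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] (already exhausted)
```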
yield vs return:
- `return` terminates the function and sends a value back.
- `yield` suspends the function, saves its state, and produces a value. The function resumes from where it left off on the next `next()` call.
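The difference is easiest to see in a generator that uses both keywords. In this sketch (hypothetical `first_two` helper), the `return` value does not get yielded; per the generator protocol it rides on the `StopIteration` exception instead:

```python
def first_two(items):
    """Yield the first two items, then return a summary string."""
    it = iter(items)
    yield next(it)
    yield next(it)
    return "done"  # becomes StopIteration("done"), NOT a third yielded value

gen = first_two([10, 20, 30])
print(next(gen))  # 10
print(next(gen))  # 20
try:
    next(gen)
except StopIteration as exc:
    print(exc.value)  # 'done' -- the return value lives on the exception
```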
Generator expressions are the generator equivalent of list comprehensions — use parentheses instead of brackets: (x*x for x in range(10)). They're ideal for feeding into functions that consume iterables: sum(x*x for x in range(10)).
itertools is Python's standard library module for composing efficient iterators. Key functions: chain, islice, groupby, product, combinations, count, cycle, repeat, and tee.
💻 Code Example
```python
# ============================================================
# Basic generator function
# ============================================================
def countdown(n: int):
    """Yield numbers from n down to 1."""
    print(f"Starting countdown from {n}")
    while n > 0:
        yield n  # Suspend here, resume on next()
        n -= 1
    print("Countdown complete!")  # Runs after the final yield

# Calling a generator function returns a generator object (NOT a value)
gen = countdown(5)
print(type(gen))  # <class 'generator'>

print(next(gen))  # "Starting countdown from 5" then 5
print(next(gen))  # 4 (resumes after the yield)
print(next(gen))  # 3

# Exhaust the rest with a for loop
for val in gen:
    print(val)  # 2, 1, then "Countdown complete!"


# ============================================================
# Generator for memory-efficient data processing
# ============================================================
def read_large_file(file_path: str, chunk_size: int = 8192):
    """Read a large file in chunks without loading it all into memory."""
    with open(file_path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def grep_lines(lines, pattern: str):
    """Filter lines matching a pattern (generator pipeline)."""
    for line in lines:
        if pattern in line:
            yield line


def line_reader(file_path: str):
    """Yield individual lines from a file."""
    with open(file_path, "r") as f:
        for line in f:
            yield line.rstrip("\n")


# Pipeline: read -> filter -> process (all lazy, constant memory)
# matching = grep_lines(line_reader("server.log"), "ERROR")
# for line in matching:
#     process(line)


# ============================================================
# Generator expressions vs list comprehensions
# ============================================================
# List comprehension: builds the ENTIRE list in memory
squares_list = [x * x for x in range(1_000_000)]  # ~8MB in memory

# Generator expression: produces values one at a time
squares_gen = (x * x for x in range(1_000_000))  # ~120 bytes!

# Use generators when you only need to iterate once
total = sum(x * x for x in range(1_000_000))  # No extra memory


# ============================================================
# Custom iterator class
# ============================================================
class FibonacciIterator:
    """Infinite Fibonacci sequence iterator."""

    def __init__(self):
        self._a = 0
        self._b = 1

    def __iter__(self):
        return self  # An iterator returns itself

    def __next__(self):
        value = self._a
        self._a, self._b = self._b, self._a + self._b
        return value


# Take the first 10 Fibonacci numbers
from itertools import islice

fib = FibonacciIterator()
first_10 = list(islice(fib, 10))
print(first_10)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


# ============================================================
# yield from (delegation to sub-generators)
# ============================================================
def flatten(nested):
    """Recursively flatten nested iterables."""
    for item in nested:
        if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
            yield from flatten(item)  # Delegate to sub-generator
        else:
            yield item

data = [1, [2, 3], [4, [5, 6]], [[7, 8], 9]]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]


# ============================================================
# Two-way communication with send()
# ============================================================
def accumulator():
    """Generator that accumulates sent values."""
    total = 0
    while True:
        value = yield total  # Receive value via send(), yield current total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)            # Prime the generator (advance to first yield)
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(5))   # 35


# ============================================================
# Practical itertools usage
# ============================================================
import itertools

# chain: concatenate iterables
combined = list(itertools.chain([1, 2], [3, 4], [5]))
print(combined)  # [1, 2, 3, 4, 5]

# groupby: group consecutive elements by key
data = [
    {"dept": "eng", "name": "Alice"},
    {"dept": "eng", "name": "Bob"},
    {"dept": "sales", "name": "Charlie"},
    {"dept": "sales", "name": "Diana"},
]
# Data MUST be sorted by the key first!
for dept, members in itertools.groupby(data, key=lambda x: x["dept"]):
    print(f"{dept}: {[m['name'] for m in members]}")
# eng: ['Alice', 'Bob']
# sales: ['Charlie', 'Diana']

# product: cartesian product
sizes = ["S", "M", "L"]
colors = ["red", "blue"]
variants = list(itertools.product(sizes, colors))
# [('S','red'), ('S','blue'), ('M','red'), ('M','blue'), ('L','red'), ('L','blue')]

# islice: slice an infinite iterator
evens = itertools.count(0, 2)  # 0, 2, 4, 6, ...
first_five_evens = list(itertools.islice(evens, 5))
print(first_five_evens)  # [0, 2, 4, 6, 8]

# tee: duplicate an iterator
original = iter(range(5))
copy1, copy2 = itertools.tee(original, 2)
print(list(copy1))  # [0, 1, 2, 3, 4]
print(list(copy2))  # [0, 1, 2, 3, 4]
```
🏋️ Practice Exercise
Exercises:
1. Write a generator `chunked(iterable, size)` that yields successive chunks (as lists) of `size` elements from any iterable. The last chunk may be shorter. Test it with both lists and other generators.
2. Implement a custom `Range` class (not using built-in `range`) that supports `__iter__`, `__next__`, `__len__`, `__contains__`, and `__reversed__`. It should handle start, stop, and step (including negative step).
3. Build a generator pipeline for log analysis: `read_logs(path)` -> `parse_entries(lines)` (yield dicts) -> `filter_errors(entries)` -> `aggregate_by_hour(errors)`. Each stage should be a separate generator that feeds into the next.
4. Create a generator `interleave(*iterables)` that yields one element from each iterable in round-robin fashion, stopping when all are exhausted. Handle iterables of different lengths gracefully.
5. Implement a `@coroutine` decorator that automatically primes a generator (calls `next()` on it). Then write a generator-based coroutine that receives strings via `send()`, accumulates them, and yields the running concatenation.
6. Use `itertools` to solve: given a list of numbers, find all unique pairs that sum to a target value. Compare the generator approach vs. a set-based approach in terms of memory and time complexity.
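As one possible starting point for the first exercise (a sketch, not the only solution), `itertools.islice` handles lists and generators uniformly:

```python
import itertools

def chunked(iterable, size):
    """Yield successive lists of up to `size` items from any iterable."""
    it = iter(iterable)  # works for lists, generators, files, ...
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:  # islice produced nothing: the source is exhausted
            return
        yield chunk

print(list(chunked(range(7), 3)))              # [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunked((c for c in "abcde"), 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```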
⚠️ Common Mistakes
- Trying to iterate over a generator twice. Generators are single-pass — once exhausted, calling `next()` raises `StopIteration` forever. To iterate multiple times, either recreate the generator or use `itertools.tee()` (but be aware `tee` stores elements in memory).
- Using `return value` in a generator function and expecting it as output. In generators, `return value` raises `StopIteration(value)` — the value is stored in the exception's `.value` attribute, not yielded. Use `yield value` to produce output.
- Forgetting to prime a generator-based coroutine before calling `send()`. The first call must be `next(gen)` or `gen.send(None)` to advance to the first `yield`. Sending a non-None value to a just-started generator raises `TypeError`.
- Using `itertools.groupby()` on unsorted data and expecting it to group all matching elements. `groupby` only groups consecutive elements with the same key. Sort the data by the key first, or use `collections.defaultdict` for non-consecutive grouping.
- Consuming an iterator inside a function that's supposed to pass it along. Operations like `list()`, `len()` (if supported), or an emptiness check via `next()` consume the iterator. Use `itertools.tee()` if you need to inspect an iterator and still pass it along.
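The `groupby` pitfall is easy to demonstrate with a small contrast between unsorted and sorted input (illustrative data only):

```python
import itertools

data = ["eng", "sales", "eng", "sales"]  # matching keys are NOT consecutive

# Unsorted: groupby starts a new group every time the key changes
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(data)]
print(unsorted_groups)  # [('eng', 1), ('sales', 1), ('eng', 1), ('sales', 1)]

# Sorted first: all matching elements land in a single group
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(data))]
print(sorted_groups)  # [('eng', 2), ('sales', 2)]
```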