Debugging & Profiling


📖 Concept

Effective debugging separates productive developers from those who spend hours guessing. Python provides excellent built-in debugging tools, and understanding when and how to use each tool is critical for diagnosing issues in production systems.

Python's debugging toolkit:

  • pdb — interactive debugger (stdlib): step through code, inspect state
  • breakpoint() — built-in function (3.7+): drop into the debugger anywhere
  • pdb++ / ipdb — enhanced debuggers: syntax highlighting, better UX
  • logging — structured log output: production debugging, audit trails
  • traceback — exception formatting: custom error reporting

pdb commands (the essential ones):

  • n (next) — execute current line, step over function calls
  • s (step) — step into function calls
  • c (continue) — run until next breakpoint
  • r (return) — run until current function returns
  • l (list) — show source code around current position
  • p expr — print expression value
  • pp expr — pretty-print expression
  • w (where) — show call stack
  • b lineno — set breakpoint at line number
  • cl (clear) — remove breakpoints

breakpoint() (Python 3.7+) is the modern way to invoke the debugger. It respects the PYTHONBREAKPOINT environment variable, allowing you to switch debuggers or disable breakpoints entirely without changing code:

  • PYTHONBREAKPOINT=0 — disable all breakpoints (production)
  • PYTHONBREAKPOINT=ipdb.set_trace — use ipdb instead of pdb
  • PYTHONBREAKPOINT=pudb.set_trace — use pudb (visual debugger)
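Because the default sys.breakpointhook() consults PYTHONBREAKPOINT each time breakpoint() is called (per PEP 553), the switch can even be flipped at runtime. A minimal sketch, using a hypothetical risky_computation function, showing that "0" turns breakpoint() into a no-op:

```python
import os


def risky_computation(x):
    # This breakpoint() becomes a no-op when PYTHONBREAKPOINT=0,
    # so the same code runs unmodified in production.
    breakpoint()
    return x * 2


# Disable all breakpoints for this process: the default
# sys.breakpointhook() re-reads PYTHONBREAKPOINT on every call.
os.environ["PYTHONBREAKPOINT"] = "0"
print(risky_computation(21))  # runs straight through, prints 42
```

In real deployments you would set the variable in the service environment rather than in code; setting it here just makes the behavior observable in a single script.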

Profiling identifies performance bottlenecks. Python offers multiple profilers:

  • cProfile — deterministic profiler (function-level), built-in, low overhead
  • profile — pure Python profiler (slower but extensible)
  • line_profiler — line-by-line execution time (third-party, essential for optimization)
  • memory_profiler — track memory allocation per line
  • py-spy — sampling profiler that attaches to running processes without code changes

Debugging strategy: Start with logging for context, use breakpoint() for interactive investigation, and profile only when you have confirmed a performance issue. Premature optimization guided by intuition rather than profiling data wastes time.
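Before reaching for a profiler, a quick wall-clock check with the stdlib timeit module is often enough to confirm a problem exists. A minimal sketch comparing two approaches to the same computation (the snippets and repeat count are arbitrary examples):

```python
import timeit

# Time two candidate implementations of the same task.
# timeit uses perf_counter internally and disables GC during runs.
list_time = timeit.timeit("[x ** 2 for x in range(1000)]", number=1000)
gen_time = timeit.timeit("sum(x ** 2 for x in range(1000))", number=1000)

print(f"list comprehension: {list_time:.4f}s for 1000 runs")
print(f"generator sum:      {gen_time:.4f}s for 1000 runs")
```

If numbers like these show no meaningful difference, there is nothing to optimize and no profiling session is warranted.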

💻 Code Example

# ============================================================
# 1. pdb / breakpoint() — Interactive Debugging
# ============================================================


def calculate_discount(items, membership_level):
    """
    Production function with a subtle bug to debug.
    Items: list of dicts with 'name', 'price', 'quantity'.
    """
    subtotal = sum(
        item["price"] * item["quantity"] for item in items
    )

    # Drop into debugger to inspect state:
    # breakpoint()  # Uncomment to debug interactively

    discount_rates = {
        "bronze": 0.05,
        "silver": 0.10,
        "gold": 0.15,
        "platinum": 0.20,
    }

    rate = discount_rates.get(membership_level, 0)
    discount = subtotal * rate
    total = subtotal - discount

    return {
        "subtotal": round(subtotal, 2),
        "discount": round(discount, 2),
        "total": round(total, 2),
        "rate": rate,
    }


# ============================================================
# 2. Conditional breakpoints and post-mortem debugging
# ============================================================
def find_anomalies(data):
    """Process data with conditional debugging."""
    results = []
    for i, value in enumerate(data):
        processed = value ** 0.5 if value >= 0 else None

        # Conditional breakpoint — only pause on suspicious values
        # if processed is not None and processed > 100:
        #     breakpoint()

        results.append({"index": i, "original": value, "processed": processed})
    return results


def debug_with_post_mortem():
    """
    Post-mortem debugging: inspect state AFTER a crash.
    Run with: python -m pdb script.py
    When it crashes, pdb drops you into the frame where the
    exception occurred.
    """
    data = [100, 200, -1, 400, 0]
    try:
        results = [1 / x for x in data]
    except ZeroDivisionError:
        import pdb
        # pdb.post_mortem()  # Uncomment to debug at crash site
        print("ZeroDivisionError caught — would enter post-mortem debugger")


# ============================================================
# 3. Structured Logging (production debugging)
# ============================================================
import logging

# Configure logging with structured format
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(name)s:%(funcName)s:%(lineno)d — %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def process_order(order_id, items):
    """Production code with proper logging levels."""
    logger.info("Processing order %s with %d items", order_id, len(items))

    for item in items:
        logger.debug(
            "Item: %s, price=%.2f, qty=%d",
            item["name"],
            item["price"],
            item["quantity"],
        )

    try:
        result = calculate_discount(items, "gold")
        logger.info(
            "Order %s total: $%.2f (discount: $%.2f)",
            order_id,
            result["total"],
            result["discount"],
        )
        return result
    except Exception:
        logger.exception("Failed to process order %s", order_id)
        raise


# ============================================================
# 4. Custom exception hooks and traceback formatting
# ============================================================
import traceback


def robust_processor(data_batch):
    """Collect errors without stopping the entire batch."""
    results = []
    errors = []

    for i, item in enumerate(data_batch):
        try:
            processed = 100 / item["value"]
            results.append({"index": i, "result": processed})
        except (ZeroDivisionError, KeyError, TypeError) as e:
            error_info = {
                "index": i,
                "item": item,
                "error": str(e),
                "traceback": traceback.format_exc(),
            }
            errors.append(error_info)
            logger.warning("Error at index %d: %s", i, e)

    if errors:
        logger.warning(
            "Batch completed with %d errors out of %d items",
            len(errors),
            len(data_batch),
        )

    return results, errors


# ============================================================
# 5. cProfile — Function-level profiling
# ============================================================
import cProfile
import io
import pstats


def fibonacci(n):
    """Deliberately unoptimized for profiling demonstration."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def profile_fibonacci():
    """Profile with cProfile and display sorted results."""
    profiler = cProfile.Profile()
    profiler.enable()

    result = fibonacci(30)

    profiler.disable()

    # Capture profile output
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative")
    stats.print_stats(10)  # top 10 functions
    print(stream.getvalue())
    print(f"Result: {result}")


# Alternative: profile from the command line
# python -m cProfile -s cumulative my_script.py
# python -m cProfile -o profile_output.prof my_script.py


# ============================================================
# 6. Timing utilities for targeted profiling
# ============================================================
import time
from contextlib import contextmanager
from functools import wraps


def timer(func):
    """Decorator to measure function execution time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info("%s executed in %.4f seconds", func.__name__, elapsed)
        return result
    return wrapper


@contextmanager
def timed_block(label="block"):
    """Context manager for timing arbitrary code blocks."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    logger.info("%s completed in %.4f seconds", label, elapsed)


@timer
def sort_large_list():
    """Example function to profile."""
    import random
    data = [random.randint(0, 1_000_000) for _ in range(500_000)]
    return sorted(data)


def demo_timed_block():
    """Demonstrate context manager timing."""
    with timed_block("list comprehension"):
        squares = [x ** 2 for x in range(1_000_000)]

    with timed_block("generator sum"):
        total = sum(x ** 2 for x in range(1_000_000))


# ============================================================
# 7. tracemalloc — Memory profiling (stdlib)
# ============================================================
import tracemalloc


def memory_profile_demo():
    """Track memory allocations to find leaks."""
    tracemalloc.start()

    # Allocate some memory (kept alive so it shows in the snapshot)
    data = [list(range(1000)) for _ in range(1000)]

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")

    print("\nTop 5 memory allocations:")
    for stat in top_stats[:5]:
        print(f"  {stat}")

    current, peak = tracemalloc.get_traced_memory()
    print(f"\nCurrent memory: {current / 1024:.1f} KB")
    print(f"Peak memory: {peak / 1024:.1f} KB")

    tracemalloc.stop()


# ============================================================
# Usage
# ============================================================
if __name__ == "__main__":
    # Debugging demo
    items = [
        {"name": "Widget", "price": 25.99, "quantity": 3},
        {"name": "Gadget", "price": 49.99, "quantity": 1},
        {"name": "Doohickey", "price": 12.50, "quantity": 5},
    ]
    process_order("ORD-001", items)

    # Profiling demo
    profile_fibonacci()

    # Timing demo
    sort_large_list()
    demo_timed_block()

    # Memory demo
    memory_profile_demo()

🏋️ Practice Exercise

Exercises:

  1. Write a function with a deliberate bug (e.g., off-by-one error in a loop). Use breakpoint() to step through execution with n, s, p, and l commands. Document each pdb command you used and what it revealed.

  2. Create a @timer decorator and a timed_block context manager. Apply them to three different algorithms for the same task (e.g., three sorting approaches) and compare their performance with formatted output.

  3. Use cProfile to profile a recursive Fibonacci function vs. a memoized version. Generate a sorted stats report and identify the hotspot. Then use functools.lru_cache and re-profile to show the improvement.

  4. Set up structured logging with different levels (DEBUG, INFO, WARNING, ERROR) in a multi-module application. Configure separate handlers: console for INFO+, file for DEBUG+. Demonstrate how to use logging for production debugging.

  5. Use tracemalloc to find a simulated memory leak: a function that appends to a module-level list on each call. Show the top memory allocations and explain how to identify and fix the leak.

  6. Configure PYTHONBREAKPOINT to use ipdb (install it first), then set it to 0 to disable all breakpoints. Explain how this mechanism lets you leave breakpoints in code without affecting production.

⚠️ Common Mistakes

  • Leaving breakpoint() or pdb.set_trace() calls in committed code. Use PYTHONBREAKPOINT=0 in production as a safety net, and add a pre-commit hook or linter rule to catch stray debugger statements.

  • Using print() statements instead of the logging module. Print statements are not configurable (no levels, no formatting, no routing), cannot be disabled in production, and clutter stdout. The logging module is designed for exactly this purpose.

  • Profiling before confirming there is actually a performance problem. Premature optimization wastes time. First measure with wall-clock timing, then profile only the slow paths. cProfile has overhead that can skew results for micro-benchmarks.

  • Ignoring the difference between time.time() and time.perf_counter() for benchmarking. perf_counter() uses the highest-resolution clock available and is not affected by system clock adjustments. Always use perf_counter() for measuring code execution time.
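The difference is inspectable at runtime: time.get_clock_info() reports each clock's monotonicity and resolution. A quick check:

```python
import time

# perf_counter is monotonic (never jumps backward when the system
# clock is adjusted) and uses the highest-resolution clock available.
perf = time.get_clock_info("perf_counter")
wall = time.get_clock_info("time")

print(f"perf_counter: monotonic={perf.monotonic}, resolution={perf.resolution}")
print(f"time:         monotonic={wall.monotonic}, resolution={wall.resolution}")
```

Exact resolutions vary by platform, but perf_counter is guaranteed monotonic everywhere, which is why it is the right choice for benchmarking.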

  • Not using post-mortem debugging (python -m pdb script.py or pdb.post_mortem()) for crashes. It drops you into the exact frame where the exception occurred, with all local variables intact — far more useful than reading a traceback.
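The crash-frame state that pdb.post_mortem() exposes lives on the traceback object itself. A minimal sketch (divide is a hypothetical helper) that walks to the crashing frame and reads its locals without entering the debugger:

```python
import sys


def divide(a, b):
    ratio = a / b  # local variables live in this frame
    return ratio


try:
    divide(10, 0)
except ZeroDivisionError:
    # The traceback keeps every frame alive; this is the same state
    # pdb.post_mortem() lets you explore interactively.
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:  # walk down to the crash frame
        tb = tb.tb_next
    crash_locals = tb.tb_frame.f_locals
    print(f"crashed in {tb.tb_frame.f_code.co_name} with locals {crash_locals}")
```

Note that `ratio` is absent from the locals: the assignment never completed, which is exactly the kind of detail post-mortem inspection reveals.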
