Debugging & Profiling


📖 Concept

Effective debugging separates productive developers from those who spend hours guessing. Python provides excellent built-in debugging tools, and understanding when and how to use each tool is critical for diagnosing issues in production systems.

Python's debugging toolkit:

  • pdb — interactive debugger (stdlib): step through code, inspect state
  • breakpoint() — built-in function (3.7+): drop into the debugger anywhere
  • pdb++ / ipdb — enhanced debuggers: syntax highlighting, better UX
  • logging — structured log output: production debugging, audit trails
  • traceback — exception formatting: custom error reporting

pdb commands (the essential ones):

  • n (next) — execute current line, step over function calls
  • s (step) — step into function calls
  • c (continue) — run until next breakpoint
  • r (return) — run until current function returns
  • l (list) — show source code around current position
  • p expr — print expression value
  • pp expr — pretty-print expression
  • w (where) — show call stack
  • b lineno — set breakpoint at line number
  • cl (clear) — remove breakpoints

breakpoint() (Python 3.7+) is the modern way to invoke the debugger. It respects the PYTHONBREAKPOINT environment variable, allowing you to switch debuggers or disable breakpoints entirely without changing code:

  • PYTHONBREAKPOINT=0 — disable all breakpoints (production)
  • PYTHONBREAKPOINT=ipdb.set_trace — use ipdb instead of pdb
  • PYTHONBREAKPOINT=pudb.set_trace — use pudb (visual debugger)
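Because the default sys.breakpointhook() consults PYTHONBREAKPOINT each time breakpoint() is called (per PEP 553), the switch can even be flipped at runtime. A minimal sketch, using a hypothetical risky_computation function, showing that "0" turns breakpoint() into a no-op:

```python
import os


def risky_computation(x):
    # This breakpoint() becomes a no-op when PYTHONBREAKPOINT=0,
    # so the same code runs unmodified in production.
    breakpoint()
    return x * 2


# Disable all breakpoints for this process: the default
# sys.breakpointhook() re-reads PYTHONBREAKPOINT on every call.
os.environ["PYTHONBREAKPOINT"] = "0"
print(risky_computation(21))  # runs straight through, prints 42
```

In real deployments you would set the variable in the service environment rather than in code; setting it here just makes the behavior observable in a single script.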

Profiling identifies performance bottlenecks. Python offers multiple profilers:

  • cProfile — deterministic profiler (function-level), built-in, low overhead
  • profile — pure Python profiler (slower but extensible)
  • line_profiler — line-by-line execution time (third-party, essential for optimization)
  • memory_profiler — track memory allocation per line
  • py-spy — sampling profiler that attaches to running processes without code changes

Debugging strategy: Start with logging for context, use breakpoint() for interactive investigation, and profile only when you have confirmed a performance issue. Premature optimization guided by intuition rather than profiling data wastes time.
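Before reaching for a profiler, a quick wall-clock check with the stdlib timeit module is often enough to confirm a problem exists. A minimal sketch comparing two approaches to the same computation (the snippets and repeat count are arbitrary examples):

```python
import timeit

# Time two candidate implementations of the same task.
# timeit uses perf_counter internally and disables GC during runs.
list_time = timeit.timeit("[x ** 2 for x in range(1000)]", number=1000)
gen_time = timeit.timeit("sum(x ** 2 for x in range(1000))", number=1000)

print(f"list comprehension: {list_time:.4f}s for 1000 runs")
print(f"generator sum:      {gen_time:.4f}s for 1000 runs")
```

If numbers like these show no meaningful difference, there is nothing to optimize and no profiling session is warranted.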

💻 Code Example

# ============================================================
# 1. pdb / breakpoint() — Interactive Debugging
# ============================================================


def calculate_discount(items, membership_level):
    """
    Production function with a subtle bug to debug.
    Items: list of dicts with 'name', 'price', 'quantity'.
    """
    subtotal = sum(
        item["price"] * item["quantity"] for item in items
    )

    # Drop into debugger to inspect state:
    # breakpoint()  # Uncomment to debug interactively

    discount_rates = {
        "bronze": 0.05,
        "silver": 0.10,
        "gold": 0.15,
        "platinum": 0.20,
    }

    rate = discount_rates.get(membership_level, 0)
    discount = subtotal * rate
    total = subtotal - discount

    return {
        "subtotal": round(subtotal, 2),
        "discount": round(discount, 2),
        "total": round(total, 2),
        "rate": rate,
    }


# ============================================================
# 2. Conditional breakpoints and post-mortem debugging
# ============================================================
def find_anomalies(data):
    """Process data with conditional debugging."""
    results = []
    for i, value in enumerate(data):
        processed = value ** 0.5 if value >= 0 else None

        # Conditional breakpoint — only pause on suspicious values
        # if processed is not None and processed > 100:
        #     breakpoint()

        results.append({"index": i, "original": value, "processed": processed})
    return results


def debug_with_post_mortem():
    """
    Post-mortem debugging: inspect state AFTER a crash.
    Run with: python -m pdb script.py
    When it crashes, pdb drops you into the frame where the
    exception occurred.
    """
    data = [100, 200, -1, 400, 0]
    try:
        results = [1 / x for x in data]
    except ZeroDivisionError:
        import pdb
        # pdb.post_mortem()  # Uncomment to debug at crash site
        print("ZeroDivisionError caught — would enter post-mortem debugger")


# ============================================================
# 3. Structured Logging (production debugging)
# ============================================================
import logging

# Configure logging with structured format
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(name)s:%(funcName)s:%(lineno)d — %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def process_order(order_id, items):
    """Production code with proper logging levels."""
    logger.info("Processing order %s with %d items", order_id, len(items))

    for item in items:
        logger.debug(
            "Item: %s, price=%.2f, qty=%d",
            item["name"],
            item["price"],
            item["quantity"],
        )

    try:
        result = calculate_discount(items, "gold")
        logger.info(
            "Order %s total: $%.2f (discount: $%.2f)",
            order_id,
            result["total"],
            result["discount"],
        )
        return result
    except Exception:
        logger.exception("Failed to process order %s", order_id)
        raise


# ============================================================
# 4. Custom exception hooks and traceback formatting
# ============================================================
import traceback


def robust_processor(data_batch):
    """Collect errors without stopping the entire batch."""
    results = []
    errors = []

    for i, item in enumerate(data_batch):
        try:
            processed = 100 / item["value"]
            results.append({"index": i, "result": processed})
        except (ZeroDivisionError, KeyError, TypeError) as e:
            error_info = {
                "index": i,
                "item": item,
                "error": str(e),
                "traceback": traceback.format_exc(),
            }
            errors.append(error_info)
            logger.warning("Error at index %d: %s", i, e)

    if errors:
        logger.warning(
            "Batch completed with %d errors out of %d items",
            len(errors),
            len(data_batch),
        )

    return results, errors


# ============================================================
# 5. cProfile — Function-level profiling
# ============================================================
import cProfile
import io
import pstats


def fibonacci(n):
    """Deliberately unoptimized for profiling demonstration."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def profile_fibonacci():
    """Profile with cProfile and display sorted results."""
    profiler = cProfile.Profile()
    profiler.enable()

    result = fibonacci(30)

    profiler.disable()

    # Capture profile output
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative")
    stats.print_stats(10)  # top 10 functions
    print(stream.getvalue())
    print(f"Result: {result}")


# Alternative: profile from the command line
# python -m cProfile -s cumulative my_script.py
# python -m cProfile -o profile_output.prof my_script.py


# ============================================================
# 6. Timing utilities for targeted profiling
# ============================================================
import time
from contextlib import contextmanager
from functools import wraps


def timer(func):
    """Decorator to measure function execution time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info("%s executed in %.4f seconds", func.__name__, elapsed)
        return result
    return wrapper


@contextmanager
def timed_block(label="block"):
    """Context manager for timing arbitrary code blocks."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    logger.info("%s completed in %.4f seconds", label, elapsed)


@timer
def sort_large_list():
    """Example function to profile."""
    import random
    data = [random.randint(0, 1_000_000) for _ in range(500_000)]
    return sorted(data)


def demo_timed_block():
    """Demonstrate context manager timing."""
    with timed_block("list comprehension"):
        squares = [x ** 2 for x in range(1_000_000)]

    with timed_block("generator sum"):
        total = sum(x ** 2 for x in range(1_000_000))


# ============================================================
# 7. tracemalloc — Memory profiling (stdlib)
# ============================================================
import tracemalloc


def memory_profile_demo():
    """Track memory allocations to find leaks."""
    tracemalloc.start()

    # Allocate some memory (kept alive so it shows in the snapshot)
    data = [list(range(1000)) for _ in range(1000)]

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")

    print("\nTop 5 memory allocations:")
    for stat in top_stats[:5]:
        print(f"  {stat}")

    current, peak = tracemalloc.get_traced_memory()
    print(f"\nCurrent memory: {current / 1024:.1f} KB")
    print(f"Peak memory: {peak / 1024:.1f} KB")

    tracemalloc.stop()


# ============================================================
# Usage
# ============================================================
if __name__ == "__main__":
    # Debugging demo
    items = [
        {"name": "Widget", "price": 25.99, "quantity": 3},
        {"name": "Gadget", "price": 49.99, "quantity": 1},
        {"name": "Doohickey", "price": 12.50, "quantity": 5},
    ]
    process_order("ORD-001", items)

    # Profiling demo
    profile_fibonacci()

    # Timing demo
    sort_large_list()
    demo_timed_block()

    # Memory demo
    memory_profile_demo()

🏋️ Practice Exercise

Exercises:

  1. Write a function with a deliberate bug (e.g., off-by-one error in a loop). Use breakpoint() to step through execution with n, s, p, and l commands. Document each pdb command you used and what it revealed.

  2. Create a @timer decorator and a timed_block context manager. Apply them to three different algorithms for the same task (e.g., three sorting approaches) and compare their performance with formatted output.

  3. Use cProfile to profile a recursive Fibonacci function vs. a memoized version. Generate a sorted stats report and identify the hotspot. Then use functools.lru_cache and re-profile to show the improvement.

  4. Set up structured logging with different levels (DEBUG, INFO, WARNING, ERROR) in a multi-module application. Configure separate handlers: console for INFO+, file for DEBUG+. Demonstrate how to use logging for production debugging.

  5. Use tracemalloc to find a simulated memory leak: a function that appends to a module-level list on each call. Show the top memory allocations and explain how to identify and fix the leak.

  6. Configure PYTHONBREAKPOINT to use ipdb (install it first), then set it to 0 to disable all breakpoints. Explain how this mechanism lets you leave breakpoints in code without affecting production.

⚠️ Common Mistakes

  • Leaving breakpoint() or pdb.set_trace() calls in committed code. Use PYTHONBREAKPOINT=0 in production as a safety net, and add a pre-commit hook or linter rule to catch stray debugger statements.

  • Using print() statements instead of the logging module. Print statements are not configurable (no levels, no formatting, no routing), cannot be disabled in production, and clutter stdout. The logging module is designed for exactly this purpose.

  • Profiling before confirming there is actually a performance problem. Premature optimization wastes time. First measure with wall-clock timing, then profile only the slow paths. cProfile has overhead that can skew results for micro-benchmarks.

  • Ignoring the difference between time.time() and time.perf_counter() for benchmarking. perf_counter() uses the highest-resolution clock available and is not affected by system clock adjustments. Always use perf_counter() for measuring code execution time.
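The difference is inspectable at runtime: time.get_clock_info() reports each clock's monotonicity and resolution. A quick check:

```python
import time

# perf_counter is monotonic (never jumps backward when the system
# clock is adjusted) and uses the highest-resolution clock available.
perf = time.get_clock_info("perf_counter")
wall = time.get_clock_info("time")

print(f"perf_counter: monotonic={perf.monotonic}, resolution={perf.resolution}")
print(f"time:         monotonic={wall.monotonic}, resolution={wall.resolution}")
```

Exact resolutions vary by platform, but perf_counter is guaranteed monotonic everywhere, which is why it is the right choice for benchmarking.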

  • Not using post-mortem debugging (python -m pdb script.py or pdb.post_mortem()) for crashes. It drops you into the exact frame where the exception occurred, with all local variables intact — far more useful than reading a traceback.
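The crash-frame state that pdb.post_mortem() exposes lives on the traceback object itself. A minimal sketch (divide is a hypothetical helper) that walks to the crashing frame and reads its locals without entering the debugger:

```python
import sys


def divide(a, b):
    ratio = a / b  # local variables live in this frame
    return ratio


try:
    divide(10, 0)
except ZeroDivisionError:
    # The traceback keeps every frame alive; this is the same state
    # pdb.post_mortem() lets you explore interactively.
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:  # walk down to the crash frame
        tb = tb.tb_next
    crash_locals = tb.tb_frame.f_locals
    print(f"crashed in {tb.tb_frame.f_code.co_name} with locals {crash_locals}")
```

Note that `ratio` is absent from the locals: the assignment never completed, which is exactly the kind of detail post-mortem inspection reveals.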
