Memory Management
📖 Concept
Understanding Python's memory management is essential for writing efficient programs, especially when dealing with large datasets, long-running services, or memory-constrained environments. CPython uses a combination of reference counting and a cyclic garbage collector to manage memory automatically.
Reference counting is the primary mechanism. Every Python object has a reference count — an integer tracking how many variables, containers, or attributes point to it. When an object's reference count drops to zero, CPython immediately deallocates it. You can inspect reference counts with sys.getrefcount() (note: calling the function itself adds a temporary reference).
Cyclic garbage collector handles reference cycles — situations where objects reference each other, keeping their reference counts above zero even when no external references exist. The GC uses a generational algorithm with three generations (0, 1, 2). New objects start in generation 0. Objects that survive a collection are promoted to the next generation. Higher generations are collected less frequently, based on the observation that long-lived objects tend to stay alive.
__slots__ is a class-level optimization that replaces the per-instance __dict__ (a hash table) with a fixed-size array of slot descriptors. This reduces memory per instance by 40-60% for classes with few attributes and also speeds up attribute access. The tradeoff: you cannot dynamically add attributes not listed in __slots__.
weakref provides references to objects that do not increase the reference count. When the referent is garbage-collected, the weak reference returns None (or calls a callback). This is critical for caches, observer patterns, and parent-child relationships where you want to avoid preventing garbage collection.
Common memory leak patterns:
- Circular references with
__del__methods (pre-Python 3.4 could not collect these) - Unbounded caches — dictionaries or lists that grow without limit
- Global-scope accumulation — appending to module-level lists in loops or requests
- Closures capturing large objects — inner functions retaining references to large enclosing-scope variables
- C extension objects — objects allocated by C libraries that Python's GC cannot track
Measuring object size: sys.getsizeof() returns the memory consumed by a single object (not counting nested objects). For deep size measurement, use pympler.asizeof() or manually walk the object graph with gc.get_referents().
💻 Code Example
1# ============================================================2# 1. Reference counting basics3# ============================================================4import sys5import gc678def reference_counting_demo():9 """Demonstrate how reference counting works in CPython."""10 a = [1, 2, 3] # refcount = 1 (variable 'a')11 print(f"After creation : refcount = {sys.getrefcount(a) - 1}")12 # Note: getrefcount adds 1 for the function argument itself1314 b = a # refcount = 2 (a and b point to same list)15 print(f"After b = a : refcount = {sys.getrefcount(a) - 1}")1617 c = [a, a, a] # refcount = 5 (a, b, and 3 slots in c)18 print(f"After c = [a,a,a] : refcount = {sys.getrefcount(a) - 1}")1920 del c # refcount = 2 (removed 3 references from c)21 print(f"After del c : refcount = {sys.getrefcount(a) - 1}")2223 del b # refcount = 124 print(f"After del b : refcount = {sys.getrefcount(a) - 1}")25 # When refcount hits 0, memory is freed immediately262728# ============================================================29# 2. Garbage collector — handling reference cycles30# ============================================================31def gc_demo():32 """Demonstrate cyclic garbage collection."""33 gc.collect() # Clear any pending garbage34 gc.set_debug(gc.DEBUG_STATS)3536 # Create a reference cycle37 class Node:38 def __init__(self, name):39 self.name = name40 self.ref = None4142 a = Node("A")43 b = Node("B")44 a.ref = b # A -> B45 b.ref = a # B -> A (cycle!)4647 # Remove external references48 del a49 del b50 # Refcounts are still 1 (due to the cycle), so reference51 # counting alone cannot free them5253 # Force garbage collection — the cyclic GC detects and frees them54 collected = gc.collect()55 print(f"\nGC collected {collected} objects (including the cycle)")5657 # Inspect GC generations58 print(f"GC thresholds: {gc.get_threshold()}")59 print(f"GC counts : {gc.get_count()}")6061 gc.set_debug(0) # Reset debug626364# ============================================================65# 3. __slots__ — Memory-efficient classes66# ============================================================67class PointRegular:68 """Regular class — uses __dict__ for attribute storage."""69 def __init__(self, x, y, z):70 self.x = x71 self.y = y72 self.z = z737475class PointSlots:76 """Slotted class — fixed attribute storage, no __dict__."""77 __slots__ = ("x", "y", "z")7879 def __init__(self, x, y, z):80 self.x = x81 self.y = y82 self.z = z838485def slots_comparison():86 """Compare memory usage of regular vs slotted classes."""87 regular = PointRegular(1.0, 2.0, 3.0)88 slotted = PointSlots(1.0, 2.0, 3.0)8990 regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)91 slotted_size = sys.getsizeof(slotted)9293 print(f"\nRegular class size: {regular_size} bytes")94 print(f"Slotted class size: {slotted_size} bytes")95 print(f"Savings : {regular_size - slotted_size} bytes "96 f"({(1 - slotted_size/regular_size)*100:.0f}%)")9798 # Scale test: create 1 million instances99 import tracemalloc100 tracemalloc.start()101102 regular_list = [PointRegular(i, i+1, i+2) for i in range(100_000)]103 regular_mem = tracemalloc.get_traced_memory()[0]104105 tracemalloc.stop()106 tracemalloc.start()107108 slotted_list = [PointSlots(i, i+1, i+2) for i in range(100_000)]109 slotted_mem = tracemalloc.get_traced_memory()[0]110111 tracemalloc.stop()112113 print(f"\n100K regular instances: {regular_mem / 1024 / 1024:.1f} MB")114 print(f"100K slotted instances: {slotted_mem / 1024 / 1024:.1f} MB")115116 # Clean up117 del regular_list, slotted_list118119120# ============================================================121# 4. weakref — Non-preventing references122# ============================================================123import weakref124125126class ExpensiveResource:127 """Simulates a large cached object."""128 def __init__(self, name, data_size=1_000_000):129 self.name = name130 self.data = bytearray(data_size) # ~1MB131132 def __repr__(self):133 return f"ExpensiveResource({self.name!r})"134135 def __del__(self):136 print(f" [GC] {self.name} deallocated")137138139def weakref_demo():140 """Demonstrate weak references for caching."""141 print("\n--- weakref demo ---")142143 # Strong reference keeps object alive144 resource = ExpensiveResource("primary")145 weak = weakref.ref(resource)146147 print(f"Weak ref alive: {weak() is not None}") # True148 print(f"Object: {weak()}")149150 # Delete strong reference — object is freed151 del resource152 print(f"After del: weak ref alive: {weak() is not None}") # False153154155class WeakCache:156 """Cache that does not prevent garbage collection."""157158 def __init__(self):159 self._cache = weakref.WeakValueDictionary()160 self._stats = {"hits": 0, "misses": 0}161162 def get_or_create(self, key, factory):163 """Retrieve from cache or create with factory function."""164 obj = self._cache.get(key)165 if obj is not None:166 self._stats["hits"] += 1167 return obj168169 self._stats["misses"] += 1170 obj = factory(key)171 self._cache[key] = obj172 return obj173174 @property175 def stats(self):176 return {**self._stats, "cached": len(self._cache)}177178179def weak_cache_demo():180 """Demonstrate WeakValueDictionary cache."""181 print("\n--- WeakCache demo ---")182 cache = WeakCache()183184 # Create and cache resources185 refs = []186 for name in ["alpha", "beta", "gamma"]:187 resource = cache.get_or_create(name, ExpensiveResource)188 refs.append(resource)189190 print(f"Cache stats: {cache.stats}")191192 # Access cached items (hits)193 for name in ["alpha", "beta"]:194 cache.get_or_create(name, ExpensiveResource)195196 print(f"After re-access: {cache.stats}")197198 # Release some strong references — objects may be GC'd199 del refs[0] # Release "alpha"200 gc.collect()201 print(f"After releasing alpha: {cache.stats}")202203204# ============================================================205# 5. Memory leak detection with tracemalloc206# ============================================================207import tracemalloc208209210def detect_memory_leak():211 """Use tracemalloc to find memory leaks."""212 tracemalloc.start()213 snapshot1 = tracemalloc.take_snapshot()214215 # Simulate a memory leak: data accumulates in a global list216 leaked_data = []217 for i in range(1000):218 leaked_data.append(" " * 10_000) # 10KB strings219220 snapshot2 = tracemalloc.take_snapshot()221222 # Compare snapshots to find where memory grew223 stats = snapshot2.compare_to(snapshot1, "lineno")224225 print("\nTop 5 memory increases:")226 for stat in stats[:5]:227 print(f" {stat}")228229 current, peak = tracemalloc.get_traced_memory()230 print(f"\nCurrent: {current / 1024 / 1024:.2f} MB")231 print(f"Peak : {peak / 1024 / 1024:.2f} MB")232233 tracemalloc.stop()234235236# ============================================================237# 6. Object size inspection238# ============================================================239def inspect_object_sizes():240 """Show memory consumed by different Python objects."""241 objects = {242 "int (0)": 0,243 "int (1)": 1,244 "int (2**30)": 2**30,245 "float": 3.14,246 "str (empty)": "",247 "str (10 chars)": "0123456789",248 "bytes (empty)": b"",249 "bytes (10)": b"0123456789",250 "list (empty)": [],251 "list (10 ints)": list(range(10)),252 "dict (empty)": {},253 "dict (10 items)": {i: i for i in range(10)},254 "set (empty)": set(),255 "set (10 items)": set(range(10)),256 "tuple (empty)": (),257 "tuple (10)": tuple(range(10)),258 }259260 print("\nObject sizes (sys.getsizeof):")261 print(f" {'Type':<20} {'Size (bytes)':>12}")262 print(f" {'-'*20} {'-'*12}")263 for label, obj in objects.items():264 size = sys.getsizeof(obj)265 print(f" {label:<20} {size:>12}")266267268# ============================================================269# Usage270# ============================================================271if __name__ == "__main__":272 reference_counting_demo()273 gc_demo()274 slots_comparison()275 weakref_demo()276 weak_cache_demo()277 detect_memory_leak()278 inspect_object_sizes()
🏋️ Practice Exercise
Exercises:
Create a class with and without
__slots__. Instantiate 1 million objects of each and measure memory usage withtracemalloc. Calculate the per-instance memory savings. Try adding a dynamic attribute to the slotted class and observe theAttributeError.Build a reference cycle detector: write a function that creates objects with circular references, then use
gc.get_referrers()andgc.get_referents()to trace the cycle. Force collection withgc.collect()and verify the objects were freed.Implement a
WeakCacheclass usingweakref.WeakValueDictionary. Create cached objects, hold strong references to some, delete others, and verify that the cache automatically shrinks. Add hit/miss statistics.Use
tracemallocsnapshot comparison to find a simulated memory leak: write a function that appends to a module-level list on each call. Take snapshots before and after 1000 calls, compare them, and identify the leaking line.Write a function that measures the deep size of a nested data structure (dict of lists of dicts) by recursively walking
gc.get_referents()and summingsys.getsizeof(). Compare your result withpympler.asizeof()if available.Experiment with
gc.set_threshold()to change garbage collection frequency. Create a program that generates many short-lived reference cycles. Measure performance with different threshold settings and explain the tradeoff between collection frequency and pause time.
⚠️ Common Mistakes
Assuming
sys.getsizeof()returns the total memory of a container. It only returns the shallow size of the object itself, not its contents. A list'sgetsizeofreturns the size of the list structure (pointers), not the objects it contains. For deep size, recursively walk referents or usepympler.asizeof().Using
__slots__in classes that need dynamic attributes or multiple inheritance with conflicting slots. Slots prevent adding arbitrary attributes and can cause issues with complex inheritance hierarchies. Only use__slots__for data-heavy classes where you create many instances and the attribute set is known and fixed.Relying on
__del__(finalizers) for resource cleanup.__del__has unpredictable timing, can cause issues with reference cycles (pre-3.4), and is not guaranteed to run at all during interpreter shutdown. Use context managers (withstatement) andtry/finallyfor deterministic cleanup instead.Ignoring memory leaks in long-running services. Patterns like appending to global lists, unbounded caches, circular references in callback registrations, and closures capturing request-scoped data all cause gradual memory growth. Monitor RSS over time and use
tracemallocsnapshot comparison to find the source.Calling
gc.disable()without understanding the consequences. Disabling the GC eliminates pause times but means reference cycles will never be freed, causing memory leaks. Only disable GC temporarily during benchmarks or latency-critical sections, and re-enable immediately after.
💼 Interview Questions
🎤 Mock Interview
Practice a live interview for Memory Management