Memory Management


📖 Concept

Understanding Python's memory management is essential for writing efficient programs, especially when dealing with large datasets, long-running services, or memory-constrained environments. CPython uses a combination of reference counting and a cyclic garbage collector to manage memory automatically.

Reference counting is the primary mechanism. Every Python object has a reference count — an integer tracking how many variables, containers, or attributes point to it. When an object's reference count drops to zero, CPython immediately deallocates it. You can inspect reference counts with sys.getrefcount() (note: calling the function itself adds a temporary reference).

The cyclic garbage collector handles reference cycles — situations where objects reference each other, keeping their reference counts above zero even when no external references exist. The GC uses a generational algorithm with three generations (0, 1, 2). New objects start in generation 0; objects that survive a collection are promoted to the next generation. Higher generations are collected less frequently, based on the observation that long-lived objects tend to stay alive.
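
The generation machinery described above can be inspected directly from the standard gc module; a minimal sketch:

```python
import gc

# Collection thresholds for generations 0, 1, 2 (CPython defaults: 700, 10, 10)
print(gc.get_threshold())

# Current allocation counts compared against those thresholds
print(gc.get_count())

# Lifetime statistics per generation: collections run, objects collected,
# objects found uncollectable
for gen, stats in enumerate(gc.get_stats()):
    print(f"generation {gen}: {stats}")
```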

__slots__ is a class-level optimization that replaces the per-instance __dict__ (a hash table) with compact, fixed-size storage accessed through descriptors defined on the class. This typically reduces memory per instance by 40-60% for classes with few attributes and also speeds up attribute access. The tradeoff: you cannot dynamically add attributes that are not listed in __slots__.

weakref provides references to objects that do not increase the reference count. When the referent is garbage-collected, calling the weak reference returns None (and an optional callback fires). This is critical for caches, observer patterns, and parent-child relationships where you want to avoid preventing garbage collection.
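
A minimal sketch of that callback behavior (the class name Referent is illustrative; in CPython the deallocation — and therefore the callback — happens as soon as the last strong reference disappears):

```python
import weakref


class Referent:
    """Illustrative stand-in for any object worth tracking."""
    pass


obj = Referent()


def on_collect(ref):
    # Called with the now-dead weak reference when the referent is freed
    print("referent collected")


weak = weakref.ref(obj, on_collect)
print(weak() is obj)   # -> True (referent still alive)

del obj                # last strong reference gone; CPython frees it at once
print(weak())          # -> None (the weak reference is now dead)
```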

Common memory leak patterns:

  • Circular references with __del__ methods (before Python 3.4, the GC could not collect these)
  • Unbounded caches — dictionaries or lists that grow without limit
  • Global-scope accumulation — appending to module-level lists in loops or requests
  • Closures capturing large objects — inner functions retaining references to large enclosing-scope variables
  • C extension objects — objects allocated by C libraries that Python's GC cannot track
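
The closure pattern in particular is easy to create by accident. A hypothetical sketch (function names are illustrative), assuming the inner function only needs a small summary of the data:

```python
def make_reporter_leaky(dataset):
    """Leaky version: the closure keeps the entire dataset alive
    for as long as the returned function exists."""
    def report():
        return f"{len(dataset)} rows"   # captures all of `dataset`
    return report


def make_reporter_fixed(dataset):
    """Same job, but only the small derived value is captured."""
    n = len(dataset)                    # compute once, capture the int
    def report():
        return f"{n} rows"              # `dataset` can now be freed
    return report


big = list(range(100_000))
leaky = make_reporter_leaky(big)
fixed = make_reporter_fixed(big)
del big
# `leaky` still pins the 100,000-element list via its closure;
# `fixed` retains only a single int.
print(leaky())   # -> 100000 rows
print(fixed())   # -> 100000 rows
```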

Measuring object size: sys.getsizeof() returns the memory consumed by a single object, not counting the objects it references. For a deep measurement, use asizeof() from the pympler package or manually walk the object graph with gc.get_referents().
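
A rough sketch of the manual approach (deep_sizeof is a hypothetical helper; note that gc.get_referents() on class instances also follows the class object itself, so real implementations usually filter out types — the example below sticks to plain containers):

```python
import gc
import sys


def deep_sizeof(obj):
    """Approximate total size of obj plus everything it references."""
    seen = set()
    total = 0
    stack = [obj]
    while stack:
        o = stack.pop()
        if id(o) in seen:       # count each object once, even if shared
            continue
        seen.add(id(o))
        total += sys.getsizeof(o)
        stack.extend(gc.get_referents(o))
    return total


nested = {"rows": [list(range(100)) for _ in range(10)]}
print(sys.getsizeof(nested))   # shallow: just the outer dict structure
print(deep_sizeof(nested))     # includes the lists, ints, and the key string
```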

💻 Code Example

# ============================================================
# 1. Reference counting basics
# ============================================================
import gc
import sys
import tracemalloc
import weakref


def reference_counting_demo():
    """Demonstrate how reference counting works in CPython."""
    a = [1, 2, 3]  # refcount = 1 (variable 'a')
    print(f"After creation    : refcount = {sys.getrefcount(a) - 1}")
    # Note: getrefcount adds 1 for the function argument itself

    b = a  # refcount = 2 (a and b point to the same list)
    print(f"After b = a       : refcount = {sys.getrefcount(a) - 1}")

    c = [a, a, a]  # refcount = 5 (a, b, and 3 slots in c)
    print(f"After c = [a,a,a] : refcount = {sys.getrefcount(a) - 1}")

    del c  # refcount = 2 (removed 3 references from c)
    print(f"After del c       : refcount = {sys.getrefcount(a) - 1}")

    del b  # refcount = 1
    print(f"After del b       : refcount = {sys.getrefcount(a) - 1}")
    # When the refcount hits 0, memory is freed immediately


# ============================================================
# 2. Garbage collector — handling reference cycles
# ============================================================
def gc_demo():
    """Demonstrate cyclic garbage collection."""
    gc.collect()  # Clear any pending garbage
    gc.set_debug(gc.DEBUG_STATS)

    # Create a reference cycle
    class Node:
        def __init__(self, name):
            self.name = name
            self.ref = None

    a = Node("A")
    b = Node("B")
    a.ref = b  # A -> B
    b.ref = a  # B -> A (cycle!)

    # Remove external references
    del a
    del b
    # Refcounts are still 1 (due to the cycle), so reference
    # counting alone cannot free them

    # Force garbage collection — the cyclic GC detects and frees them
    collected = gc.collect()
    print(f"\nGC collected {collected} objects (including the cycle)")

    # Inspect GC generations
    print(f"GC thresholds: {gc.get_threshold()}")
    print(f"GC counts    : {gc.get_count()}")

    gc.set_debug(0)  # Reset debug flags


# ============================================================
# 3. __slots__ — Memory-efficient classes
# ============================================================
class PointRegular:
    """Regular class — uses __dict__ for attribute storage."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class PointSlots:
    """Slotted class — fixed attribute storage, no __dict__."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def slots_comparison():
    """Compare memory usage of regular vs slotted classes."""
    regular = PointRegular(1.0, 2.0, 3.0)
    slotted = PointSlots(1.0, 2.0, 3.0)

    regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
    slotted_size = sys.getsizeof(slotted)

    print(f"\nRegular class size: {regular_size} bytes")
    print(f"Slotted class size: {slotted_size} bytes")
    print(f"Savings           : {regular_size - slotted_size} bytes "
          f"({(1 - slotted_size / regular_size) * 100:.0f}%)")

    # Scale test: create 100,000 instances of each
    tracemalloc.start()

    regular_list = [PointRegular(i, i + 1, i + 2) for i in range(100_000)]
    regular_mem = tracemalloc.get_traced_memory()[0]

    tracemalloc.stop()
    tracemalloc.start()

    slotted_list = [PointSlots(i, i + 1, i + 2) for i in range(100_000)]
    slotted_mem = tracemalloc.get_traced_memory()[0]

    tracemalloc.stop()

    print(f"\n100K regular instances: {regular_mem / 1024 / 1024:.1f} MB")
    print(f"100K slotted instances: {slotted_mem / 1024 / 1024:.1f} MB")

    # Clean up
    del regular_list, slotted_list


# ============================================================
# 4. weakref — Non-preventing references
# ============================================================
class ExpensiveResource:
    """Simulates a large cached object."""

    def __init__(self, name, data_size=1_000_000):
        self.name = name
        self.data = bytearray(data_size)  # ~1 MB

    def __repr__(self):
        return f"ExpensiveResource({self.name!r})"

    def __del__(self):
        print(f"  [GC] {self.name} deallocated")


def weakref_demo():
    """Demonstrate weak references for caching."""
    print("\n--- weakref demo ---")

    # A strong reference keeps the object alive
    resource = ExpensiveResource("primary")
    weak = weakref.ref(resource)

    print(f"Weak ref alive: {weak() is not None}")  # True
    print(f"Object: {weak()}")

    # Delete the strong reference — the object is freed
    del resource
    print(f"After del: weak ref alive: {weak() is not None}")  # False


class WeakCache:
    """Cache that does not prevent garbage collection."""

    def __init__(self):
        self._cache = weakref.WeakValueDictionary()
        self._stats = {"hits": 0, "misses": 0}

    def get_or_create(self, key, factory):
        """Retrieve from the cache or create with a factory function."""
        obj = self._cache.get(key)
        if obj is not None:
            self._stats["hits"] += 1
            return obj

        self._stats["misses"] += 1
        obj = factory(key)
        self._cache[key] = obj
        return obj

    @property
    def stats(self):
        return {**self._stats, "cached": len(self._cache)}


def weak_cache_demo():
    """Demonstrate a WeakValueDictionary cache."""
    print("\n--- WeakCache demo ---")
    cache = WeakCache()

    # Create and cache resources
    refs = []
    for name in ["alpha", "beta", "gamma"]:
        resource = cache.get_or_create(name, ExpensiveResource)
        refs.append(resource)

    print(f"Cache stats: {cache.stats}")

    # Access cached items (hits)
    for name in ["alpha", "beta"]:
        cache.get_or_create(name, ExpensiveResource)

    print(f"After re-access: {cache.stats}")

    # Release a strong reference — the object may be GC'd
    del refs[0]  # Release "alpha"
    gc.collect()
    print(f"After releasing alpha: {cache.stats}")


# ============================================================
# 5. Memory leak detection with tracemalloc
# ============================================================
def detect_memory_leak():
    """Use tracemalloc to find memory leaks."""
    tracemalloc.start()
    snapshot1 = tracemalloc.take_snapshot()

    # Simulate a leak: data accumulates in an ever-growing list
    leaked_data = []
    for i in range(1000):
        leaked_data.append(" " * 10_000)  # 10 KB strings

    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots to find where memory grew
    stats = snapshot2.compare_to(snapshot1, "lineno")

    print("\nTop 5 memory increases:")
    for stat in stats[:5]:
        print(f"  {stat}")

    current, peak = tracemalloc.get_traced_memory()
    print(f"\nCurrent: {current / 1024 / 1024:.2f} MB")
    print(f"Peak   : {peak / 1024 / 1024:.2f} MB")

    tracemalloc.stop()


# ============================================================
# 6. Object size inspection
# ============================================================
def inspect_object_sizes():
    """Show memory consumed by different Python objects."""
    objects = {
        "int (0)": 0,
        "int (1)": 1,
        "int (2**30)": 2**30,
        "float": 3.14,
        "str (empty)": "",
        "str (10 chars)": "0123456789",
        "bytes (empty)": b"",
        "bytes (10)": b"0123456789",
        "list (empty)": [],
        "list (10 ints)": list(range(10)),
        "dict (empty)": {},
        "dict (10 items)": {i: i for i in range(10)},
        "set (empty)": set(),
        "set (10 items)": set(range(10)),
        "tuple (empty)": (),
        "tuple (10)": tuple(range(10)),
    }

    print("\nObject sizes (sys.getsizeof):")
    print(f"  {'Type':<20} {'Size (bytes)':>12}")
    print(f"  {'-' * 20} {'-' * 12}")
    for label, obj in objects.items():
        size = sys.getsizeof(obj)
        print(f"  {label:<20} {size:>12}")


# ============================================================
# Usage
# ============================================================
if __name__ == "__main__":
    reference_counting_demo()
    gc_demo()
    slots_comparison()
    weakref_demo()
    weak_cache_demo()
    detect_memory_leak()
    inspect_object_sizes()

🏋️ Practice Exercise

Exercises:

  1. Create a class with and without __slots__. Instantiate 1 million objects of each and measure memory usage with tracemalloc. Calculate the per-instance memory savings. Try adding a dynamic attribute to the slotted class and observe the AttributeError.

  2. Build a reference cycle detector: write a function that creates objects with circular references, then use gc.get_referrers() and gc.get_referents() to trace the cycle. Force collection with gc.collect() and verify the objects were freed.

  3. Implement a WeakCache class using weakref.WeakValueDictionary. Create cached objects, hold strong references to some, delete others, and verify that the cache automatically shrinks. Add hit/miss statistics.

  4. Use tracemalloc snapshot comparison to find a simulated memory leak: write a function that appends to a module-level list on each call. Take snapshots before and after 1000 calls, compare them, and identify the leaking line.

  5. Write a function that measures the deep size of a nested data structure (dict of lists of dicts) by recursively walking gc.get_referents() and summing sys.getsizeof(). Compare your result with pympler.asizeof() if available.

  6. Experiment with gc.set_threshold() to change garbage collection frequency. Create a program that generates many short-lived reference cycles. Measure performance with different threshold settings and explain the tradeoff between collection frequency and pause time.

⚠️ Common Mistakes

  • Assuming sys.getsizeof() returns the total memory of a container. It only returns the shallow size of the object itself, not its contents. A list's getsizeof returns the size of the list structure (pointers), not the objects it contains. For deep size, recursively walk referents or use pympler.asizeof().

  • Using __slots__ in classes that need dynamic attributes or multiple inheritance with conflicting slots. Slots prevent adding arbitrary attributes and can cause issues with complex inheritance hierarchies. Only use __slots__ for data-heavy classes where you create many instances and the attribute set is known and fixed.

  • Relying on __del__ (finalizers) for resource cleanup. __del__ has unpredictable timing, can cause issues with reference cycles (pre-3.4), and is not guaranteed to run at all during interpreter shutdown. Use context managers (with statement) and try/finally for deterministic cleanup instead.
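
A sketch of the context-manager alternative (ManagedFile is an illustrative name; for real files, plain `with open(...)` already does this):

```python
import os
import tempfile


class ManagedFile:
    """Deterministic cleanup via the context-manager protocol
    instead of an unpredictable __del__ finalizer."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def __enter__(self):
        self.handle = open(self.path, "w")
        return self.handle

    def __exit__(self, exc_type, exc, tb):
        self.handle.close()   # runs even if the with-body raises
        return False          # do not suppress exceptions


path = os.path.join(tempfile.gettempdir(), "managed_demo.txt")
with ManagedFile(path) as f:
    f.write("cleanup is guaranteed")
# The file is already closed here, however the block exited
```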

  • Ignoring memory leaks in long-running services. Patterns like appending to global lists, unbounded caches, circular references in callback registrations, and closures capturing request-scoped data all cause gradual memory growth. Monitor RSS over time and use tracemalloc snapshot comparison to find the source.
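
For the unbounded-cache case, the standard library already provides a bounded alternative; a minimal sketch using functools.lru_cache (render is a hypothetical expensive function):

```python
from functools import lru_cache


@lru_cache(maxsize=1024)   # bounded: least-recently-used entries are evicted
def render(page_id):
    """Hypothetical expensive operation worth caching."""
    return f"<page {page_id}>"


for i in range(5000):
    render(i)              # 5000 unique keys, but at most 1024 are retained

print(render.cache_info())  # currsize never exceeds maxsize
```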

  • Calling gc.disable() without understanding the consequences. Disabling the GC eliminates pause times but means reference cycles will never be freed, causing memory leaks. Only disable GC temporarily during benchmarks or latency-critical sections, and re-enable immediately after.
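
A sketch of that temporary-disable pattern, assuming a hypothetical latency-critical callable passed in as `work`:

```python
import gc


def latency_critical_section(work):
    """Run `work` with the cyclic GC paused, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()               # no GC pauses inside this section
    try:
        return work()
    finally:
        if was_enabled:
            gc.enable()        # always restore, even if `work` raised
        gc.collect()           # clean up any cycles created meanwhile


result = latency_critical_section(lambda: sum(range(1000)))
print(result)   # -> 499500
```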
