File Operations & Context Managers


📖 Concept

Python provides powerful, cross-platform file I/O through the built-in open() function and the pathlib module. The context manager protocol (with statement) ensures files are always properly closed, even if exceptions occur — a pattern so fundamental it's used throughout the standard library for any resource that needs deterministic cleanup.

open() modes:

| Mode | Description | Creates? | Truncates? |
|------|-------------|----------|------------|
| "r" | Read text (default) | No | No |
| "w" | Write text | Yes | Yes |
| "a" | Append text | Yes | No |
| "x" | Exclusive create | Yes (fails if exists) | No |
| "rb" / "wb" | Read/write binary | No / Yes | No / Yes |
| "r+" / "w+" | Read + write | No / Yes | No / Yes |

pathlib.Path is the modern, object-oriented approach to filesystem paths. It replaces os.path with chainable methods, operator overloading (/), and built-in read/write helpers. Always prefer pathlib over os.path in new code.
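
A minimal comparison, assuming a relative project layout: both forms produce the same platform-appropriate path, but the pathlib version chains further operations without string surgery:

```python
import os.path
from pathlib import Path

legacy = os.path.join("project", "config", "settings.json")
modern = Path("project") / "config" / "settings.json"
print(str(modern) == legacy)  # True

# Chainable, object-oriented follow-ups
print(modern.with_suffix(".yaml").name)  # settings.yaml
```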

Context managers implement the __enter__/__exit__ protocol. The with statement guarantees __exit__ runs, providing deterministic resource cleanup. You can write custom context managers as classes or with @contextmanager from contextlib.
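
The guarantee is easy to see with a bare-bones context manager that just records when each hook fires (a deliberately minimal sketch; Probe and events are illustrative names):

```python
events = []

class Probe:
    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        events.append("exit")   # runs even though the body raised
        return False            # False/None: let the exception propagate

try:
    with Probe():
        events.append("body")
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)  # ['enter', 'body', 'exit', 'caught']
```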

Encoding matters: Always specify encoding="utf-8" explicitly. The default encoding is platform-dependent (locale.getpreferredencoding()), which causes cross-platform bugs. Python 3.15 will make UTF-8 the default, but be explicit until then.
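
To see why, round-trip some non-ASCII text with the encoding pinned, then decode the same bytes with a mismatched codec (the file name is illustrative):

```python
text = "café naïve"
with open("note.txt", "w", encoding="utf-8") as f:
    f.write(text)

with open("note.txt", "r", encoding="utf-8") as f:
    assert f.read() == text  # round-trips cleanly

# The same bytes read as latin-1 produce mojibake without raising any error
with open("note.txt", "r", encoding="latin-1") as f:
    print(f.read())  # cafÃ© naÃ¯ve
```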

💻 Code Example

# ============================================================
# Basic file operations with context managers
# ============================================================
# GOOD — always use 'with' for automatic cleanup
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")
    # File is automatically closed when exiting the 'with' block,
    # even if an exception occurs inside

# BAD — manual close is error-prone
# f = open("output.txt", "w")
# f.write("data")
# f.close()  # What if write() raises? File stays open!

# Reading entire file
with open("output.txt", "r", encoding="utf-8") as f:
    content = f.read()  # Entire file as one string

# Reading line by line (memory efficient for large files)
with open("output.txt", "r", encoding="utf-8") as f:
    for line in f:  # File object is iterable
        print(line.rstrip())  # rstrip() removes trailing newline

# Reading all lines into a list
with open("output.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()  # List of strings (includes \n)

# Exclusive create — fail if file already exists
# Prevents accidentally overwriting important files
try:
    with open("new_file.txt", "x", encoding="utf-8") as f:
        f.write("Created safely")
except FileExistsError:
    print("File already exists — won't overwrite")

# ============================================================
# pathlib — the modern way to handle paths
# ============================================================
from pathlib import Path

# Creating paths — use / operator for joining
project_root = Path("/home/user/project")
config_path = project_root / "config" / "settings.json"
print(config_path)  # /home/user/project/config/settings.json

# Path properties
p = Path("/home/user/project/data/report.csv.gz")
print(p.name)      # "report.csv.gz"
print(p.stem)      # "report.csv" (name without LAST suffix)
print(p.suffix)    # ".gz"
print(p.suffixes)  # [".csv", ".gz"]
print(p.parent)    # /home/user/project/data
print(p.parts)     # ('/', 'home', 'user', 'project', 'data', 'report.csv.gz')

# Current directory and home
cwd = Path.cwd()
home = Path.home()

# Check existence and type
p = Path("some_path")
p.exists()      # True/False
p.is_file()     # True if it's a regular file
p.is_dir()      # True if it's a directory
p.is_symlink()  # True if it's a symbolic link

# Read/write helpers (open + read/write + close in one call)
config_path = Path("config.txt")
config_path.write_text("key=value\n", encoding="utf-8")
content = config_path.read_text(encoding="utf-8")

# Binary read/write
data_path = Path("data.bin")
data_path.write_bytes(b"\x00\x01\x02\x03")
raw = data_path.read_bytes()

# Directory operations
output_dir = Path("output/reports/2024")
output_dir.mkdir(parents=True, exist_ok=True)  # Like mkdir -p

# Glob — find files matching patterns
project = Path(".")
py_files = list(project.glob("*.py"))       # Current dir only
all_py = list(project.rglob("*.py"))        # Recursive
csvs = list(project.glob("data/**/*.csv"))  # Specific subdir

# Iterating directory contents
for item in Path(".").iterdir():
    if item.is_file():
        print(f"  File: {item.name} ({item.stat().st_size} bytes)")

# ============================================================
# Production pattern: safe file writing with atomic replace
# ============================================================
import tempfile
import os

def atomic_write(filepath: str, content: str, encoding="utf-8"):
    """Write to a file atomically — prevents partial writes on crash.

    Writes to a temp file first, then renames (which is atomic on POSIX).
    If the process crashes mid-write, the original file is untouched.
    """
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write to temp file in the same directory (same filesystem for rename)
    fd, tmp_path = tempfile.mkstemp(
        dir=path.parent, suffix=".tmp", prefix=f".{path.name}."
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # Force write to disk
        os.replace(tmp_path, filepath)  # Atomic rename
    except BaseException:
        os.unlink(tmp_path)  # Clean up temp file on failure
        raise

atomic_write("important_data.json", '{"status": "saved"}')

# ============================================================
# Custom context managers — class-based
# ============================================================
class ManagedFile:
    """Context manager for file operations with logging."""

    def __init__(self, filename, mode="r", encoding="utf-8"):
        self.filename = filename
        self.mode = mode
        self.encoding = encoding
        self.file = None

    def __enter__(self):
        print(f"Opening {self.filename} in mode '{self.mode}'")
        self.file = open(self.filename, self.mode, encoding=self.encoding)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()
            print(f"Closed {self.filename}")
        if exc_type is not None:
            print(f"Exception occurred: {exc_type.__name__}: {exc_val}")
        return False  # Don't suppress exceptions

# ============================================================
# Custom context managers — generator-based (simpler)
# ============================================================
from contextlib import contextmanager

@contextmanager
def temp_directory():
    """Create a temporary directory, clean up when done."""
    import tempfile
    import shutil

    tmpdir = tempfile.mkdtemp()
    try:
        yield Path(tmpdir)  # Value given to 'as' variable
    finally:
        shutil.rmtree(tmpdir)  # Cleanup always runs

with temp_directory() as tmpdir:
    data_file = tmpdir / "data.txt"
    data_file.write_text("temporary data", encoding="utf-8")
    print(f"Working in: {tmpdir}")
# tmpdir is deleted here

# ============================================================
# Processing large files efficiently
# ============================================================
def count_lines_in_large_file(filepath: str) -> int:
    """Count lines without loading entire file into memory."""
    count = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for _ in f:  # Iterating line-by-line uses minimal memory
            count += 1
    return count

def process_in_chunks(filepath: str, chunk_size: int = 8192):
    """Read binary file in fixed-size chunks for processing."""
    with open(filepath, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process chunk (e.g., compute hash, upload, etc.)
            yield chunk
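
The chunked pattern above is typically paired with hashlib to fingerprint files of any size in constant memory. This sketch uses the two-argument iter(callable, sentinel) idiom as an equivalent loop; sha256_of_file is an illustrative name:

```python
import hashlib

def sha256_of_file(filepath: str, chunk_size: int = 8192) -> str:
    """Hash a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: the result matches hashing the whole payload at once
with open("blob.bin", "wb") as f:
    f.write(b"x" * 20_000)

print(sha256_of_file("blob.bin") == hashlib.sha256(b"x" * 20_000).hexdigest())  # True
```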

🏋️ Practice Exercise

Exercises:

  1. Write a function that reads a text file, counts word frequencies, and writes the results to a new file sorted by frequency (descending). Use pathlib for all path operations and handle FileNotFoundError and PermissionError gracefully.

  2. Implement an atomic_write context manager that writes to a temp file and atomically renames on success. If an exception occurs inside the with block, delete the temp file and leave the original untouched.

  3. Create a @contextmanager function called locked_file(path) that acquires a file lock (using fcntl.flock on Unix) before writing and releases it in finally. Test with two concurrent writers.

  4. Write a script that recursively finds all .log files in a directory tree using Path.rglob(), reads each one, extracts lines containing "ERROR", and writes them to a consolidated errors.txt with the source filename prepended to each line.

  5. Build a FileWatcher class that monitors a file for changes by polling stat().st_mtime. Use it as a context manager that starts/stops the polling. Demonstrate it detecting an external file modification.

  6. Process a 1GB+ CSV file line-by-line (without loading it all into memory). Count rows matching a condition and report progress every 100,000 rows.

⚠️ Common Mistakes

  • Not specifying encoding='utf-8' when opening text files. The default is platform-dependent (locale.getpreferredencoding()), causing UnicodeDecodeError or silent data corruption when the code runs on a different OS. Always pass encoding explicitly.

  • Using 'w' mode when you meant 'a' (append). 'w' truncates the file to zero length before writing. If you need to add to an existing file, use 'a'. If you want to fail on existing files, use 'x' (exclusive create).

  • Building paths with string concatenation (dir + '/' + filename) instead of pathlib.Path or os.path.join(). String concatenation breaks on Windows (which uses backslashes) and mishandles edge cases like double slashes.

  • Reading entire large files into memory with .read() or .readlines(). For files larger than available RAM, iterate line-by-line (for line in f:) or use chunked reading (f.read(chunk_size)).

  • Forgetting that __exit__ must return True to suppress an exception. Returning None or False (the default) lets the exception propagate. Only return True when you intentionally want to swallow the exception — which is rarely correct.
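
When suppression really is intended, prefer contextlib.suppress from the standard library over a hand-written __exit__ that returns True; it makes the swallowed exception types explicit:

```python
import os
from contextlib import suppress

# Deletes the file if present; silently does nothing if it's already gone
with suppress(FileNotFoundError):
    os.remove("maybe_missing.tmp")

print("still running")  # execution continues normally
```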
