File Operations & Context Managers
📖 Concept
Python provides powerful, cross-platform file I/O through the built-in open() function and the pathlib module. The context manager protocol (with statement) ensures files are always properly closed, even if exceptions occur — a pattern so fundamental it's used throughout the standard library for any resource that needs deterministic cleanup.
open() modes:
| Mode | Description | Creates? | Truncates? |
|---|---|---|---|
| `"r"` | Read text (default) | No | No |
| `"w"` | Write text | Yes | Yes |
| `"a"` | Append text | Yes | No |
| `"x"` | Exclusive create | Yes (fails if exists) | No |
| `"rb"` / `"wb"` | Read/write binary | — | — |
| `"r+"` / `"w+"` | Read+write | No / Yes | No / Yes |
pathlib.Path is the modern, object-oriented approach to filesystem paths. It replaces os.path with chainable methods, operator overloading (/), and built-in read/write helpers. Always prefer pathlib over os.path in new code.
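To make the contrast concrete, here is a small sketch (path names are illustrative) building the same path both ways:

```python
import os.path
from pathlib import Path

base = "/srv/app"

# os.path: free functions that operate on plain strings
old_style = os.path.join(base, "project", "data.csv")

# pathlib: one object with the / operator and chainable methods
new_style = Path(base) / "project" / "data.csv"

print(str(new_style) == old_style)           # True -- same path either way
print(new_style.suffix)                      # ".csv"
print(new_style.with_suffix(".json").name)   # "data.json"
```

The payoff is chainability: operations like swapping a suffix or walking up to a parent are methods on the object rather than nested function calls on strings.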
Context managers implement the __enter__/__exit__ protocol. The with statement guarantees __exit__ runs, providing deterministic resource cleanup. You can write custom context managers as classes or with @contextmanager from contextlib.
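Under the hood, a `with` statement behaves roughly like the following try/finally sketch (simplified; the real statement looks up `__exit__` on the type rather than the instance):

```python
# Roughly what `with open(...) as f: f.write(...)` expands to:
mgr = open("notes.txt", "w", encoding="utf-8")
obj = mgr.__enter__()          # value bound by `as`
try:
    obj.write("hello\n")
except BaseException as exc:
    # __exit__ sees the exception; a falsy return value re-raises it
    if not mgr.__exit__(type(exc), exc, exc.__traceback__):
        raise
else:
    mgr.__exit__(None, None, None)  # no exception: cleanup still runs

print(obj.closed)  # True -- the file was closed either way
```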
Encoding matters: Always specify encoding="utf-8" explicitly. The default encoding is platform-dependent (locale.getpreferredencoding()), which causes cross-platform bugs. Python 3.15 will make UTF-8 the default, but be explicit until then.
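A short sketch of what goes wrong when writer and reader disagree on the encoding, simulating a machine whose locale default is Latin-1:

```python
text = "café"

# Written explicitly as UTF-8: "é" becomes the two bytes 0xC3 0xA9
with open("enc_demo.txt", "w", encoding="utf-8") as f:
    f.write(text)

# Read back with a Latin-1 default: every byte decodes successfully,
# so there is no error, just silent mojibake
with open("enc_demo.txt", "r", encoding="latin-1") as f:
    garbled = f.read()

print(garbled)          # cafÃ©
print(garbled == text)  # False

# A stricter codec at least fails loudly instead of corrupting data
try:
    with open("enc_demo.txt", "r", encoding="ascii") as f:
        f.read()
except UnicodeDecodeError:
    print("ascii decode failed")
```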
💻 Code Example
```python
# ============================================================
# Basic file operations with context managers
# ============================================================
# GOOD — always use 'with' for automatic cleanup
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")
    # File is automatically closed when exiting the 'with' block
    # Even if an exception occurs inside

# BAD — manual close is error-prone
# f = open("output.txt", "w")
# f.write("data")
# f.close()  # What if write() raises? File stays open!

# Reading entire file
with open("output.txt", "r", encoding="utf-8") as f:
    content = f.read()  # Entire file as one string

# Reading line by line (memory efficient for large files)
with open("output.txt", "r", encoding="utf-8") as f:
    for line in f:  # File object is iterable
        print(line.rstrip())  # rstrip() removes trailing newline

# Reading all lines into a list
with open("output.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()  # List of strings (includes \n)

# Exclusive create — fail if file already exists
# Prevents accidentally overwriting important files
try:
    with open("new_file.txt", "x", encoding="utf-8") as f:
        f.write("Created safely")
except FileExistsError:
    print("File already exists — won't overwrite")


# ============================================================
# pathlib — the modern way to handle paths
# ============================================================
from pathlib import Path

# Creating paths — use / operator for joining
project_root = Path("/home/user/project")
config_path = project_root / "config" / "settings.json"
print(config_path)  # /home/user/project/config/settings.json

# Path properties
p = Path("/home/user/project/data/report.csv.gz")
print(p.name)      # "report.csv.gz"
print(p.stem)      # "report.csv" (name without LAST suffix)
print(p.suffix)    # ".gz"
print(p.suffixes)  # [".csv", ".gz"]
print(p.parent)    # /home/user/project/data
print(p.parts)     # ('/', 'home', 'user', 'project', 'data', 'report.csv.gz')

# Current directory and home
cwd = Path.cwd()
home = Path.home()

# Check existence and type
p = Path("some_path")
p.exists()      # True/False
p.is_file()     # True if it's a regular file
p.is_dir()      # True if it's a directory
p.is_symlink()  # True if it's a symbolic link

# Read/write helpers (open + read/write + close in one call)
config_path = Path("config.txt")
config_path.write_text("key=value\n", encoding="utf-8")
content = config_path.read_text(encoding="utf-8")

# Binary read/write
data_path = Path("data.bin")
data_path.write_bytes(b"\x00\x01\x02\x03")
raw = data_path.read_bytes()

# Directory operations
output_dir = Path("output/reports/2024")
output_dir.mkdir(parents=True, exist_ok=True)  # Like mkdir -p

# Glob — find files matching patterns
project = Path(".")
py_files = list(project.glob("*.py"))       # Current dir only
all_py = list(project.rglob("*.py"))        # Recursive
csvs = list(project.glob("data/**/*.csv"))  # Specific subdir

# Iterating directory contents
for item in Path(".").iterdir():
    if item.is_file():
        print(f"  File: {item.name} ({item.stat().st_size} bytes)")


# ============================================================
# Production pattern: safe file writing with atomic replace
# ============================================================
import tempfile
import os

def atomic_write(filepath: str, content: str, encoding="utf-8"):
    """Write to a file atomically — prevents partial writes on crash.

    Writes to a temp file first, then renames (which is atomic on POSIX).
    If the process crashes mid-write, the original file is untouched.
    """
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write to temp file in the same directory (same filesystem for rename)
    fd, tmp_path = tempfile.mkstemp(
        dir=path.parent, suffix=".tmp", prefix=f".{path.name}."
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # Force write to disk
        os.replace(tmp_path, filepath)  # Atomic rename
    except BaseException:
        os.unlink(tmp_path)  # Clean up temp file on failure
        raise

atomic_write("important_data.json", '{"status": "saved"}')


# ============================================================
# Custom context managers — class-based
# ============================================================
class ManagedFile:
    """Context manager for file operations with logging."""

    def __init__(self, filename, mode="r", encoding="utf-8"):
        self.filename = filename
        self.mode = mode
        self.encoding = encoding
        self.file = None

    def __enter__(self):
        print(f"Opening {self.filename} in mode '{self.mode}'")
        self.file = open(self.filename, self.mode, encoding=self.encoding)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()
            print(f"Closed {self.filename}")
        if exc_type is not None:
            print(f"Exception occurred: {exc_type.__name__}: {exc_val}")
        return False  # Don't suppress exceptions


# ============================================================
# Custom context managers — generator-based (simpler)
# ============================================================
from contextlib import contextmanager

@contextmanager
def temp_directory():
    """Create a temporary directory, clean up when done."""
    import tempfile
    import shutil

    tmpdir = tempfile.mkdtemp()
    try:
        yield Path(tmpdir)  # Value given to 'as' variable
    finally:
        shutil.rmtree(tmpdir)  # Cleanup always runs

with temp_directory() as tmpdir:
    data_file = tmpdir / "data.txt"
    data_file.write_text("temporary data", encoding="utf-8")
    print(f"Working in: {tmpdir}")
# tmpdir is deleted here


# ============================================================
# Processing large files efficiently
# ============================================================
def count_lines_in_large_file(filepath: str) -> int:
    """Count lines without loading entire file into memory."""
    count = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for _ in f:  # Iterating line-by-line uses minimal memory
            count += 1
    return count

def process_in_chunks(filepath: str, chunk_size: int = 8192):
    """Read binary file in fixed-size chunks for processing."""
    with open(filepath, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process chunk (e.g., compute hash, upload, etc.)
            yield chunk
```
🏋️ Practice Exercise
Exercises:

1. Write a function that reads a text file, counts word frequencies, and writes the results to a new file sorted by frequency (descending). Use `pathlib` for all path operations and handle `FileNotFoundError` and `PermissionError` gracefully.
2. Implement an `atomic_write` context manager that writes to a temp file and atomically renames on success. If an exception occurs inside the `with` block, delete the temp file and leave the original untouched.
3. Create a `@contextmanager` function called `locked_file(path)` that acquires a file lock (using `fcntl.flock` on Unix) before writing and releases it in `finally`. Test with two concurrent writers.
4. Write a script that recursively finds all `.log` files in a directory tree using `Path.rglob()`, reads each one, extracts lines containing "ERROR", and writes them to a consolidated `errors.txt` with the source filename prepended to each line.
5. Build a `FileWatcher` class that monitors a file for changes by polling `stat().st_mtime`. Use it as a context manager that starts/stops the polling. Demonstrate it detecting an external file modification.
6. Process a 1GB+ CSV file line-by-line (without loading it all into memory). Count rows matching a condition and report progress every 100,000 rows.
⚠️ Common Mistakes
- Not specifying `encoding='utf-8'` when opening text files. The default is platform-dependent (`locale.getpreferredencoding()`), causing `UnicodeDecodeError` or silent data corruption when code runs on a different OS. Always pass encoding explicitly.
- Using `'w'` mode when you meant `'a'` (append). `'w'` truncates the file to zero length before writing. If you need to add to an existing file, use `'a'`. If you want to fail on existing files, use `'x'` (exclusive create).
- Building paths with string concatenation (`dir + '/' + filename`) instead of `pathlib.Path` or `os.path.join()`. String concatenation breaks on Windows (which uses backslashes) and mishandles edge cases like double slashes.
- Reading entire large files into memory with `.read()` or `.readlines()`. For files larger than available RAM, iterate line-by-line (`for line in f:`) or use chunked reading (`f.read(chunk_size)`).
- Forgetting that `__exit__` must return `True` to suppress an exception. Returning `None` or `False` (the default) lets the exception propagate. Only return `True` when you intentionally want to swallow the exception — which is rarely correct.
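The last pitfall is easy to demonstrate with a hypothetical `SwallowAll` class: because its `__exit__` returns `True`, whatever the block raises simply vanishes and execution continues after the `with` statement.

```python
class SwallowAll:
    """Illustrative only: suppresses every exception in its block."""
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        return True  # True means "handled" -- the exception vanishes

events = []
with SwallowAll():
    events.append("before raise")
    raise ValueError("you will never see this")
    events.append("unreachable")
events.append("after block")

print(events)  # ['before raise', 'after block']
```

Note that the line after the `raise` never runs: suppression resumes execution after the `with` block, not at the next statement inside it.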