File Operations & Context Managers


📖 Concept

Python provides powerful, cross-platform file I/O through the built-in open() function and the pathlib module. The context manager protocol (with statement) ensures files are always properly closed, even if exceptions occur — a pattern so fundamental it's used throughout the standard library for any resource that needs deterministic cleanup.

open() modes:

| Mode | Description | Creates? | Truncates? |
|------|-------------|----------|------------|
| "r" | Read text (default) | No | No |
| "w" | Write text | Yes | Yes |
| "a" | Append text | Yes | No |
| "x" | Exclusive create | Yes (fails if exists) | No |
| "rb" / "wb" | Read/write binary | No / Yes | No / Yes |
| "r+" / "w+" | Read + write | No / Yes | No / Yes |

pathlib.Path is the modern, object-oriented approach to filesystem paths. It replaces os.path with chainable methods, operator overloading (/), and built-in read/write helpers. Always prefer pathlib over os.path in new code.
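
A minimal comparison, assuming a relative project layout: both forms produce the same platform-appropriate path, but the pathlib version chains further operations without string surgery:

```python
import os.path
from pathlib import Path

legacy = os.path.join("project", "config", "settings.json")
modern = Path("project") / "config" / "settings.json"
print(str(modern) == legacy)  # True

# Chainable, object-oriented follow-ups
print(modern.with_suffix(".yaml").name)  # settings.yaml
```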

Context managers implement the __enter__/__exit__ protocol. The with statement guarantees __exit__ runs, providing deterministic resource cleanup. You can write custom context managers as classes or with @contextmanager from contextlib.
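
The guarantee is easy to see with a bare-bones context manager that just records when each hook fires (a deliberately minimal sketch; Probe and events are illustrative names):

```python
events = []

class Probe:
    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        events.append("exit")   # runs even though the body raised
        return False            # False/None: let the exception propagate

try:
    with Probe():
        events.append("body")
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)  # ['enter', 'body', 'exit', 'caught']
```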

Encoding matters: Always specify encoding="utf-8" explicitly. The default encoding is platform-dependent (locale.getpreferredencoding()), which causes cross-platform bugs. Python 3.15 will make UTF-8 the default, but be explicit until then.
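
To see why, round-trip some non-ASCII text with the encoding pinned, then decode the same bytes with a mismatched codec (the file name is illustrative):

```python
text = "café naïve"
with open("note.txt", "w", encoding="utf-8") as f:
    f.write(text)

with open("note.txt", "r", encoding="utf-8") as f:
    assert f.read() == text  # round-trips cleanly

# The same bytes read as latin-1 produce mojibake without raising any error
with open("note.txt", "r", encoding="latin-1") as f:
    print(f.read())  # cafÃ© naÃ¯ve
```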

💻 Code Example

# ============================================================
# Basic file operations with context managers
# ============================================================
# GOOD — always use 'with' for automatic cleanup
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")
    # File is automatically closed when exiting the 'with' block,
    # even if an exception occurs inside

# BAD — manual close is error-prone
# f = open("output.txt", "w")
# f.write("data")
# f.close()  # What if write() raises? File stays open!

# Reading entire file
with open("output.txt", "r", encoding="utf-8") as f:
    content = f.read()  # Entire file as one string

# Reading line by line (memory efficient for large files)
with open("output.txt", "r", encoding="utf-8") as f:
    for line in f:  # File object is iterable
        print(line.rstrip())  # rstrip() removes trailing newline

# Reading all lines into a list
with open("output.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()  # List of strings (includes \n)

# Exclusive create — fail if file already exists
# Prevents accidentally overwriting important files
try:
    with open("new_file.txt", "x", encoding="utf-8") as f:
        f.write("Created safely")
except FileExistsError:
    print("File already exists — won't overwrite")

# ============================================================
# pathlib — the modern way to handle paths
# ============================================================
from pathlib import Path

# Creating paths — use / operator for joining
project_root = Path("/home/user/project")
config_path = project_root / "config" / "settings.json"
print(config_path)  # /home/user/project/config/settings.json

# Path properties
p = Path("/home/user/project/data/report.csv.gz")
print(p.name)      # "report.csv.gz"
print(p.stem)      # "report.csv" (name without LAST suffix)
print(p.suffix)    # ".gz"
print(p.suffixes)  # [".csv", ".gz"]
print(p.parent)    # /home/user/project/data
print(p.parts)     # ('/', 'home', 'user', 'project', 'data', 'report.csv.gz')

# Current directory and home
cwd = Path.cwd()
home = Path.home()

# Check existence and type
p = Path("some_path")
p.exists()      # True/False
p.is_file()     # True if it's a regular file
p.is_dir()      # True if it's a directory
p.is_symlink()  # True if it's a symbolic link

# Read/write helpers (open + read/write + close in one call)
config_path = Path("config.txt")
config_path.write_text("key=value\n", encoding="utf-8")
content = config_path.read_text(encoding="utf-8")

# Binary read/write
data_path = Path("data.bin")
data_path.write_bytes(b"\x00\x01\x02\x03")
raw = data_path.read_bytes()

# Directory operations
output_dir = Path("output/reports/2024")
output_dir.mkdir(parents=True, exist_ok=True)  # Like mkdir -p

# Glob — find files matching patterns
project = Path(".")
py_files = list(project.glob("*.py"))       # Current dir only
all_py = list(project.rglob("*.py"))        # Recursive
csvs = list(project.glob("data/**/*.csv"))  # Specific subdir

# Iterating directory contents
for item in Path(".").iterdir():
    if item.is_file():
        print(f"  File: {item.name} ({item.stat().st_size} bytes)")

# ============================================================
# Production pattern: safe file writing with atomic replace
# ============================================================
import tempfile
import os

def atomic_write(filepath: str, content: str, encoding="utf-8"):
    """Write to a file atomically — prevents partial writes on crash.

    Writes to a temp file first, then renames (which is atomic on POSIX).
    If the process crashes mid-write, the original file is untouched.
    """
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write to temp file in the same directory (same filesystem for rename)
    fd, tmp_path = tempfile.mkstemp(
        dir=path.parent, suffix=".tmp", prefix=f".{path.name}."
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # Force write to disk
        os.replace(tmp_path, filepath)  # Atomic rename
    except BaseException:
        os.unlink(tmp_path)  # Clean up temp file on failure
        raise

atomic_write("important_data.json", '{"status": "saved"}')

# ============================================================
# Custom context managers — class-based
# ============================================================
class ManagedFile:
    """Context manager for file operations with logging."""

    def __init__(self, filename, mode="r", encoding="utf-8"):
        self.filename = filename
        self.mode = mode
        self.encoding = encoding
        self.file = None

    def __enter__(self):
        print(f"Opening {self.filename} in mode '{self.mode}'")
        self.file = open(self.filename, self.mode, encoding=self.encoding)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()
            print(f"Closed {self.filename}")
        if exc_type is not None:
            print(f"Exception occurred: {exc_type.__name__}: {exc_val}")
        return False  # Don't suppress exceptions

# ============================================================
# Custom context managers — generator-based (simpler)
# ============================================================
from contextlib import contextmanager

@contextmanager
def temp_directory():
    """Create a temporary directory, clean up when done."""
    import tempfile
    import shutil

    tmpdir = tempfile.mkdtemp()
    try:
        yield Path(tmpdir)  # Value given to 'as' variable
    finally:
        shutil.rmtree(tmpdir)  # Cleanup always runs

with temp_directory() as tmpdir:
    data_file = tmpdir / "data.txt"
    data_file.write_text("temporary data", encoding="utf-8")
    print(f"Working in: {tmpdir}")
# tmpdir is deleted here

# ============================================================
# Processing large files efficiently
# ============================================================
def count_lines_in_large_file(filepath: str) -> int:
    """Count lines without loading entire file into memory."""
    count = 0
    with open(filepath, "r", encoding="utf-8") as f:
        for _ in f:  # Iterating line-by-line uses minimal memory
            count += 1
    return count

def process_in_chunks(filepath: str, chunk_size: int = 8192):
    """Read binary file in fixed-size chunks for processing."""
    with open(filepath, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process chunk (e.g., compute hash, upload, etc.)
            yield chunk
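
The chunked pattern above is typically paired with hashlib to fingerprint files of any size in constant memory. This sketch uses the two-argument iter(callable, sentinel) idiom as an equivalent loop; sha256_of_file is an illustrative name:

```python
import hashlib

def sha256_of_file(filepath: str, chunk_size: int = 8192) -> str:
    """Hash a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: the result matches hashing the whole payload at once
with open("blob.bin", "wb") as f:
    f.write(b"x" * 20_000)

print(sha256_of_file("blob.bin") == hashlib.sha256(b"x" * 20_000).hexdigest())  # True
```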

🏋️ Practice Exercise

Exercises:

  1. Write a function that reads a text file, counts word frequencies, and writes the results to a new file sorted by frequency (descending). Use pathlib for all path operations and handle FileNotFoundError and PermissionError gracefully.

  2. Implement an atomic_write context manager that writes to a temp file and atomically renames on success. If an exception occurs inside the with block, delete the temp file and leave the original untouched.

  3. Create a @contextmanager function called locked_file(path) that acquires a file lock (using fcntl.flock on Unix) before writing and releases it in finally. Test with two concurrent writers.

  4. Write a script that recursively finds all .log files in a directory tree using Path.rglob(), reads each one, extracts lines containing "ERROR", and writes them to a consolidated errors.txt with the source filename prepended to each line.

  5. Build a FileWatcher class that monitors a file for changes by polling stat().st_mtime. Use it as a context manager that starts/stops the polling. Demonstrate it detecting an external file modification.

  6. Process a 1GB+ CSV file line-by-line (without loading it all into memory). Count rows matching a condition and report progress every 100,000 rows.

⚠️ Common Mistakes

  • Not specifying encoding='utf-8' when opening text files. The default is platform-dependent (locale.getpreferredencoding()), causing UnicodeDecodeError or silent data corruption when the code runs on a different OS. Always pass encoding explicitly.

  • Using 'w' mode when you meant 'a' (append). 'w' truncates the file to zero length before writing. If you need to add to an existing file, use 'a'. If you want to fail on existing files, use 'x' (exclusive create).

  • Building paths with string concatenation (dir + '/' + filename) instead of pathlib.Path or os.path.join(). String concatenation breaks on Windows (which uses backslashes) and mishandles edge cases like double slashes.

  • Reading entire large files into memory with .read() or .readlines(). For files larger than available RAM, iterate line-by-line (for line in f:) or use chunked reading (f.read(chunk_size)).

  • Forgetting that __exit__ must return True to suppress an exception. Returning None or False (the default) lets the exception propagate. Only return True when you intentionally want to swallow the exception — which is rarely correct.
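
When suppression really is intended, prefer contextlib.suppress from the standard library over a hand-written __exit__ that returns True; it makes the swallowed exception types explicit:

```python
import os
from contextlib import suppress

# Deletes the file if present; silently does nothing if it's already gone
with suppress(FileNotFoundError):
    os.remove("maybe_missing.tmp")

print("still running")  # execution continues normally
```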
