Databases & SQLAlchemy

0/4 in this phase0/54 across the roadmap

📖 Concept

SQLAlchemy is the most widely used database toolkit for Python. Version 2.0 introduced a modernized API with a cleaner, more Pythonic style that aligns with type hints and async patterns. Understanding SQLAlchemy is essential for any Python backend role.

SQLAlchemy Architecture:

Engine — manages the connection pool and dialect (PostgreSQL, MySQL, SQLite). Created once per application.
Session — a unit of work that tracks changes to objects and flushes them to the database in a transaction. Created per request.
Declarative Base — base class for ORM models that maps Python classes to database tables.
Mapped Columns — SQLAlchemy 2.0 uses `Mapped[type]` and `mapped_column()` for type-safe column definitions (replacing the old `Column()` style).

Core ORM Concepts:

Models define your schema as Python classes. Each class maps to a table, each attribute to a column.
Relationships link models together: `relationship()` defines the Python-side access pattern, `ForeignKey` defines the database constraint.
CRUD Operations use the Session: `session.add()` for insert, query with `select()`, modify attributes directly, `session.delete()` for removal, and `session.commit()` to persist.

Eager vs Lazy Loading:

Lazy loading (default) — related objects are loaded when you first access them. Causes the N+1 query problem: loading 100 users and accessing each user's posts triggers 101 queries.
Eager loading — loads related objects in the initial query. Three strategies: `joinedload()` (JOIN), `selectinload()` (separate SELECT IN), `subqueryload()` (subquery).

Alembic Migrations: Alembic is SQLAlchemy's migration tool (like Django's `makemigrations`/`migrate`). It generates migration scripts by comparing your models to the database state. Key commands: `alembic init` (setup), `alembic revision --autogenerate -m "message"` (create migration), `alembic upgrade head` (apply), `alembic downgrade -1` (rollback).

In interviews, focus on the N+1 problem, transaction management, and the difference between `flush()` (write to DB within transaction) and `commit()` (finalize transaction).

💻 Code Example

codeTap to expand ⛶

1# ============================================================
2# SQLAlchemy 2.0 ORM — Production Patterns
3# ============================================================
4# from sqlalchemy import (
5#     create_engine, ForeignKey, String, Text, DateTime, select, func
6# )
7# from sqlalchemy.orm import (
8#     DeclarativeBase, Mapped, mapped_column, relationship,
9#     Session, sessionmaker, joinedload, selectinload
10# )
11# from datetime import datetime, timezone
12# from typing import Optional
13#
14#
15# # ============================================================
16# # Engine & Base Setup
17# # ============================================================
18# engine = create_engine(
19#     "sqlite:///app.db",
20#     echo=False,          # Set True for SQL logging during development
21#     pool_size=5,         # Connection pool size (default 5)
22#     max_overflow=10,     # Extra connections beyond pool_size
23#     pool_recycle=3600,   # Recycle connections after 1 hour
24# )
25#
26# SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)
27#
28#
29# class Base(DeclarativeBase):
30#     """SQLAlchemy 2.0 declarative base."""
31#     pass
32#
33#
34# # ============================================================
35# # Models — SQLAlchemy 2.0 Mapped style
36# # ============================================================
37# class User(Base):
38#     __tablename__ = "users"
39#
40#     # Mapped[type] + mapped_column() = type-safe columns
41#     id: Mapped[int] = mapped_column(primary_key=True)
42#     username: Mapped[str] = mapped_column(String(50), unique=True, index=True)
43#     email: Mapped[str] = mapped_column(String(255), unique=True)
44#     full_name: Mapped[Optional[str]] = mapped_column(String(100), nullable=True)
45#     created_at: Mapped[datetime] = mapped_column(
46#         DateTime(timezone=True),
47#         default=lambda: datetime.now(timezone.utc),
48#     )
49#
50#     # Relationship: one-to-many (User -> Posts)
51#     posts: Mapped[list["Post"]] = relationship(
52#         back_populates="author",
53#         cascade="all, delete-orphan",  # delete posts when user is deleted
54#         lazy="selectin",               # default eager loading strategy
55#     )
56#
57#     def __repr__(self) -> str:
58#         return f"User(id={self.id}, username={self.username!r})"
59#
60#
61# class Post(Base):
62#     __tablename__ = "posts"
63#
64#     id: Mapped[int] = mapped_column(primary_key=True)
65#     title: Mapped[str] = mapped_column(String(200))
66#     body: Mapped[str] = mapped_column(Text)
67#     published: Mapped[bool] = mapped_column(default=False)
68#     created_at: Mapped[datetime] = mapped_column(
69#         DateTime(timezone=True),
70#         default=lambda: datetime.now(timezone.utc),
71#     )
72#
73#     # Foreign key + relationship back-reference
74#     author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
75#     author: Mapped["User"] = relationship(back_populates="posts")
76#
77#     # Many-to-many through association table
78#     tags: Mapped[list["Tag"]] = relationship(
79#         secondary="post_tags", back_populates="posts"
80#     )
81#
82#
83# class Tag(Base):
84#     __tablename__ = "tags"
85#
86#     id: Mapped[int] = mapped_column(primary_key=True)
87#     name: Mapped[str] = mapped_column(String(50), unique=True)
88#     posts: Mapped[list["Post"]] = relationship(
89#         secondary="post_tags", back_populates="tags"
90#     )
91#
92#
93# # Association table for many-to-many
94# from sqlalchemy import Table, Column, Integer
95# post_tags = Table(
96#     "post_tags",
97#     Base.metadata,
98#     Column("post_id", Integer, ForeignKey("posts.id"), primary_key=True),
99#     Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
100# )
101#
102#
103# # ============================================================
104# # CRUD Operations
105# # ============================================================
106#
107# # BAD: Not using a session context manager
108# # session = SessionLocal()
109# # session.add(user)
110# # session.commit()  # If this fails, session is left in broken state
111#
112# # GOOD: Session as context manager with automatic rollback
113# def create_user(username: str, email: str) -> User:
114#     with SessionLocal() as session:
115#         user = User(username=username, email=email)
116#         session.add(user)
117#         session.commit()
118#         session.refresh(user)  # reload from DB to get generated ID
119#         return user
120#
121#
122# def get_user_by_id(user_id: int) -> Optional[User]:
123#     with SessionLocal() as session:
124#         stmt = select(User).where(User.id == user_id)
125#         return session.execute(stmt).scalar_one_or_none()
126#
127#
128# # BAD: N+1 query problem
129# # def list_users_with_posts_bad():
130# #     with SessionLocal() as session:
131# #         users = session.execute(select(User)).scalars().all()
132# #         for user in users:
133# #             print(user.posts)  # Each access triggers a separate query!
134#
135# # GOOD: Eager loading to avoid N+1
136# def list_users_with_posts() -> list[User]:
137#     with SessionLocal() as session:
138#         stmt = (
139#             select(User)
140#             .options(joinedload(User.posts))  # JOIN in single query
141#             .order_by(User.created_at.desc())
142#         )
143#         return session.execute(stmt).unique().scalars().all()
144#
145#
146# def update_user_email(user_id: int, new_email: str) -> Optional[User]:
147#     with SessionLocal() as session:
148#         user = session.get(User, user_id)
149#         if user:
150#             user.email = new_email  # SQLAlchemy tracks the change
151#             session.commit()
152#             session.refresh(user)
153#         return user
154#
155#
156# def delete_user(user_id: int) -> bool:
157#     with SessionLocal() as session:
158#         user = session.get(User, user_id)
159#         if user:
160#             session.delete(user)  # cascade deletes posts too
161#             session.commit()
162#             return True
163#         return False
164#
165#
166# # ============================================================
167# # Advanced Queries
168# # ============================================================
169# def search_posts(keyword: str, limit: int = 20) -> list[Post]:
170#     with SessionLocal() as session:
171#         stmt = (
172#             select(Post)
173#             .options(joinedload(Post.author), selectinload(Post.tags))
174#             .where(Post.title.ilike(f"%{keyword}%"))
175#             .where(Post.published == True)
176#             .order_by(Post.created_at.desc())
177#             .limit(limit)
178#         )
179#         return session.execute(stmt).unique().scalars().all()
180#
181#
182# def get_user_post_counts() -> list[tuple]:
183#     """Aggregate query: count posts per user."""
184#     with SessionLocal() as session:
185#         stmt = (
186#             select(User.username, func.count(Post.id).label("post_count"))
187#             .join(Post, isouter=True)
188#             .group_by(User.username)
189#             .order_by(func.count(Post.id).desc())
190#         )
191#         return session.execute(stmt).all()
192#
193#
194# # ============================================================
195# # Alembic Migration Commands (run in terminal)
196# # ============================================================
197# # alembic init migrations              # initialize Alembic
198# # alembic revision --autogenerate -m "create users and posts"
199# # alembic upgrade head                 # apply all pending migrations
200# # alembic downgrade -1                 # rollback last migration
201# # alembic history                      # show migration history
202# # alembic current                      # show current revision

🏋️ Practice Exercise

Exercises:

Define SQLAlchemy 2.0 models for a blog system: `User`, `Post`, `Comment`, and `Tag` (many-to-many with Post). Use `Mapped[]` type annotations, `mapped_column()`, and proper relationships with cascade options. Create the tables with `Base.metadata.create_all(engine)`.
Write CRUD functions for the `Post` model: `create_post()`, `get_post()`, `update_post()`, `delete_post()`, and `list_posts(page, per_page)` with pagination. Use session context managers and handle the case where a post is not found.
Demonstrate the N+1 query problem: write a function that lists all users and their post titles using lazy loading. Enable SQL echo (`echo=True`) to see the query count. Then fix it using `joinedload()` and `selectinload()` and compare the query counts.
Set up Alembic for your project. Create an initial migration from your models, apply it, then add a `bio` column to the `User` model, generate a new migration, and apply it. Practice rolling back and re-applying migrations.
Write a function that performs a bulk insert of 10,000 records using `session.add_all()` and compare its performance to `session.execute(insert(Model).values(records))` using the core API. Time both approaches and explain the difference.
Implement a soft-delete pattern: add an `is_deleted` column and a `deleted_at` timestamp. Override the default query to exclude soft-deleted records using a custom `Session` class or event hooks.

⚠️ Common Mistakes

Not closing or properly scoping database sessions. A session that is never closed leaks connections from the pool. Always use with SessionLocal() as session: or FastAPI's Depends(get_db) with yield to ensure cleanup.
Ignoring the N+1 query problem. Loading 100 users and then accessing user.posts for each triggers 101 queries. Use joinedload() (single JOIN query) or selectinload() (two queries with IN clause) to eagerly load relationships.
Confusing session.flush() with session.commit(). flush() sends SQL to the database but stays inside the current transaction — a subsequent rollback undoes it. commit() finalizes the transaction permanently. Use flush() when you need generated IDs before committing.
Using the old SQLAlchemy 1.x Column() style instead of 2.0's Mapped[] + mapped_column(). The 2.0 style provides type safety, better IDE support, and aligns with Python's type hint ecosystem. Interviewers will notice if you use the legacy API.
Hardcoding database URLs instead of reading from environment variables. Use os.getenv('DATABASE_URL') or a settings management library like pydantic-settings. This prevents credentials from leaking into version control and allows different configs per environment.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Databases & SQLAlchemy

Was this topic helpful?

← PreviousREST APIs & HTTP Clients Next →NumPy: Numerical Computing