Databases & SQLAlchemy

0/4 in this phase0/54 across the roadmap

📖 Concept

SQLAlchemy is the most widely used database toolkit for Python. Version 2.0 introduced a modernized API with a cleaner, more Pythonic style that aligns with type hints and async patterns. Understanding SQLAlchemy is essential for any Python backend role.

SQLAlchemy Architecture:

  • Engine — manages the connection pool and dialect (PostgreSQL, MySQL, SQLite). Created once per application.
  • Session — a unit of work that tracks changes to objects and flushes them to the database in a transaction. Created per request.
  • Declarative Base — base class for ORM models that maps Python classes to database tables.
  • Mapped Columns — SQLAlchemy 2.0 uses `Mapped[type]` and `mapped_column()` for type-safe column definitions (replacing the old `Column()` style).

Core ORM Concepts:

  • Models define your schema as Python classes. Each class maps to a table, each attribute to a column.
  • Relationships link models together: `relationship()` defines the Python-side access pattern, `ForeignKey` defines the database constraint.
  • CRUD Operations use the Session: `session.add()` for insert, query with `select()`, modify attributes directly, `session.delete()` for removal, and `session.commit()` to persist.

Eager vs Lazy Loading:

  • Lazy loading (default) — related objects are loaded when you first access them. Causes the N+1 query problem: loading 100 users and accessing each user's posts triggers 101 queries.
  • Eager loading — loads related objects in the initial query. Three strategies: `joinedload()` (JOIN), `selectinload()` (separate SELECT IN), `subqueryload()` (subquery).

Alembic Migrations: Alembic is SQLAlchemy's migration tool (like Django's `makemigrations`/`migrate`). It generates migration scripts by comparing your models to the database state. Key commands: `alembic init` (setup), `alembic revision --autogenerate -m "message"` (create migration), `alembic upgrade head` (apply), `alembic downgrade -1` (rollback).

In interviews, focus on the N+1 problem, transaction management, and the difference between `flush()` (write to DB within transaction) and `commit()` (finalize transaction).

💻 Code Example

codeTap to expand ⛶
1# ============================================================
2# SQLAlchemy 2.0 ORMProduction Patterns
3# ============================================================
4# from sqlalchemy import (
5# create_engine, ForeignKey, String, Text, DateTime, select, func
6# )
7# from sqlalchemy.orm import (
8# DeclarativeBase, Mapped, mapped_column, relationship,
9# Session, sessionmaker, joinedload, selectinload
10# )
11# from datetime import datetime, timezone
12# from typing import Optional
13#
14#
15# # ============================================================
16# # Engine & Base Setup
17# # ============================================================
18# engine = create_engine(
19# "sqlite:///app.db",
20# echo=False, # Set True for SQL logging during development
21# pool_size=5, # Connection pool size (default 5)
22# max_overflow=10, # Extra connections beyond pool_size
23# pool_recycle=3600, # Recycle connections after 1 hour
24# )
25#
26# SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)
27#
28#
29# class Base(DeclarativeBase):
30# """SQLAlchemy 2.0 declarative base."""
31# pass
32#
33#
34# # ============================================================
35# # ModelsSQLAlchemy 2.0 Mapped style
36# # ============================================================
37# class User(Base):
38# __tablename__ = "users"
39#
40# # Mapped[type] + mapped_column() = type-safe columns
41# id: Mapped[int] = mapped_column(primary_key=True)
42# username: Mapped[str] = mapped_column(String(50), unique=True, index=True)
43# email: Mapped[str] = mapped_column(String(255), unique=True)
44# full_name: Mapped[Optional[str]] = mapped_column(String(100), nullable=True)
45# created_at: Mapped[datetime] = mapped_column(
46# DateTime(timezone=True),
47# default=lambda: datetime.now(timezone.utc),
48# )
49#
50# # Relationship: one-to-many (User -> Posts)
51# posts: Mapped[list["Post"]] = relationship(
52# back_populates="author",
53# cascade="all, delete-orphan", # delete posts when user is deleted
54# lazy="selectin", # default eager loading strategy
55# )
56#
57# def __repr__(self) -> str:
58# return f"User(id={self.id}, username={self.username!r})"
59#
60#
61# class Post(Base):
62# __tablename__ = "posts"
63#
64# id: Mapped[int] = mapped_column(primary_key=True)
65# title: Mapped[str] = mapped_column(String(200))
66# body: Mapped[str] = mapped_column(Text)
67# published: Mapped[bool] = mapped_column(default=False)
68# created_at: Mapped[datetime] = mapped_column(
69# DateTime(timezone=True),
70# default=lambda: datetime.now(timezone.utc),
71# )
72#
73# # Foreign key + relationship back-reference
74# author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
75# author: Mapped["User"] = relationship(back_populates="posts")
76#
77# # Many-to-many through association table
78# tags: Mapped[list["Tag"]] = relationship(
79# secondary="post_tags", back_populates="posts"
80# )
81#
82#
83# class Tag(Base):
84# __tablename__ = "tags"
85#
86# id: Mapped[int] = mapped_column(primary_key=True)
87# name: Mapped[str] = mapped_column(String(50), unique=True)
88# posts: Mapped[list["Post"]] = relationship(
89# secondary="post_tags", back_populates="tags"
90# )
91#
92#
93# # Association table for many-to-many
94# from sqlalchemy import Table, Column, Integer
95# post_tags = Table(
96# "post_tags",
97# Base.metadata,
98# Column("post_id", Integer, ForeignKey("posts.id"), primary_key=True),
99# Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
100# )
101#
102#
103# # ============================================================
104# # CRUD Operations
105# # ============================================================
106#
107# # BAD: Not using a session context manager
108# # session = SessionLocal()
109# # session.add(user)
110# # session.commit() # If this fails, session is left in broken state
111#
112# # GOOD: Session as context manager with automatic rollback
113# def create_user(username: str, email: str) -> User:
114# with SessionLocal() as session:
115# user = User(username=username, email=email)
116# session.add(user)
117# session.commit()
118# session.refresh(user) # reload from DB to get generated ID
119# return user
120#
121#
122# def get_user_by_id(user_id: int) -> Optional[User]:
123# with SessionLocal() as session:
124# stmt = select(User).where(User.id == user_id)
125# return session.execute(stmt).scalar_one_or_none()
126#
127#
128# # BAD: N+1 query problem
129# # def list_users_with_posts_bad():
130# # with SessionLocal() as session:
131# # users = session.execute(select(User)).scalars().all()
132# # for user in users:
133# # print(user.posts) # Each access triggers a separate query!
134#
135# # GOOD: Eager loading to avoid N+1
136# def list_users_with_posts() -> list[User]:
137# with SessionLocal() as session:
138# stmt = (
139# select(User)
140# .options(joinedload(User.posts)) # JOIN in single query
141# .order_by(User.created_at.desc())
142# )
143# return session.execute(stmt).unique().scalars().all()
144#
145#
146# def update_user_email(user_id: int, new_email: str) -> Optional[User]:
147# with SessionLocal() as session:
148# user = session.get(User, user_id)
149# if user:
150# user.email = new_email # SQLAlchemy tracks the change
151# session.commit()
152# session.refresh(user)
153# return user
154#
155#
156# def delete_user(user_id: int) -> bool:
157# with SessionLocal() as session:
158# user = session.get(User, user_id)
159# if user:
160# session.delete(user) # cascade deletes posts too
161# session.commit()
162# return True
163# return False
164#
165#
166# # ============================================================
167# # Advanced Queries
168# # ============================================================
169# def search_posts(keyword: str, limit: int = 20) -> list[Post]:
170# with SessionLocal() as session:
171# stmt = (
172# select(Post)
173# .options(joinedload(Post.author), selectinload(Post.tags))
174# .where(Post.title.ilike(f"%{keyword}%"))
175# .where(Post.published == True)
176# .order_by(Post.created_at.desc())
177# .limit(limit)
178# )
179# return session.execute(stmt).unique().scalars().all()
180#
181#
182# def get_user_post_counts() -> list[tuple]:
183# """Aggregate query: count posts per user."""
184# with SessionLocal() as session:
185# stmt = (
186# select(User.username, func.count(Post.id).label("post_count"))
187# .join(Post, isouter=True)
188# .group_by(User.username)
189# .order_by(func.count(Post.id).desc())
190# )
191# return session.execute(stmt).all()
192#
193#
194# # ============================================================
195# # Alembic Migration Commands (run in terminal)
196# # ============================================================
197# # alembic init migrations # initialize Alembic
198# # alembic revision --autogenerate -m "create users and posts"
199# # alembic upgrade head # apply all pending migrations
200# # alembic downgrade -1 # rollback last migration
201# # alembic history # show migration history
202# # alembic current # show current revision

🏋️ Practice Exercise

Exercises:

  1. Define SQLAlchemy 2.0 models for a blog system: `User`, `Post`, `Comment`, and `Tag` (many-to-many with Post). Use `Mapped[]` type annotations, `mapped_column()`, and proper relationships with cascade options. Create the tables with `Base.metadata.create_all(engine)`.

  2. Write CRUD functions for the `Post` model: `create_post()`, `get_post()`, `update_post()`, `delete_post()`, and `list_posts(page, per_page)` with pagination. Use session context managers and handle the case where a post is not found.

  3. Demonstrate the N+1 query problem: write a function that lists all users and their post titles using lazy loading. Enable SQL echo (`echo=True`) to see the query count. Then fix it using `joinedload()` and `selectinload()` and compare the query counts.

  4. Set up Alembic for your project. Create an initial migration from your models, apply it, then add a `bio` column to the `User` model, generate a new migration, and apply it. Practice rolling back and re-applying migrations.

  5. Write a function that performs a bulk insert of 10,000 records using `session.add_all()` and compare its performance to `session.execute(insert(Model).values(records))` using the core API. Time both approaches and explain the difference.

  6. Implement a soft-delete pattern: add an `is_deleted` column and a `deleted_at` timestamp. Override the default query to exclude soft-deleted records using a custom `Session` class or event hooks.

⚠️ Common Mistakes

  • Not closing or properly scoping database sessions. A session that is never closed leaks connections from the pool. Always use with SessionLocal() as session: or FastAPI's Depends(get_db) with yield to ensure cleanup.

  • Ignoring the N+1 query problem. Loading 100 users and then accessing user.posts for each triggers 101 queries. Use joinedload() (single JOIN query) or selectinload() (two queries with IN clause) to eagerly load relationships.

  • Confusing session.flush() with session.commit(). flush() sends SQL to the database but stays inside the current transaction — a subsequent rollback undoes it. commit() finalizes the transaction permanently. Use flush() when you need generated IDs before committing.

  • Using the old SQLAlchemy 1.x Column() style instead of 2.0's Mapped[] + mapped_column(). The 2.0 style provides type safety, better IDE support, and aligns with Python's type hint ecosystem. Interviewers will notice if you use the legacy API.

  • Hardcoding database URLs instead of reading from environment variables. Use os.getenv('DATABASE_URL') or a settings management library like pydantic-settings. This prevents credentials from leaking into version control and allows different configs per environment.

💼 Interview Questions

🎤 Mock Interview

Practice a live interview for Databases & SQLAlchemy