Data Visualization


📖 Concept

Data visualization transforms raw numbers into insight. Three libraries anchor Python's visualization ecosystem: Matplotlib (the foundational layer), Seaborn (statistical visualization built on Matplotlib), and Plotly (interactive, web-based charts). Knowing when and how to use each is essential for communicating data effectively.

Matplotlib is Python's original plotting library and remains the most flexible. It uses a hierarchical object model: Figure (the canvas) contains one or more Axes (individual plots), which contain plot elements (lines, bars, text). There are two APIs:

pyplot API (plt.plot(), plt.bar()) — stateful, MATLAB-like, convenient for quick plots
Object-oriented API (fig, ax = plt.subplots()) — explicit, preferred for production code and multi-panel figures
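
The difference between the two APIs is easiest to see side by side. The following is a minimal sketch with made-up data; the variable names (pyplot_fig, oo_fig, left, right) are illustrative only.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# pyplot (stateful) API: each call acts on the implicit "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot API")
plt.xlabel("x")
plt.ylabel("x squared")
pyplot_fig = plt.gcf()  # grab the implicit figure so it can be referenced later

# Object-oriented API: keep explicit Figure and Axes handles and call their methods.
oo_fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
left.plot(x, y)
left.set_title("line")
right.bar(x, y)
right.set_title("bars")
oo_fig.suptitle("Object-oriented API")
oo_fig.tight_layout()
```

With explicit handles there is never any doubt about which panel a call modifies, which is why the object-oriented style scales better to multi-panel figures. Call plt.show() afterwards to display both figures.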

Seaborn provides high-level functions for statistical visualization. Built on Matplotlib, it offers beautiful defaults, automatic statistical aggregation, and tight integration with pandas DataFrames. Key function categories:

Relational — scatterplot(), lineplot() for continuous relationships
Categorical — boxplot(), violinplot(), barplot(), stripplot() for category comparisons
Distribution — histplot(), kdeplot(), ecdfplot() for understanding data spread
Matrix — heatmap(), clustermap() for correlation and similarity matrices
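
As a rough illustration of the "automatic statistical aggregation" mentioned above, the sketch below passes a long-format DataFrame straight to two categorical functions. The toy data and column names (team, score) are invented for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data: 30 scores for each of three teams.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "team": np.repeat(["A", "B", "C"], 30),
    "score": np.concatenate([
        rng.normal(60, 5, 30),
        rng.normal(70, 5, 30),
        rng.normal(80, 5, 30),
    ]),
})

fig, (ax_bar, ax_box) = plt.subplots(1, 2, figsize=(9, 4))

# barplot aggregates each group (the mean by default) and draws error bars.
sns.barplot(data=toy, x="team", y="score", ax=ax_bar)
ax_bar.set_title("barplot: one estimate per team")

# boxplot summarizes each group's quartiles instead of a single estimate.
sns.boxplot(data=toy, x="team", y="score", ax=ax_box)
ax_box.set_title("boxplot: spread per team")

fig.tight_layout()
```

Neither call needed a manual groupby: Seaborn reads the column names, groups the rows, and computes the summary itself.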

Plotly produces interactive charts that users can hover over, zoom, and filter. Plotly Express provides a concise API similar to Seaborn's, while the graph_objects module offers fine-grained control. It is well suited to dashboards, web applications, and exploratory analysis in Jupyter notebooks.

Choosing the right chart type:

Goal	Chart Types
Distribution	Histogram, KDE, Box plot, Violin
Comparison	Bar chart, Grouped bar, Dot plot
Relationship	Scatter plot, Bubble chart, Heatmap
Composition	Pie chart (sparingly), Stacked bar, Treemap
Trend over time	Line chart, Area chart

Best practices: Always label axes and provide a title. Use colorblind-friendly palettes. Keep the data-to-ink ratio high by minimizing chartjunk (decoration that carries no information). Choose chart types based on the story the data needs to tell, not on visual appeal.
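
A minimal sketch of these practices applied to a single bar chart follows. The salary figures are made up, and "tableau-colorblind10" is one of Matplotlib's built-in style sheets, chosen here as one possible colorblind-friendly option.

```python
import matplotlib.pyplot as plt
from matplotlib import ticker

departments = ["Engineering", "Marketing", "Sales", "HR"]
avg_salary = [98000, 76000, 71000, 68000]  # made-up values for illustration

# Built-in Matplotlib style with a colorblind-friendly color cycle.
with plt.style.context("tableau-colorblind10"):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(departments, avg_salary)

    # Label both axes and give the chart a title.
    ax.set_xlabel("Department")
    ax.set_ylabel("Average salary")
    ax.set_title("Average Salary by Department")

    # Readable tick labels: "$98K" instead of 98000.
    ax.yaxis.set_major_formatter(
        ticker.FuncFormatter(lambda value, _pos: f"${value / 1000:.0f}K")
    )

    # Remove borders that carry no information.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
```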

💻 Code Example


# ============================================================
# Data Visualization: Matplotlib, Seaborn, and Plotly
# ============================================================
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import numpy as np
import pandas as pd

# --- Sample Data ---
np.random.seed(42)
n = 200
df = pd.DataFrame({
    "age": np.random.normal(35, 10, n).astype(int).clip(18, 65),
    "salary": np.random.normal(75000, 20000, n).clip(30000, 150000),
    "experience": np.random.normal(8, 4, n).clip(0, 30),
    "department": np.random.choice(
        ["Engineering", "Marketing", "Sales", "HR"], n,
        p=[0.4, 0.25, 0.2, 0.15]
    ),
    "satisfaction": np.random.uniform(1, 10, n).round(1),
})
df["salary"] = df["salary"] + df["experience"] * 2000  # add correlation


# ============================================================
# 1. Matplotlib: Object-Oriented API (production style)
# ============================================================
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle("Employee Dashboard", fontsize=16, fontweight="bold")

# Panel 1: Histogram with KDE overlay
ax = axes[0, 0]
ax.hist(df["salary"], bins=30, edgecolor="white", alpha=0.7,
        color="#2196F3", density=True, label="Histogram")
# Overlay a kernel density estimate computed with SciPy
from scipy.stats import gaussian_kde
kde = gaussian_kde(df["salary"])
x_range = np.linspace(df["salary"].min(), df["salary"].max(), 200)
ax.plot(x_range, kde(x_range), color="#FF5722", linewidth=2, label="KDE")
ax.set_xlabel("Salary ($)")
ax.set_ylabel("Density")
ax.set_title("Salary Distribution")
ax.xaxis.set_major_formatter(ticker.FuncFormatter(
    lambda x, _: f"${x/1000:.0f}K"
))
ax.legend()

# Panel 2: Scatter plot with regression line
ax = axes[0, 1]
colors = {"Engineering": "#2196F3", "Marketing": "#4CAF50",
          "Sales": "#FF9800", "HR": "#9C27B0"}
for dept, group in df.groupby("department"):
    ax.scatter(group["experience"], group["salary"],
               alpha=0.6, label=dept, color=colors[dept], s=30)
# Add a least-squares trendline (degree-1 polynomial fit)
z = np.polyfit(df["experience"], df["salary"], 1)
p = np.poly1d(z)
x_sorted = np.sort(df["experience"].to_numpy())
ax.plot(x_sorted, p(x_sorted),
        "r--", linewidth=2, label=f"Trend (slope={z[0]:,.0f})")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary ($)")
ax.set_title("Experience vs Salary")
ax.legend(fontsize=8)

# Panel 3: Box plot by department, ordered by median salary
ax = axes[1, 0]
dept_order = df.groupby("department")["salary"].median().sort_values().index
bp = ax.boxplot(
    [df.loc[df["department"] == d, "salary"] for d in dept_order],
    patch_artist=True, notch=True,
)
# Label the boxes through the axis: the boxplot keyword for this was
# renamed from `labels` to `tick_labels` in Matplotlib 3.9.
ax.set_xticklabels(dept_order)
for patch, dept in zip(bp["boxes"], dept_order):
    patch.set_facecolor(colors[dept])
    patch.set_alpha(0.7)
ax.set_ylabel("Salary ($)")
ax.set_title("Salary by Department")
ax.tick_params(axis="x", rotation=15)

# Panel 4: Bar chart of average satisfaction (error bars = 1 std dev)
ax = axes[1, 1]
dept_satisfaction = (
    df.groupby("department")["satisfaction"]
    .agg(["mean", "std"])
    .sort_values("mean", ascending=True)
)
bars = ax.barh(dept_satisfaction.index, dept_satisfaction["mean"],
               xerr=dept_satisfaction["std"], capsize=5,
               color=[colors[d] for d in dept_satisfaction.index],
               alpha=0.8, edgecolor="white")
ax.set_xlabel("Satisfaction Score (1-10)")
ax.set_title("Average Satisfaction by Department")
ax.set_xlim(0, 10)
for bar, val in zip(bars, dept_satisfaction["mean"]):
    ax.text(val + 0.3, bar.get_y() + bar.get_height() / 2,
            f"{val:.1f}", va="center", fontweight="bold")

plt.tight_layout()
plt.savefig("employee_dashboard.png", dpi=150, bbox_inches="tight")
plt.show()


# ============================================================
# 2. Seaborn: Statistical Visualization
# ============================================================
# "colorblind" is one of Seaborn's built-in colorblind-friendly palettes
sns.set_theme(style="whitegrid", palette="colorblind", font_scale=1.1)

# FacetGrid: distribution per department
g = sns.FacetGrid(df, col="department", col_wrap=2,
                  height=4, aspect=1.2)
g.map_dataframe(sns.histplot, x="salary", kde=True, bins=20)
g.set_titles("{col_name}")
g.set_axis_labels("Salary ($)", "Count")
plt.tight_layout()
plt.show()

# Pair plot: multi-variable relationships
sns.pairplot(df[["age", "salary", "experience", "satisfaction",
                 "department"]],
             hue="department", diag_kind="kde",
             plot_kws={"alpha": 0.5, "s": 20})
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()

# Heatmap: correlation matrix
fig, ax = plt.subplots(figsize=(8, 6))
numeric_cols = df.select_dtypes(include=[np.number])
corr = numeric_cols.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # upper triangle mask
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, square=True, linewidths=0.5, ax=ax,
            cbar_kws={"shrink": 0.8})
ax.set_title("Correlation Matrix")
plt.tight_layout()
plt.show()

# Violin + strip plot combo
fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=df, x="department", y="salary", inner=None,
               alpha=0.3, ax=ax)
sns.stripplot(data=df, x="department", y="salary", size=3,
              alpha=0.6, jitter=True, ax=ax)
ax.set_title("Salary Distribution by Department (Violin + Strip)")
plt.tight_layout()
plt.show()


# ============================================================
# 3. Plotly Express: Interactive Charts
# ============================================================
# NOTE: Plotly outputs render in Jupyter notebooks or as HTML files.
# Uncomment the lines below to generate interactive charts.

# import plotly.express as px
#
# # Interactive scatter with hover data
# fig = px.scatter(
#     df, x="experience", y="salary", color="department",
#     size="satisfaction", hover_data=["age"],
#     title="Interactive: Experience vs Salary",
#     labels={"experience": "Years of Experience",
#             "salary": "Annual Salary ($)"},
#     template="plotly_white",
# )
# fig.update_traces(marker=dict(opacity=0.7, line=dict(width=0.5)))
# fig.show()   # opens in browser or renders in notebook
#
# # Animated scatter over age groups
# df["age_bin"] = pd.cut(df["age"], bins=5).astype(str)
# fig = px.scatter(
#     df, x="experience", y="salary", color="department",
#     animation_frame="age_bin", size="satisfaction",
#     range_y=[20000, 180000], range_x=[0, 35],
#     title="Salary by Experience (Animated by Age Group)",
# )
# fig.show()


# ============================================================
# 4. Styling & Best Practices
# ============================================================
# Custom style context manager
with plt.style.context("seaborn-v0_8-paper"):
    fig, ax = plt.subplots(figsize=(8, 5))
    dept_counts = df["department"].value_counts()
    colors_list = [colors.get(d, "#999") for d in dept_counts.index]
    bars = ax.bar(dept_counts.index, dept_counts.values,
                  color=colors_list, edgecolor="white", linewidth=1.5)

    # Annotate bars with values
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2., height + 1,
                f"{int(height)}", ha="center", va="bottom",
                fontweight="bold", fontsize=12)

    ax.set_ylabel("Number of Employees")
    ax.set_title("Headcount by Department",
                 fontsize=14, fontweight="bold")
    ax.spines[["top", "right"]].set_visible(False)  # remove chartjunk
    plt.tight_layout()
    plt.savefig("headcount.png", dpi=150, bbox_inches="tight",
                facecolor="white")
    plt.show()

🏋️ Practice Exercise

Create a 2x2 subplot dashboard using Matplotlib's object-oriented API showing: (a) a histogram of a numeric column, (b) a scatter plot with a trendline, (c) a horizontal bar chart with error bars, and (d) a pie chart with percentage labels. Apply consistent styling across all panels.
Use Seaborn to create a pair plot of at least 4 numeric variables colored by a categorical variable. Then create a heatmap of the correlation matrix with annotations. Interpret which variables are most strongly correlated and why.
Build a Seaborn FacetGrid that shows the distribution of salaries across departments, with each panel representing a different experience level bin (0-5, 5-10, 10-15, 15+ years). Add KDE overlays and consistent axis limits.
Create a publication-quality figure with Matplotlib that includes: a custom color palette, removed top/right spines, formatted tick labels (e.g., "$50K" instead of 50000), a legend outside the plot area, and export it as both PNG (150 dpi) and SVG.
(Bonus) Use Plotly Express to create an interactive scatter plot with hover tooltips showing all data fields, color by category, size by a numeric variable, and add dropdown filters. Export it as a self-contained HTML file.

⚠️ Common Mistakes

Using the pyplot stateful API (plt.plot()) for complex multi-panel figures — always use the object-oriented API (fig, ax = plt.subplots()) for anything beyond quick exploratory plots.
Forgetting plt.tight_layout() or bbox_inches='tight' when saving, resulting in cut-off labels and overlapping titles.
Choosing chart types for visual appeal rather than data appropriateness — pie charts for more than 5 categories, 3D bar charts when 2D suffices, or line charts for non-sequential categorical data.
Not considering colorblind accessibility: avoid palettes that rely only on red versus green. Use Seaborn's 'colorblind' palette, or a perceptually uniform Matplotlib colormap such as viridis or plasma.
Creating overly complex visualizations that obscure the message. Effective charts have a high data-to-ink ratio — remove gridlines, borders, and decorations that do not aid interpretation.
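
Two of the mistakes above, cropped labels and inaccessible colors, have short fixes. The sketch below is one way to apply them, assuming Seaborn is installed; the series data is invented, and an in-memory buffer stands in for a real file path.

```python
import io
import matplotlib.pyplot as plt
import seaborn as sns

# Seaborn's "colorblind" palette; as_hex() returns the colors as hex strings.
palette = sns.color_palette("colorblind", n_colors=4).as_hex()

fig, ax = plt.subplots(figsize=(5, 3))
for i, color in enumerate(palette):
    ax.plot([0, 1, 2], [i, i + 1, i + 0.5], color=color, label=f"series {i}")
ax.set_xlabel("A long x-axis label that could be clipped")
ax.set_ylabel("Value")

# A legend placed outside the axes is a typical case where a plain
# savefig() call crops part of the figure.
ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))

# bbox_inches="tight" expands the saved area to include the legend and labels.
buffer = io.BytesIO()  # in-memory stand-in for a file path
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
png_bytes = buffer.getvalue()
```

Replacing the buffer with a filename such as "figure.png" writes the same image to disk.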
