NumPy: Numerical Computing


📖 Concept

NumPy (Numerical Python) is the foundation of nearly every data science and scientific computing library in the Python ecosystem. It provides the ndarray, a memory-efficient, multi-dimensional array object whose vectorized operations often run one to two orders of magnitude faster than equivalent loops over native Python lists.

Why NumPy over Python lists?

Feature	Python List	NumPy ndarray
Storage	Objects scattered in memory	Contiguous typed memory block
Speed	Slow (interpreted loops)	Fast (compiled C/Fortran kernels)
Operations	Element-by-element manually	Vectorized (whole-array ops)
Memory	~28 bytes per int object	~8 bytes per int64 element
Broadcasting	Not supported	Automatic shape alignment
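
The memory row of the table can be checked with a rough measurement. The sketch below assumes CPython on a 64-bit platform; the element count n and the value range are arbitrary choices for illustration, and the exact list figure will vary by interpreter version.

```python
import sys
import numpy as np

n = 1_000  # hypothetical element count, chosen only for illustration

py_list = list(range(1_000_000, 1_000_000 + n))
np_arr = np.arange(1_000_000, 1_000_000 + n, dtype=np.int64)

# A list stores pointers; each int is a separate object elsewhere in memory.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw values back to back in one typed buffer.
array_bytes = np_arr.nbytes  # n * itemsize

print(f"list:    ~{list_bytes / n:.0f} bytes per element")
print(f"ndarray: {array_bytes / n:.0f} bytes per element")
```

The list total counts both the pointer array and the int objects it refers to, which is why it comes out larger than the per-object figure alone.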

Core concepts:

ndarray creation: np.array(), np.zeros(), np.ones(), np.arange(), np.linspace(), np.random module
Indexing & slicing: basic slices return views (not copies), while advanced indexing, meaning boolean masks and fancy indexing with integer arrays, returns copies
Broadcasting: NumPy's mechanism for performing arithmetic on arrays of different shapes. A (3, 1) array can be added to a (1, 4) array, producing a (3, 4) result. Rules: dimensions are compared right-to-left; each pair must either match or one of them must be 1
Vectorization: replacing explicit Python loops with array-level operations. np.sum(arr) on an ndarray is typically far faster than the built-in sum() on a list of the same length (often 10-100x for large inputs, depending on dtype and hardware) because the loop runs in compiled C code
Linear algebra: np.dot(), np.linalg.inv(), np.linalg.eig(), @ operator for matrix multiplication
Random number generation: the modern np.random.default_rng() API returns a Generator object that carries its own state, giving reproducible random streams without relying on a global seed
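
The broadcasting rule in the list above can be tried out directly. This is a minimal sketch; the shapes are illustrative, and np.broadcast_shapes assumes NumPy 1.20 or newer.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(4).reshape(1, 4)   # shape (1, 4)

# Right-to-left: 1 vs 4 -> 4, then 3 vs 1 -> 3, so the result is (3, 4).
print((a + b).shape)                         # (3, 4)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# Trailing dimensions 3 and 4 neither match nor equal 1, so this fails.
failed = False
try:
    np.ones((2, 3)) + np.ones((4,))
except ValueError as exc:
    failed = True
    print(f"incompatible shapes: {exc}")
```

Checking shapes with np.broadcast_shapes before running a large computation is a cheap way to catch an alignment mistake without allocating the result.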

Performance tip: Prefer vectorized operations over Python loops. When you find yourself writing for i in range(len(arr)), there is almost certainly a NumPy function that does it faster. Use np.where() instead of if/else loops, and treat np.vectorize() as a last resort: it is a convenience wrapper that still loops in Python, not a true vectorizer.
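
To make the tip concrete, here is a small sketch comparing three ways to label values. The data and the label() helper are hypothetical, introduced only for this illustration.

```python
import numpy as np

scores = np.array([85, 42, 91, 67, 73])  # illustrative data

# Loop version: one Python-level comparison per element.
labels_loop = np.array(["Pass" if s >= 70 else "Fail" for s in scores])

# np.where evaluates the condition on the whole array at once.
labels_where = np.where(scores >= 70, "Pass", "Fail")

# np.vectorize still calls the Python function once per element;
# it offers array-style calling syntax, not compiled-speed execution.
def label(score):  # hypothetical helper for this sketch
    return "Pass" if score >= 70 else "Fail"

labels_vectorized = np.vectorize(label)(scores)

print(labels_where.tolist())  # ['Pass', 'Fail', 'Pass', 'Fail', 'Pass']
```

All three produce the same labels; only np.where pushes the per-element work into compiled code.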

💻 Code Example


# ============================================================
# NumPy: Numerical Computing Fundamentals
# ============================================================
import numpy as np

# --- Array Creation ---
# From Python list
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Built-in constructors
zeros = np.zeros((3, 4))                  # 3x4 matrix of zeros
ones = np.ones((2, 3), dtype=np.float32)  # specify data type
identity = np.eye(4)                      # 4x4 identity matrix
seq = np.arange(0, 10, 0.5)               # like range(), but for floats
linspace = np.linspace(0, 1, 50)          # 50 evenly spaced points

# Array properties
print(f"Shape: {arr_2d.shape}")       # (3, 3)
print(f"Dtype: {arr_2d.dtype}")       # int64 on most platforms
print(f"Dimensions: {arr_2d.ndim}")   # 2
print(f"Size: {arr_2d.size}")         # 9
print(f"Bytes: {arr_2d.nbytes}")      # 72 (9 elements x 8 bytes for int64)

# --- Indexing & Slicing ---
matrix = np.arange(20).reshape(4, 5)

# Basic slicing (returns VIEWS, not copies)
row_slice = matrix[1:3, :]          # rows 1–2, all columns
col_slice = matrix[:, 2]            # all rows, column 2
submatrix = matrix[1:3, 2:4]        # rows 1–2, columns 2–3

# Boolean masking: extremely powerful for filtering
data = np.array([15, 22, 8, 31, 45, 12, 27])
mask = data > 20
filtered = data[mask]               # array([22, 31, 45, 27])
data[mask] = 0                      # set values > 20 to zero in-place

# Fancy indexing (returns COPIES, not views)
indices = np.array([0, 3, 4])
selected = data[indices]            # pick specific elements

# --- Broadcasting ---
# Rule: dimensions are compared right-to-left;
#       they must match or one must be 1
a = np.array([[1], [2], [3]])       # shape (3, 1)
b = np.array([10, 20, 30, 40])      # shape (4,) -> broadcast to (1, 4)
result = a + b                      # shape (3, 4) via automatic expansion
# [[11, 21, 31, 41],
#  [12, 22, 32, 42],
#  [13, 23, 33, 43]]

# Practical: standardize columns to zero mean and unit variance
data_matrix = np.random.default_rng(42).standard_normal((100, 5))
col_means = data_matrix.mean(axis=0)       # shape (5,)
col_stds = data_matrix.std(axis=0)         # shape (5,)
normalized = (data_matrix - col_means) / col_stds  # broadcasting!

# --- Vectorized Operations ---
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x) * np.exp(-x / 5)      # element-wise, no loops

# Conditional logic without loops
scores = np.array([85, 42, 91, 67, 73, 55, 88])
grades = np.where(scores >= 70, "Pass", "Fail")

# Aggregations along axes
matrix = np.random.default_rng(0).integers(1, 100, size=(4, 5))
print(f"Column sums:  {matrix.sum(axis=0)}")     # sum down rows
print(f"Row means:    {matrix.mean(axis=1)}")    # mean across columns
print(f"Global max:   {matrix.max()}")
print(f"Argmax col 0: {matrix[:, 0].argmax()}")  # row index of the max in column 0

# --- Linear Algebra ---
A = np.array([[2, 1], [5, 3]])
B = np.array([[4, 2], [1, 6]])

product = A @ B                     # matrix multiplication
determinant = np.linalg.det(A)      # determinant
inverse = np.linalg.inv(A)          # inverse
eigenvalues, eigenvectors = np.linalg.eig(A)

# Solve linear system: Ax = b
b_vec = np.array([8, 13])
x_solution = np.linalg.solve(A, b_vec)
# Approximately [11, -14]; check: 2*11 + 1*(-14) = 8 and 5*11 + 3*(-14) = 13
print(f"Solution: {x_solution}")

# --- Random Number Generation (modern API) ---
rng = np.random.default_rng(seed=42)
uniform_samples = rng.uniform(0, 1, size=1000)
normal_samples = rng.standard_normal(size=(100, 3))
integers = rng.integers(1, 7, size=20)     # dice rolls (upper bound is exclusive)
choice = rng.choice(["red", "green", "blue"], size=10, p=[0.5, 0.3, 0.2])

# Shuffle and permutation
arr = np.arange(10)
rng.shuffle(arr)                    # in-place shuffle
permuted = rng.permutation(10)      # returns new shuffled array

# --- Performance Comparison ---
import time

size = 1_000_000
py_list = list(range(size))
np_arr = np.arange(size)

start = time.perf_counter()
py_result = [x ** 2 for x in py_list]
py_time = time.perf_counter() - start

start = time.perf_counter()
np_result = np_arr ** 2
np_time = time.perf_counter() - start

print(f"Python list: {py_time:.4f}s")
print(f"NumPy array: {np_time:.4f}s")
print(f"Speedup: {py_time / np_time:.0f}x")
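
A solution returned by np.linalg.solve is easy to verify by substituting it back into the system. The sketch below reuses the A and b_vec values from the linear algebra section; comparing with np.allclose rather than == is a choice made here to tolerate floating-point rounding.

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
b_vec = np.array([8, 13])

x_solution = np.linalg.solve(A, b_vec)

# Substitute back: A @ x should reproduce b up to floating-point rounding.
residual = A @ x_solution - b_vec

print(x_solution)                           # approximately [11, -14]
print(np.allclose(A @ x_solution, b_vec))   # True
```

For this system the check amounts to 2*11 + 1*(-14) = 8 and 5*11 + 3*(-14) = 13.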

🏋️ Practice Exercise

Create a 10x10 matrix of random integers between 1 and 100. Compute the mean, median, and standard deviation of each row and each column. Find the row with the highest mean and the column with the lowest variance.
Implement a function that normalizes a 2D array using min-max scaling (scale each column to [0, 1]) using only NumPy operations — no loops allowed. Verify that each column's min is 0 and max is 1.
Use boolean masking and fancy indexing to extract all elements from a 5x5 random matrix that are greater than the matrix's overall mean. Replace those values with the column mean of their respective column.
Implement matrix operations from scratch: write functions for matrix transpose, matrix multiplication, and computing the trace — then verify your results against NumPy's built-in np.transpose(), @, and np.trace().
Generate 10,000 samples from a normal distribution with mean=50 and std=15. Use NumPy to compute the 25th, 50th, and 75th percentiles. Count how many values fall within 1, 2, and 3 standard deviations of the mean and compare to the empirical rule (68-95-99.7).

⚠️ Common Mistakes

Using Python loops instead of vectorized NumPy operations. On large arrays this is often one to two orders of magnitude slower and defeats the purpose of NumPy.
Confusing views and copies: slicing returns a view (modifications affect the original), while fancy indexing returns a copy. Use .copy() explicitly when you need independence.
Ignoring broadcasting rules and getting unexpected shapes — always check .shape after operations and understand that broadcasting aligns dimensions right-to-left.
Using the legacy np.random.seed() API, which mutates hidden global state, instead of the modern np.random.default_rng() generator, which keeps its state in an explicit object and uses a bit generator (PCG64 by default) with better statistical properties.
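Forgetting that fixed-width integer arithmetic overflows by wrapping around. In array operations this happens silently (e.g., np.array([127], dtype=np.int8) + np.int8(1) gives array([-128], dtype=int8)), while the same operation on two NumPy scalars wraps with only a RuntimeWarning. Always choose appropriate dtypes for your data range.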
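
Two of the mistakes above, views versus copies and integer wraparound, can be reproduced in a few lines. This is a small sketch with made-up values; np.shares_memory is used here as one way to test whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(6)

view = base[1:4]            # basic slice: shares memory with base
copy = base[[1, 2, 3]]      # fancy indexing: independent copy

view[0] = 100               # also changes base[1]
copy[1] = -1                # base is unaffected

print(base.tolist())                   # [0, 100, 2, 3, 4, 5]
print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copy))    # False

# Fixed-width integers wrap around silently in array arithmetic.
small = np.array([127], dtype=np.int8)
wrapped = small + np.int8(1)
print(wrapped.tolist())                # [-128]
```

Calling .copy() on the slice, as in base[1:4].copy(), would make the first assignment leave base untouched.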
