NumPy: Numerical Computing
📖 Concept
NumPy (Numerical Python) is the foundation of nearly every data science and scientific computing library in the Python ecosystem. It provides the ndarray — a powerful, memory-efficient, multi-dimensional array object that enables vectorized operations orders of magnitude faster than native Python lists.
Why NumPy over Python lists?
| Feature | Python List | NumPy ndarray |
|---|---|---|
| Storage | Objects scattered in memory | Contiguous typed memory block |
| Speed | Slow (interpreted loops) | Fast (compiled C/Fortran kernels) |
| Operations | Element-by-element manually | Vectorized (whole-array ops) |
| Memory | ~28 bytes per int object | ~8 bytes per int64 element |
| Broadcasting | Not supported | Automatic shape alignment |
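The memory row of the table can be checked with a rough measurement. The sketch below assumes 64-bit CPython, where a small `int` object typically takes about 28 bytes; the variable names are illustrative only, and exact list sizes vary by interpreter version.

```python
import sys

import numpy as np

n = 1_000
py_list = list(range(1000, 1000 + n))
np_arr = np.arange(1000, 1000 + n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both the
# container itself and every element object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores raw 8-byte values in one contiguous buffer.
array_bytes = np_arr.nbytes

print(f"list:    ~{list_bytes} bytes")
print(f"ndarray: {array_bytes} bytes")
```

The array figure is exactly `n * 8` bytes for `int64`, while the list figure is an estimate that depends on the Python build.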
Core concepts:
- ndarray creation: `np.array()`, `np.zeros()`, `np.ones()`, `np.arange()`, `np.linspace()`, and the `np.random` module
- Indexing & slicing: advanced indexing with boolean masks, fancy indexing with integer arrays, and slice objects that return views (not copies)
- Broadcasting: NumPy's mechanism for performing arithmetic on arrays of different shapes. A `(3, 1)` array can be added to a `(1, 4)` array, producing a `(3, 4)` result. Rules: dimensions are compared right-to-left; each pair must either match or one of them must be 1
- Vectorization: replacing explicit Python loops with array-level operations. On a large array, `np.sum(arr)` is often tens of times faster than `sum(list)` because the loop runs in compiled C code; the exact factor depends on array size and dtype
- Linear algebra: `np.dot()`, `np.linalg.inv()`, `np.linalg.eig()`, and the `@` operator for matrix multiplication
- Random number generation: the modern `np.random.default_rng()` API provides reproducible random streams from an explicit generator object rather than hidden global state
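The broadcasting rule in the list above can be tried without building any arrays. This is a minimal sketch using `np.broadcast_shapes` (available in NumPy 1.20 and later); the shapes are arbitrary examples.

```python
import numpy as np

# Compatible: compared right-to-left, 1 vs 4 and 3 vs 1 both contain a 1.
compatible = np.broadcast_shapes((3, 1), (1, 4))
print(compatible)  # (3, 4)

# A 1-D shape is treated as if it had leading dimensions of size 1.
padded = np.broadcast_shapes((3, 1), (4,))
print(padded)  # (3, 4)

# Incompatible: the trailing dimensions are 2 and 4, and neither is 1.
try:
    np.broadcast_shapes((3, 2), (4,))
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f"Cannot broadcast: {exc}")
```

Checking shapes this way is cheaper than allocating the arrays and can help when debugging an unexpected result shape.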
Performance tip: Prefer vectorized operations over Python loops. If you find yourself writing `for i in range(len(arr))`, there is almost certainly a NumPy function that does the job faster. Use `np.where()` in place of element-wise if/else loops, and treat `np.vectorize()` as a last resort: it is a convenience wrapper that still calls your Python function once per element, not a true vectorizer.
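To make the tip concrete, here is a small sketch that computes the same result three ways. The array and the "clip negatives to zero" task are made up for illustration.

```python
import numpy as np

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# 1. Explicit Python loop: correct, but every element goes through the interpreter.
clipped_loop = np.empty_like(values)
for i in range(len(values)):
    clipped_loop[i] = values[i] if values[i] > 0 else 0.0

# 2. np.where: the condition and the selection both run in compiled code.
clipped_where = np.where(values > 0, values, 0.0)

# 3. np.vectorize: convenient syntax, but the lambda is still called per element.
relu = np.vectorize(lambda v: v if v > 0 else 0.0)
clipped_vectorize = relu(values)

print(clipped_where)
```

All three produce the same values; on large arrays the `np.where` version is the one expected to be fastest.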
💻 Code Example
```python
# ============================================================
# NumPy: Numerical Computing Fundamentals
# ============================================================
import time

import numpy as np

# --- Array Creation ---
# From Python list
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Built-in constructors
zeros = np.zeros((3, 4))                    # 3x4 matrix of zeros
ones = np.ones((2, 3), dtype=np.float32)    # specify data type
identity = np.eye(4)                        # 4x4 identity matrix
seq = np.arange(0, 10, 0.5)                 # like range(), but for floats
linspace = np.linspace(0, 1, 50)            # 50 evenly spaced points

# Array properties
print(f"Shape: {arr_2d.shape}")        # (3, 3)
print(f"Dtype: {arr_2d.dtype}")        # int64 on most platforms
print(f"Dimensions: {arr_2d.ndim}")    # 2
print(f"Size: {arr_2d.size}")          # 9
print(f"Bytes: {arr_2d.nbytes}")       # 72 (9 elements x 8 bytes for int64)

# --- Indexing & Slicing ---
matrix = np.arange(20).reshape(4, 5)

# Basic slicing (returns VIEWS, not copies)
row_slice = matrix[1:3, :]      # rows 1–2, all columns
col_slice = matrix[:, 2]        # all rows, column 2
submatrix = matrix[1:3, 2:4]    # rows 1–2, columns 2–3

# Boolean masking: extremely powerful for filtering
data = np.array([15, 22, 8, 31, 45, 12, 27])
mask = data > 20
filtered = data[mask]    # array([22, 31, 45, 27])
data[mask] = 0           # set values > 20 to zero in-place

# Fancy indexing (returns COPIES, not views)
indices = np.array([0, 3, 4])
selected = data[indices]    # pick specific elements

# --- Broadcasting ---
# Rule: dimensions are compared right-to-left;
# they must match or one must be 1
a = np.array([[1], [2], [3]])     # shape (3, 1)
b = np.array([10, 20, 30, 40])    # shape (4,) -> broadcast to (1, 4)
result = a + b                    # shape (3, 4), automatic expansion
# [[11, 21, 31, 41],
#  [12, 22, 32, 42],
#  [13, 23, 33, 43]]

# Practical: normalize columns to zero mean and unit variance
data_matrix = np.random.default_rng(42).standard_normal((100, 5))
col_means = data_matrix.mean(axis=0)    # shape (5,)
col_stds = data_matrix.std(axis=0)      # shape (5,)
normalized = (data_matrix - col_means) / col_stds    # broadcasting!

# --- Vectorized Operations ---
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x) * np.exp(-x / 5)    # element-wise, no loops

# Conditional logic without loops
scores = np.array([85, 42, 91, 67, 73, 55, 88])
grades = np.where(scores >= 70, "Pass", "Fail")

# Aggregations along axes
matrix = np.random.default_rng(0).integers(1, 100, size=(4, 5))
print(f"Column sums: {matrix.sum(axis=0)}")        # sum down rows
print(f"Row means: {matrix.mean(axis=1)}")         # mean across columns
print(f"Global max: {matrix.max()}")
print(f"Argmax col 0: {matrix[:, 0].argmax()}")    # index of max

# --- Linear Algebra ---
A = np.array([[2, 1], [5, 3]])
B = np.array([[4, 2], [1, 6]])

product = A @ B                    # matrix multiplication
determinant = np.linalg.det(A)     # determinant
inverse = np.linalg.inv(A)         # inverse
eigenvalues, eigenvectors = np.linalg.eig(A)

# Solve linear system: Ax = b
b_vec = np.array([8, 13])
x_solution = np.linalg.solve(A, b_vec)
# Expected [ 11. -14.]: 2*11 + 1*(-14) = 8 and 5*11 + 3*(-14) = 13
print(f"Solution: {x_solution}")

# --- Random Number Generation (modern API) ---
rng = np.random.default_rng(seed=42)
uniform_samples = rng.uniform(0, 1, size=1000)
normal_samples = rng.standard_normal(size=(100, 3))
integers = rng.integers(1, 7, size=20)    # dice rolls (upper bound is exclusive)
choice = rng.choice(["red", "green", "blue"], size=10, p=[0.5, 0.3, 0.2])

# Shuffle and permutation
arr = np.arange(10)
rng.shuffle(arr)                  # in-place shuffle
permuted = rng.permutation(10)    # returns new shuffled array

# --- Performance Comparison ---
size = 1_000_000
py_list = list(range(size))
np_arr = np.arange(size)

start = time.perf_counter()
py_result = [x ** 2 for x in py_list]
py_time = time.perf_counter() - start

start = time.perf_counter()
np_result = np_arr ** 2
np_time = time.perf_counter() - start

print(f"Python list: {py_time:.4f}s")
print(f"NumPy array: {np_time:.4f}s")
print(f"Speedup: {py_time / np_time:.0f}x")
```
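The linear-system step in the example is worth verifying rather than trusting a comment. A minimal sketch, reusing the same `A` and `b_vec` values and substituting the solution back into the equations:

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 3.0]])
b_vec = np.array([8.0, 13.0])

x_solution = np.linalg.solve(A, b_vec)

# Substitute back: A @ x should reproduce b up to floating-point error.
residual = A @ x_solution - b_vec
solve_ok = np.allclose(A @ x_solution, b_vec)

# The inverse can be checked the same way: A @ inv(A) should be the identity.
inverse_ok = np.allclose(A @ np.linalg.inv(A), np.eye(2))

print(f"Solution: {x_solution}")   # approximately [ 11. -14.]
print(f"Residual: {residual}")
```

Using `np.allclose` instead of `==` avoids false failures caused by floating-point rounding.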
🏋️ Practice Exercise
Create a 10x10 matrix of random integers between 1 and 100. Compute the mean, median, and standard deviation of each row and each column. Find the row with the highest mean and the column with the lowest variance.
Implement a function that normalizes a 2D array using min-max scaling (scale each column to [0, 1]) using only NumPy operations — no loops allowed. Verify that each column's min is 0 and max is 1.
Use boolean masking and fancy indexing to extract all elements from a 5x5 random matrix that are greater than the matrix's overall mean. Replace those values with the column mean of their respective column.
Implement matrix operations from scratch: write functions for matrix transpose, matrix multiplication, and computing the trace, then verify your results against NumPy's built-in `np.transpose()`, `@`, and `np.trace()`.
Generate 10,000 samples from a normal distribution with mean=50 and std=15. Use NumPy to compute the 25th, 50th, and 75th percentiles. Count how many values fall within 1, 2, and 3 standard deviations of the mean and compare to the empirical rule (68-95-99.7).
⚠️ Common Mistakes
Using Python loops instead of vectorized NumPy operations: on large arrays this is often tens of times slower and defeats the purpose of NumPy.
Confusing views and copies: slicing returns a view (modifications affect the original), while fancy indexing returns a copy. Use `.copy()` explicitly when you need independence.
Ignoring broadcasting rules and getting unexpected shapes: always check `.shape` after operations and remember that broadcasting aligns dimensions right-to-left.
Using the legacy `np.random.seed()` API instead of the modern `np.random.default_rng()` generator, which avoids shared global state and uses a newer bit generator (PCG64) with better statistical properties.
Forgetting that integer overflow in NumPy arrays wraps silently (e.g., `np.array([127], dtype=np.int8) + np.int8(1)` gives `array([-128], dtype=int8)`; the scalar form `np.int8(127) + np.int8(1)` also wraps to `-128` but emits a `RuntimeWarning`). Always choose dtypes that fit your data range.
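Two of these mistakes, views versus copies and silent integer wraparound, are easy to reproduce. The sketch below uses small made-up arrays purely for illustration.

```python
import numpy as np

# --- Views vs. copies ---
original = np.arange(6)              # [0 1 2 3 4 5]
view = original[1:4]                 # basic slice: shares memory with original
view[0] = 99                         # also changes original[1]

independent = original[1:4].copy()   # explicit copy: separate memory
independent[1] = -1                  # original[2] is left untouched

shares_view = np.shares_memory(original, view)
shares_copy = np.shares_memory(original, independent)

# --- Silent integer wraparound in arrays ---
small = np.array([126, 127], dtype=np.int8)
wrapped = small + np.int8(1)         # 127 + 1 wraps around to -128
safe = small.astype(np.int16) + 1    # a wider dtype holds the true value

print(original)   # [ 0 99  2  3  4  5]
print(wrapped)    # [ 127 -128]
print(safe)       # [127 128]
```

`np.shares_memory` is a quick way to confirm whether two arrays alias the same buffer before modifying one of them.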