Beginner · Enthusiast · 90 min read

Chapter 1: Linear Algebra for ML

Vectors, matrices, eigenvalues, and SVD: the mathematical language of machine learning.

Libraries covered: NumPy

Learning Objectives

["Understand vectors and vector operations", "Master matrix operations and decompositions", "Apply linear algebra concepts to ML problems"]


1.1 Vectors and Vector Spaces (Beginner)

Vectors are the fundamental building blocks of machine learning. From representing data points to encoding neural network weights, understanding vectors is essential for anyone working in AI and ML.

What is a Vector?

A vector is an ordered collection of numbers. In machine learning, we typically work with column vectors:

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

Each element $v_i$ represents a feature or dimension. For example, a house might be represented as:

$$\mathbf{house} = \begin{bmatrix} 1500 \\ 3 \\ 2 \\ 0.25 \end{bmatrix}$$

where the dimensions represent square footage, bedrooms, bathrooms, and lot size in acres.

Vectors in Python with NumPy

NumPy is the standard library for vector operations in Python:

PYTHON
import numpy as np

# Creating vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

# Vector from a list of features
house_features = np.array([1500, 3, 2, 0.25])
print(f"House vector: {house_features}")
print(f"Number of features (dimensions): {len(house_features)}")

# Creating special vectors
zeros = np.zeros(5)      # [0, 0, 0, 0, 0]
ones = np.ones(5)        # [1, 1, 1, 1, 1]
range_vec = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

print(f"Zeros: {zeros}")
print(f"Ones: {ones}")
print(f"Range: {range_vec}")
print(f"Linspace: {linspace}")

Vector Addition and Scalar Multiplication

Two fundamental operations define vector spaces:

Vector Addition: Add corresponding elements

$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}$$

Scalar Multiplication: Multiply each element by a scalar

$$c \cdot \mathbf{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}$$

PYTHON
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Vector addition
addition = u + v
print(f"u + v = {addition}")  # [5, 7, 9]

# Vector subtraction
subtraction = u - v
print(f"u - v = {subtraction}")  # [-3, -3, -3]

# Scalar multiplication
scaled = 3 * u
print(f"3 * u = {scaled}")  # [3, 6, 9]

# Linear combination: 2u + 3v
linear_combo = 2*u + 3*v
print(f"2u + 3v = {linear_combo}")  # [14, 19, 24]

The Dot Product

The dot product is one of the most important operations in ML:

$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$

The dot product has a geometric interpretation:

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos(\theta)$$

where $\theta$ is the angle between the vectors.

PYTHON
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Dot product - three equivalent ways
dot1 = np.dot(u, v)
dot2 = u @ v
dot3 = np.sum(u * v)

print(f"np.dot(u, v) = {dot1}")  # 32
print(f"u @ v = {dot2}")          # 32
print(f"sum(u * v) = {dot3}")     # 32

# Dot product in ML: weighted sum
weights = np.array([0.5, 0.3, 0.2])
features = np.array([100, 50, 25])
prediction = weights @ features
print(f"Weighted prediction: {prediction}")  # 65.0

ML Application: Linear Models

In linear regression and neural networks, predictions are computed as dot products:

PYTHON
import numpy as np

# Simple linear model: y = w·x + b
def predict(X, weights, bias):
    return X @ weights + bias

# Training data: 3 samples, 4 features each
X = np.array([
    [1500, 3, 2, 0.25],  # House 1
    [2000, 4, 3, 0.5],   # House 2
    [1200, 2, 1, 0.15]   # House 3
])

# Learned weights (one per feature)
weights = np.array([100, 10000, 5000, 50000])
bias = 50000

# Predictions for all houses
predictions = predict(X, weights, bias)
print(f"Predicted prices: {predictions}")
# [252500. 330000. 202500.]

Vector Norms

Norms measure the "size" or "length" of a vector. Different norms are used in different ML contexts:

L2 Norm (Euclidean):

$$\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$$

L1 Norm (Manhattan):

$$\|\mathbf{v}\|_1 = \sum_{i=1}^{n} |v_i|$$

L-infinity Norm (Max):

$$\|\mathbf{v}\|_\infty = \max_i |v_i|$$

PYTHON
import numpy as np

v = np.array([3, -4, 0])

# L2 norm (Euclidean distance from origin)
l2_norm = np.linalg.norm(v)  # or np.sqrt(np.sum(v**2))
print(f"L2 norm: {l2_norm}")  # 5.0

# L1 norm (sum of absolute values)
l1_norm = np.linalg.norm(v, ord=1)  # or np.sum(np.abs(v))
print(f"L1 norm: {l1_norm}")  # 7.0

# L-infinity norm (maximum absolute value)
linf_norm = np.linalg.norm(v, ord=np.inf)  # or np.max(np.abs(v))
print(f"L-inf norm: {linf_norm}")  # 4.0

# Visualizing different norm "balls"
import matplotlib.pyplot as plt

theta = np.linspace(0, 2*np.pi, 100)

# L2 ball (circle)
x_l2 = np.cos(theta)
y_l2 = np.sin(theta)

# L1 ball (diamond): rescale circle points so that |x| + |y| = 1
t = np.linspace(0, 2*np.pi, 100)
scale = 1 / (np.abs(np.cos(t)) + np.abs(np.sin(t)))
x_l1 = np.cos(t) * scale
y_l1 = np.sin(t) * scale

plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.plot(x_l2, y_l2, 'b-', label='L2 ball')
plt.title('L2 Norm Ball (Circle)')
plt.axis('equal')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(x_l1, y_l1, 'r-', label='L1 ball')
plt.title('L1 Norm Ball (Diamond)')
plt.axis('equal')
plt.grid(True)
plt.tight_layout()
plt.savefig('norm_balls.png', dpi=100)
plt.show()

Unit Vectors and Normalization

A unit vector has norm 1. Normalizing a vector means scaling it to have unit length:

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$

Normalization is crucial in ML for:

  • Preventing features with large magnitudes from dominating
  • Computing cosine similarity
  • Ensuring stable gradient descent

PYTHON
import numpy as np

v = np.array([3, 4])

# Normalize to unit vector
norm = np.linalg.norm(v)
unit_v = v / norm

print(f"Original vector: {v}")
print(f"Norm: {norm}")  # 5.0
print(f"Unit vector: {unit_v}")  # [0.6, 0.8]
print(f"Unit vector norm: {np.linalg.norm(unit_v)}")  # 1.0

# Normalize a dataset (feature scaling)
data = np.array([
    [100, 0.5],
    [200, 0.8],
    [150, 0.3]
])

# Normalize each row to unit length
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
normalized_data = data / row_norms

print("Original data:")
print(data)
print("\nNormalized data (each row has L2 norm = 1):")
print(normalized_data)

Cosine Similarity

Cosine similarity measures the angle between vectors, ignoring magnitude:

$$\text{similarity}(\mathbf{u}, \mathbf{v}) = \cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$$

This ranges from -1 (opposite directions) to 1 (same direction).

PYTHON
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Document vectors (word counts)
doc1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0])  # "machine learning"
doc2 = np.array([3, 0, 0, 4, 0, 0, 0, 1, 0, 0])  # "deep learning"
doc3 = np.array([0, 0, 7, 0, 0, 6, 0, 0, 0, 0])  # "cooking recipe"

sim_12 = cosine_similarity(doc1, doc2)
sim_13 = cosine_similarity(doc1, doc3)
sim_23 = cosine_similarity(doc2, doc3)

print(f"Similarity (doc1, doc2): {sim_12:.4f}")  # High - both about ML
print(f"Similarity (doc1, doc3): {sim_13:.4f}")  # Low - different topics
print(f"Similarity (doc2, doc3): {sim_23:.4f}")  # Low - different topics

# Using scikit-learn
from sklearn.metrics.pairwise import cosine_similarity as sklearn_cosine

# sklearn expects 2D arrays
docs = np.array([doc1, doc2, doc3])
similarity_matrix = sklearn_cosine(docs)
print("\nSimilarity matrix:")
print(similarity_matrix)

Vector Spaces

A vector space is a set of vectors closed under addition and scalar multiplication. Key concepts:

Linear Combination: A sum of scaled vectors

$$\mathbf{w} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$

Span: All possible linear combinations of a set of vectors

Linear Independence: A set of vectors in which no vector can be written as a linear combination of the others

Basis: A linearly independent set that spans the space

PYTHON
import numpy as np

# Standard basis vectors in R^3
e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
e3 = np.array([0, 0, 1])

# Any vector can be written as a linear combination of basis vectors
v = np.array([3, -2, 5])
# v = 3*e1 + (-2)*e2 + 5*e3

reconstruction = 3*e1 + (-2)*e2 + 5*e3
print(f"Original: {v}")
print(f"Reconstructed: {reconstruction}")
print(f"Equal: {np.allclose(v, reconstruction)}")

# Checking linear independence using matrix rank
vectors = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
])
rank = np.linalg.matrix_rank(vectors)
print(f"\nRank of standard basis: {rank}")  # 3 (full rank = linearly independent)

# Linearly dependent vectors
dependent = np.array([
    [1, 2, 3],
    [2, 4, 6],  # 2 * first vector
    [0, 1, 0]
])
rank_dep = np.linalg.matrix_rank(dependent)
print(f"Rank of dependent set: {rank_dep}")  # 2 (not full rank)

A complete example using vectors for ML:

PYTHON
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Load the Iris dataset
iris = load_iris()
X = iris.data  # 150 samples, 4 features each
y = iris.target
feature_names = iris.feature_names

print(f"Dataset shape: {X.shape}")
print(f"Features: {feature_names}")
print(f"\nFirst 3 samples (as vectors):")
for i in range(3):
    print(f"  Sample {i}: {X[i]} -> Class {iris.target_names[y[i]]}")

# Normalize features
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

# Find most similar flowers to the first sample
query = X_normalized[0].reshape(1, -1)

# Compute cosine similarities
similarities = cosine_similarity(query, X_normalized)[0]

# Find top 5 most similar (excluding itself)
top_indices = np.argsort(similarities)[::-1][1:6]

print(f"\nMost similar to sample 0 ({iris.target_names[y[0]]}):")
for idx in top_indices:
    print(f"  Sample {idx}: similarity={similarities[idx]:.4f}, class={iris.target_names[y[idx]]}")

# Compute pairwise Euclidean distances
distances = euclidean_distances(X_normalized)
print(f"\nDistance matrix shape: {distances.shape}")
print(f"Distance from sample 0 to sample 1: {distances[0, 1]:.4f}")

# Visualize high-dimensional vectors using PCA
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_normalized)

plt.figure(figsize=(10, 8))
for i, name in enumerate(iris.target_names):
    mask = y == i
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name, alpha=0.7)

plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%} variance)')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%} variance)')
plt.title('Iris Dataset: 4D vectors projected to 2D')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('iris_vectors.png', dpi=100)
plt.show()

Summary

Key concepts for ML practitioners:

| Concept | ML Application |
|---------|----------------|
| Vectors | Feature representation, embeddings |
| Dot product | Linear models, attention mechanisms |
| Norms | Regularization (L1/L2), distance metrics |
| Normalization | Feature scaling, stable training |
| Cosine similarity | Text similarity, recommendation systems |
| Vector spaces | Dimensionality reduction, PCA |

Understanding these vector operations is fundamental to:

  • Building and understanding neural networks
  • Implementing similarity search and clustering
  • Working with embeddings and representations
  • Debugging and optimizing ML models

1.2 Matrices and Matrix Operations (Beginner)

Matrices are two-dimensional arrays of numbers that form the backbone of machine learning computations. From transforming data to representing neural network layers, matrices enable the efficient parallel computations that make modern ML possible.

What is a Matrix?

A matrix is a rectangular array of numbers arranged in rows and columns:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

A matrix with $m$ rows and $n$ columns is called an $m \times n$ matrix. The element in row $i$ and column $j$ is denoted $a_{ij}$.

Creating Matrices in NumPy

PYTHON
import numpy as np

# Creating matrices
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(f"Matrix A:\n{A}")
print(f"Shape: {A.shape}")  # (2, 3) - 2 rows, 3 columns
print(f"Element at row 0, col 1: {A[0, 1]}")  # 2

# Special matrices
zeros = np.zeros((3, 4))      # 3x4 matrix of zeros
ones = np.ones((2, 3))        # 2x3 matrix of ones
identity = np.eye(4)          # 4x4 identity matrix
random_mat = np.random.randn(3, 3)  # 3x3 random matrix

print(f"\nIdentity matrix:\n{identity}")

# Reshaping vectors to matrices
v = np.arange(12)  # [0, 1, 2, ..., 11]
M = v.reshape(3, 4)  # Reshape to 3x4
print(f"\nReshaped to 3x4:\n{M}")

M2 = v.reshape(4, 3)  # Reshape to 4x3
print(f"\nReshaped to 4x3:\n{M2}")

Matrix Arithmetic

Matrices support element-wise operations and special matrix operations:

PYTHON
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise addition
print(f"A + B:\n{A + B}")

# Element-wise multiplication (Hadamard product)
print(f"\nA * B (element-wise):\n{A * B}")

# Scalar operations
print(f"\n2 * A:\n{2 * A}")
print(f"\nA + 10:\n{A + 10}")

# Element-wise functions
print(f"\nnp.sqrt(A):\n{np.sqrt(A)}")
print(f"\nnp.exp(A):\n{np.exp(A)}")

Matrix Multiplication

Matrix multiplication is the core operation in ML. For matrices $\mathbf{A}$ (size $m \times n$) and $\mathbf{B}$ (size $n \times p$), the product $\mathbf{C} = \mathbf{AB}$ has size $m \times p$:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

Each element is the dot product of a row from $\mathbf{A}$ with a column from $\mathbf{B}$.

PYTHON
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])  # 2x3

B = np.array([
    [7, 8],
    [9, 10],
    [11, 12]
])  # 3x2

# Matrix multiplication - three equivalent ways
C1 = np.dot(A, B)
C2 = A @ B
C3 = np.matmul(A, B)

print(f"A shape: {A.shape}")  # (2, 3)
print(f"B shape: {B.shape}")  # (3, 2)
print(f"C shape: {C1.shape}")  # (2, 2)

print(f"\nA @ B:\n{C1}")
# [[58, 64],
#  [139, 154]]

# Verify: C[0,0] = A[0,:] · B[:,0] = 1*7 + 2*9 + 3*11 = 58
manual = A[0, 0]*B[0, 0] + A[0, 1]*B[1, 0] + A[0, 2]*B[2, 0]
print(f"\nManual calculation of C[0,0]: {manual}")  # 58

# Matrix multiplication is NOT commutative
# A @ B != B @ A (and dimensions may not even match)
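
A quick sketch of the non-commutativity, using two small square matrices chosen here for illustration:

PYTHON
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])  # permutation matrix

print(f"P @ Q:\n{P @ Q}")  # [[2 1], [4 3]] - columns of P swapped
print(f"Q @ P:\n{Q @ P}")  # [[3 4], [1 2]] - rows of P swapped
print(f"Same result: {np.allclose(P @ Q, Q @ P)}")  # False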

ML Application: Linear Layer

Neural network layers are matrix multiplications:

PYTHON
import numpy as np

def linear_layer(X, W, b):
    """
    Linear transformation: Y = XW + b

    X: input data (batch_size, input_features)
    W: weights (input_features, output_features)
    b: bias (output_features,)
    """
    return X @ W + b

# Example: batch of 4 samples, 3 input features, 2 output features
batch_size = 4
input_features = 3
output_features = 2

X = np.random.randn(batch_size, input_features)
W = np.random.randn(input_features, output_features)
b = np.random.randn(output_features)

Y = linear_layer(X, W, b)

print(f"Input shape: {X.shape}")    # (4, 3)
print(f"Weights shape: {W.shape}")  # (3, 2)
print(f"Bias shape: {b.shape}")     # (2,)
print(f"Output shape: {Y.shape}")   # (4, 2)

print(f"\nInput X:\n{X}")
print(f"\nOutput Y:\n{Y}")

Matrix Transpose

The transpose flips a matrix over its diagonal, swapping rows and columns:

$$(\mathbf{A}^T)_{ij} = \mathbf{A}_{ji}$$

PYTHON
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(f"A:\n{A}")
print(f"A shape: {A.shape}")  # (2, 3)

print(f"\nA transpose:\n{A.T}")
print(f"A.T shape: {A.T.shape}")  # (3, 2)

# Properties of transpose
B = np.random.randn(3, 2)
print(f"\n(A @ B).T equals B.T @ A.T: {np.allclose((A @ B).T, B.T @ A.T)}")

# Symmetric matrix: A = A.T
S = np.array([
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6]
])
print(f"\nS is symmetric: {np.allclose(S, S.T)}")

Special Matrices

Several special matrices are fundamental to ML:

PYTHON
import numpy as np

n = 4

# Identity matrix: AI = IA = A
I = np.eye(n)
print(f"Identity matrix:\n{I}")

A = np.random.randn(n, n)
print(f"\nA @ I equals A: {np.allclose(A @ I, A)}")

# Diagonal matrix
d = np.array([1, 2, 3, 4])
D = np.diag(d)
print(f"\nDiagonal matrix:\n{D}")

# Extract diagonal from matrix
diag_of_A = np.diag(A)
print(f"\nDiagonal of A: {diag_of_A}")

# Symmetric matrix (A = A.T)
# Create symmetric matrix from any matrix
S = (A + A.T) / 2
print(f"\nS is symmetric: {np.allclose(S, S.T)}")

# Orthogonal matrix (Q.T @ Q = I)
Q, _ = np.linalg.qr(np.random.randn(n, n))
print(f"\nQ.T @ Q equals I: {np.allclose(Q.T @ Q, I)}")

Matrix Inverse

For a square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies:

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

Not all square matrices are invertible: a matrix is invertible if and only if its determinant is non-zero.
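
For the $2 \times 2$ case there is a closed form worth remembering, which matches the NumPy result computed below:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

For the matrix below, $\det(\mathbf{A}) = 4 \cdot 6 - 7 \cdot 2 = 10$, so $\mathbf{A}^{-1} = \frac{1}{10}\begin{bmatrix} 6 & -7 \\ -2 & 4 \end{bmatrix}$.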

PYTHON
import numpy as np

A = np.array([
    [4, 7],
    [2, 6]
])

# Compute inverse
A_inv = np.linalg.inv(A)

print(f"A:\n{A}")
print(f"\nA inverse:\n{A_inv}")

# Verify: A @ A_inv = I
product = A @ A_inv
print(f"\nA @ A_inv:\n{product}")
print(f"Is identity: {np.allclose(product, np.eye(2))}")

# Determinant
det = np.linalg.det(A)
print(f"\nDeterminant of A: {det}")  # Should be non-zero for invertible matrix

# Singular (non-invertible) matrix example
singular = np.array([
    [1, 2],
    [2, 4]  # Row 2 = 2 * Row 1
])
print(f"\nDeterminant of singular matrix: {np.linalg.det(singular)}")  # ~0

# Pseudo-inverse for non-invertible matrices
pinv = np.linalg.pinv(singular)
print(f"\nPseudo-inverse of singular matrix:\n{pinv}")

Solving Linear Systems

Matrix algebra enables solving systems of linear equations $\mathbf{Ax} = \mathbf{b}$:

PYTHON
import numpy as np

# System of equations:
# 2x + 3y = 8
# 4x + 5y = 14

A = np.array([
    [2, 3],
    [4, 5]
])
b = np.array([8, 14])

# Method 1: Using inverse (not recommended for numerical stability)
x_inv = np.linalg.inv(A) @ b
print(f"Solution using inverse: {x_inv}")

# Method 2: Using solve (recommended)
x_solve = np.linalg.solve(A, b)
print(f"Solution using solve: {x_solve}")

# Verify solution
print(f"\nVerification A @ x: {A @ x_solve}")  # Should equal b

# Method 3: Least squares (works for overdetermined systems)
x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Solution using lstsq: {x_lstsq}")

Matrix Rank

The rank of a matrix is the number of linearly independent rows (or columns):

PYTHON
import numpy as np

# Full rank matrix (3x3, rank 3)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10]  # Not a multiple of other rows
])

print(f"A:\n{A}")
print(f"Rank of A: {np.linalg.matrix_rank(A)}")  # 3

# Rank-deficient matrix
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [5, 7, 9]  # Row 3 = Row 1 + Row 2
])

print(f"\nB:\n{B}")
print(f"Rank of B: {np.linalg.matrix_rank(B)}")  # 2

# In ML, low-rank matrices are used for compression
# Original image-like matrix
original = np.random.randn(100, 100)

# Low-rank approximation using SVD
U, s, Vt = np.linalg.svd(original)
k = 10  # Keep only top 10 singular values
low_rank = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(f"\nOriginal rank: {np.linalg.matrix_rank(original)}")
print(f"Low-rank approximation rank: {np.linalg.matrix_rank(low_rank)}")
print(f"Compression ratio: {100*100 / (100*k + k + k*100):.2f}x")

Broadcasting in Matrix Operations

NumPy's broadcasting enables efficient operations between arrays of different shapes:

PYTHON
import numpy as np

# Add bias to each row
X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
bias = np.array([10, 20, 30])

# Broadcasting adds bias to each row
result = X + bias
print(f"X + bias:\n{result}")

# Normalize each column (subtract mean, divide by std)
col_mean = X.mean(axis=0)  # Mean of each column
col_std = X.std(axis=0)    # Std of each column
normalized = (X - col_mean) / col_std

print(f"\nColumn means: {col_mean}")
print(f"Column stds: {col_std}")
print(f"\nNormalized:\n{normalized}")
print(f"Normalized column means: {normalized.mean(axis=0)}")  # ~0
print(f"Normalized column stds: {normalized.std(axis=0)}")    # ~1

# Outer product via broadcasting
a = np.array([1, 2, 3])
b = np.array([4, 5])
outer = a[:, np.newaxis] * b[np.newaxis, :]
print(f"\nOuter product of {a} and {b}:\n{outer}")

ML Application: Batch Processing

Real ML systems process data in batches using matrix operations:

PYTHON
import numpy as np
np.random.seed(42)

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights with small random values
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros(hidden_size)
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros(output_size)

    def relu(self, x):
        return np.maximum(0, x)

    def softmax(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / exp_x.sum(axis=1, keepdims=True)

    def forward(self, X):
        # First layer: linear + ReLU
        self.z1 = X @ self.W1 + self.b1        # (batch, hidden)
        self.a1 = self.relu(self.z1)            # (batch, hidden)

        # Second layer: linear + softmax
        self.z2 = self.a1 @ self.W2 + self.b2  # (batch, output)
        self.a2 = self.softmax(self.z2)        # (batch, output)

        return self.a2

# Create network
net = SimpleNeuralNetwork(input_size=784, hidden_size=128, output_size=10)

# Process a batch of 32 images (28x28 flattened to 784)
batch = np.random.randn(32, 784)
output = net.forward(batch)

print(f"Input shape: {batch.shape}")          # (32, 784)
print(f"Hidden layer shape: {net.a1.shape}")  # (32, 128)
print(f"Output shape: {output.shape}")        # (32, 10)
print(f"\nFirst sample probabilities: {output[0]}")
print(f"Sum of probabilities: {output[0].sum():.4f}")  # 1.0
print(f"Predicted class: {np.argmax(output[0])}")

Matrix Decompositions

Matrix decompositions are essential tools in ML:

PYTHON
import numpy as np

A = np.array([
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98]
], dtype=float)

# Cholesky decomposition (for positive definite matrices)
# A = L @ L.T
L = np.linalg.cholesky(A)
print(f"Cholesky L:\n{L}")
print(f"L @ L.T:\n{L @ L.T}")
print(f"Equals A: {np.allclose(A, L @ L.T)}")

# QR decomposition (any matrix)
# A = Q @ R, where Q is orthogonal, R is upper triangular
B = np.random.randn(4, 3)
Q, R = np.linalg.qr(B)
print(f"\nQ shape: {Q.shape}, R shape: {R.shape}")
print(f"Q @ R equals B: {np.allclose(B, Q @ R)}")
print(f"Q is orthogonal: {np.allclose(Q.T @ Q, np.eye(3))}")

# Singular Value Decomposition (SVD)
# A = U @ S @ V.T
C = np.random.randn(4, 3)
U, s, Vt = np.linalg.svd(C)
print(f"\nSVD shapes: U={U.shape}, s={s.shape}, Vt={Vt.shape}")

# Reconstruct
S = np.zeros((4, 3))
np.fill_diagonal(S, s)
reconstructed = U @ S @ Vt
print(f"Reconstruction error: {np.linalg.norm(C - reconstructed)}")

ML Application: PCA with SVD

Principal Component Analysis uses SVD for dimensionality reduction:
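
The connection is direct: if the centered data matrix factors as $\mathbf{X}_c = \mathbf{U\Sigma V}^T$, then its covariance matrix is

$$\mathbf{C} = \frac{1}{n-1}\mathbf{X}_c^T\mathbf{X}_c = \frac{1}{n-1}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^T$$

so the columns of $\mathbf{V}$ (the right singular vectors) are the principal directions, and the squared singular values are proportional to the variances they explain. The code below puts this to work on the Iris dataset: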

PYTHON
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Center the data (subtract mean)
X_centered = X - X.mean(axis=0)

# Compute SVD
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project to 2D (first 2 principal components)
X_2d = X_centered @ Vt.T[:, :2]

# Explained variance
explained_variance = (s ** 2) / (s ** 2).sum()
print(f"Explained variance ratios: {explained_variance}")
print(f"First 2 components explain: {explained_variance[:2].sum():.2%}")

# Visualize
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
for i, name in enumerate(iris.target_names):
    mask = y == i
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name, alpha=0.7)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.bar(range(1, 5), explained_variance)
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.title('Scree Plot')
plt.xticks(range(1, 5))

plt.tight_layout()
plt.savefig('pca_example.png', dpi=100)
plt.show()

Summary

Key matrix concepts for ML practitioners:

| Operation | Notation | NumPy | ML Use |
|-----------|----------|-------|--------|
| Matrix multiply | $\mathbf{AB}$ | A @ B | Layer forward pass |
| Transpose | $\mathbf{A}^T$ | A.T | Gradient computation |
| Inverse | $\mathbf{A}^{-1}$ | np.linalg.inv(A) | Solving systems |
| Determinant | $\det(\mathbf{A})$ | np.linalg.det(A) | Check invertibility |
| Rank | $\text{rank}(\mathbf{A})$ | np.linalg.matrix_rank(A) | Dimensionality |
| SVD | $\mathbf{U\Sigma V}^T$ | np.linalg.svd(A) | PCA, compression |

Understanding matrices is essential for:

  • Implementing neural network layers
  • Understanding backpropagation
  • Working with covariance matrices
  • Dimensionality reduction techniques
  • Efficient batch processing

1.3 Systems of Linear Equations (Enthusiast)

Systems of linear equations appear throughout machine learning: from solving for optimal weights in linear regression to understanding how neural networks learn. Mastering these systems provides the mathematical foundation for understanding optimization algorithms.

What is a Linear System?

A system of linear equations is a collection of equations that must all be satisfied simultaneously:

$$\begin{align} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{align}$$

This can be written compactly in matrix form:

$$\mathbf{Ax} = \mathbf{b}$$

where $\mathbf{A}$ is an $m \times n$ coefficient matrix, $\mathbf{x}$ is the vector of unknowns, and $\mathbf{b}$ is the vector of constants.

A Simple Example

PYTHON
import numpy as np

# System of equations:
# 2x + 3y = 8
# 4x - y = 2

# Matrix form: Ax = b
A = np.array([
    [2, 3],
    [4, -1]
])
b = np.array([8, 2])

# Solve the system
x = np.linalg.solve(A, b)

print(f"Solution: x = {x[0]:.4f}, y = {x[1]:.4f}")

# Verify the solution
print(f"\nVerification:")
print(f"2({x[0]:.4f}) + 3({x[1]:.4f}) = {2*x[0] + 3*x[1]:.4f}")  # Should be 8
print(f"4({x[0]:.4f}) - ({x[1]:.4f}) = {4*x[0] - x[1]:.4f}")     # Should be 2

Three Types of Solutions

Linear systems can have:

  1. Unique solution: Lines/planes intersect at exactly one point
  2. No solution: Lines/planes are parallel (inconsistent system)
  3. Infinitely many solutions: Lines/planes coincide (consistent but rank-deficient system)

PYTHON
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

x_range = np.linspace(-5, 5, 100)

# Case 1: Unique solution
ax1 = axes[0]
# 2x + y = 4  =>  y = 4 - 2x
# x - y = 1   =>  y = x - 1
y1 = 4 - 2*x_range
y2 = x_range - 1
ax1.plot(x_range, y1, 'b-', label='2x + y = 4')
ax1.plot(x_range, y2, 'r-', label='x - y = 1')
ax1.plot(5/3, 2/3, 'go', markersize=10, label=f'Solution: ({5/3:.2f}, {2/3:.2f})')
ax1.set_xlim(-2, 4)
ax1.set_ylim(-3, 5)
ax1.set_title('Unique Solution')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xlabel('x')
ax1.set_ylabel('y')

# Case 2: No solution (parallel lines)
ax2 = axes[1]
# 2x + y = 4  =>  y = 4 - 2x
# 2x + y = 1  =>  y = 1 - 2x
y1 = 4 - 2*x_range
y2 = 1 - 2*x_range
ax2.plot(x_range, y1, 'b-', label='2x + y = 4')
ax2.plot(x_range, y2, 'r-', label='2x + y = 1')
ax2.set_xlim(-2, 4)
ax2.set_ylim(-3, 5)
ax2.set_title('No Solution (Parallel)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xlabel('x')
ax2.set_ylabel('y')

# Case 3: Infinite solutions (same line)
ax3 = axes[2]
# 2x + y = 4  =>  y = 4 - 2x
# 4x + 2y = 8 =>  y = 4 - 2x (same line)
y1 = 4 - 2*x_range
ax3.plot(x_range, y1, 'b-', linewidth=3, label='2x + y = 4')
ax3.plot(x_range, y1, 'r--', linewidth=1, label='4x + 2y = 8')
ax3.set_xlim(-2, 4)
ax3.set_ylim(-3, 5)
ax3.set_title('Infinite Solutions (Same Line)')
ax3.legend()
ax3.grid(True, alpha=0.3)
ax3.set_xlabel('x')
ax3.set_ylabel('y')

plt.tight_layout()
plt.savefig('linear_systems.png', dpi=100)
plt.show()

Checking Solution Type

The solution type can be read off from ranks: the system $\mathbf{Ax} = \mathbf{b}$ is consistent exactly when $\text{rank}(\mathbf{A}) = \text{rank}([\mathbf{A}|\mathbf{b}])$ (the Rouché–Capelli theorem), and the solution is unique when that common rank also equals the number of unknowns $n$:

PYTHON
import numpy as np

def analyze_system(A, b):
    """Analyze a linear system Ax = b"""
    m, n = A.shape

    # Augmented matrix [A|b]
    augmented = np.column_stack([A, b])

    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(augmented)

    print(f"Matrix A shape: {m}x{n}")
    print(f"Rank of A: {rank_A}")
    print(f"Rank of [A|b]: {rank_aug}")

    if rank_A != rank_aug:
        print("Result: NO SOLUTION (inconsistent system)")
        return None
    elif rank_A == n:
        print("Result: UNIQUE SOLUTION")
        return np.linalg.solve(A, b)
    else:
        print(f"Result: INFINITELY MANY SOLUTIONS")
        print(f"  Free variables: {n - rank_A}")
        return np.linalg.lstsq(A, b, rcond=None)[0]

# Test cases
print("=== Case 1: Unique Solution ===")
A1 = np.array([[2, 3], [4, -1]])
b1 = np.array([8, 2])
x1 = analyze_system(A1, b1)
if x1 is not None:
    print(f"Solution: {x1}")

print("\n=== Case 2: No Solution ===")
A2 = np.array([[2, 1], [2, 1]])
b2 = np.array([4, 1])
analyze_system(A2, b2)

print("\n=== Case 3: Infinite Solutions ===")
A3 = np.array([[2, 1], [4, 2]])
b3 = np.array([4, 8])
x3 = analyze_system(A3, b3)
if x3 is not None:
    print(f"One solution: {x3}")

Gaussian Elimination

Gaussian elimination is the standard algorithm for solving linear systems. It transforms the system into row echelon form through elementary row operations:

PYTHON
import numpy as np

def gaussian_elimination(A, b, verbose=True):
    """
    Solve Ax = b using Gaussian elimination with partial pivoting
    """
    n = len(b)
    # Create augmented matrix
    M = np.column_stack([A.astype(float), b.astype(float)])

    if verbose:
        print("Initial augmented matrix:")
        print(M)

    # Forward elimination
    for col in range(n):
        # Partial pivoting: find max element in column
        max_row = col + np.argmax(np.abs(M[col:, col]))
        M[[col, max_row]] = M[[max_row, col]]

        if verbose and max_row != col:
            print(f"\nSwapped rows {col} and {max_row}")

        # Check for zero pivot
        if np.abs(M[col, col]) < 1e-10:
            raise ValueError("Matrix is singular or nearly singular")

        # Eliminate below
        for row in range(col + 1, n):
            factor = M[row, col] / M[col, col]
            M[row, col:] -= factor * M[col, col:]

        if verbose:
            print(f"\nAfter eliminating column {col}:")
            print(M)

    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - np.dot(M[i, i+1:n], x[i+1:])) / M[i, i]

    return x

# Example
A = np.array([
    [2, 1, -1],
    [-3, -1, 2],
    [-2, 1, 2]
])
b = np.array([8, -11, -3])

print("Solving with Gaussian elimination:\n")
x = gaussian_elimination(A, b)
print(f"\nSolution: {x}")

# Verify
print(f"\nVerification A @ x = {A @ x}")
print(f"Original b = {b}")

LU Decomposition

For repeatedly solving systems with the same coefficient matrix but different right-hand sides, LU decomposition is more efficient:

$$\mathbf{A} = \mathbf{LU}$$

where $\mathbf{L}$ is lower triangular and $\mathbf{U}$ is upper triangular. In practice, partial pivoting is used, giving $\mathbf{A} = \mathbf{P}\mathbf{L}\mathbf{U}$ with a permutation matrix $\mathbf{P}$, which is what scipy.linalg.lu returns below.

PYTHON
import numpy as np
from scipy import linalg

A = np.array([
    [2, 1, -1],
    [-3, -1, 2],
    [-2, 1, 2]
], dtype=float)

# Compute LU decomposition with pivoting
P, L, U = linalg.lu(A)

print("A = P @ L @ U")
print(f"\nP (permutation matrix):\n{P}")
print(f"\nL (lower triangular):\n{L}")
print(f"\nU (upper triangular):\n{U}")

# Verify decomposition
print(f"\nP @ L @ U:\n{P @ L @ U}")
print(f"Equals A: {np.allclose(A, P @ L @ U)}")

# Solve multiple systems efficiently
b1 = np.array([8, -11, -3])
b2 = np.array([1, 2, 3])
b3 = np.array([5, 5, 5])

# LU factorization object for efficient solving
lu_factorization = linalg.lu_factor(A)

for i, b in enumerate([b1, b2, b3], 1):
    x = linalg.lu_solve(lu_factorization, b)
    print(f"\nSystem {i}: b = {b}")
    print(f"Solution x = {x}")
    print(f"Verification: A @ x = {A @ x}")

ML Application: Linear Regression

Linear regression finds weights that minimize the sum of squared errors $\|\mathbf{Xw} - \mathbf{y}\|^2$. Setting its gradient with respect to $\mathbf{w}$ to zero gives the normal equations, a linear system in $\mathbf{w}$:

$$\mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{y}$$

PYTHON
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
n_samples = 100
X_raw = np.random.randn(n_samples, 1) * 2
y = 3 * X_raw.flatten() + 2 + np.random.randn(n_samples) * 0.5

# Add bias column (ones)
X = np.column_stack([np.ones(n_samples), X_raw])

print(f"X shape: {X.shape}")  # (100, 2) - includes bias
print(f"y shape: {y.shape}")  # (100,)

# Method 1: Solve normal equations directly
# X.T @ X @ w = X.T @ y
XtX = X.T @ X
Xty = X.T @ y
w_direct = np.linalg.solve(XtX, Xty)
print(f"\nWeights (direct solve): bias={w_direct[0]:.4f}, slope={w_direct[1]:.4f}")

# Method 2: Using pseudo-inverse (more numerically stable)
w_pinv = np.linalg.pinv(X) @ y
print(f"Weights (pseudo-inverse): bias={w_pinv[0]:.4f}, slope={w_pinv[1]:.4f}")

# Method 3: Using lstsq (recommended)
w_lstsq, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
print(f"Weights (lstsq): bias={w_lstsq[0]:.4f}, slope={w_lstsq[1]:.4f}")

# True values: bias=2, slope=3

# Visualize
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_raw, y, alpha=0.5, label='Data')
x_line = np.linspace(-5, 5, 100)
y_line = w_lstsq[0] + w_lstsq[1] * x_line
plt.plot(x_line, y_line, 'r-', linewidth=2, label=f'Fit: y = {w_lstsq[0]:.2f} + {w_lstsq[1]:.2f}x')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Linear Regression via Normal Equations')
plt.legend()
plt.grid(True, alpha=0.3)

# Residual analysis
plt.subplot(1, 2, 2)
y_pred = X @ w_lstsq
residuals = y - y_pred
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted y')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('linear_regression.png', dpi=100)
plt.show()

Overdetermined Systems

When we have more equations than unknowns ($m > n$), the system is usually inconsistent. We find the least squares solution that minimizes $\|\mathbf{Ax} - \mathbf{b}\|^2$:

PYTHON
import numpy as np

# 5 equations, 2 unknowns (overdetermined)
A = np.array([
    [1, 1],
    [1, 2],
    [1, 3],
    [1, 4],
    [1, 5]
])
b = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

print(f"System: {A.shape[0]} equations, {A.shape[1]} unknowns")

# Least squares solution
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

print(f"\nLeast squares solution: {x}")
print(f"Residual sum of squares: {residuals[0]:.6f}")

# Predictions and residuals
predictions = A @ x
residual_vector = b - predictions

print(f"\nOriginal b:    {b}")
print(f"Predicted:     {predictions}")
print(f"Residuals:     {residual_vector}")
print(f"L2 norm of residuals: {np.linalg.norm(residual_vector):.6f}")

Underdetermined Systems

When we have fewer equations than unknowns ($m < n$), there are infinitely many solutions. The minimum norm solution minimizes $\|\mathbf{x}\|$:
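
When $\mathbf{A}$ has full row rank, the minimum norm solution has a closed form through the right pseudo-inverse,

$$\mathbf{x}^* = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}\mathbf{b},$$

which is exactly what np.linalg.pinv evaluates to in this case: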

PYTHON
import numpy as np

# 2 equations, 4 unknowns (underdetermined)
A = np.array([
    [1, 2, 1, 1],
    [1, 1, 2, 1]
])
b = np.array([5, 5])

print(f"System: {A.shape[0]} equations, {A.shape[1]} unknowns")

# Minimum norm solution using pseudo-inverse
x_min_norm = np.linalg.pinv(A) @ b

print(f"\nMinimum norm solution: {x_min_norm}")
print(f"L2 norm of solution: {np.linalg.norm(x_min_norm):.6f}")

# Verify it satisfies the equations
print(f"\nVerification A @ x = {A @ x_min_norm}")
print(f"Original b = {b}")

# There are infinitely many solutions
# Any x + null_space_vector is also a solution
# Find null space
from scipy.linalg import null_space
N = null_space(A)
print(f"\nNull space dimension: {N.shape[1]}")
print(f"Null space basis vectors:\n{N}")

# Another valid solution
t = 1.0  # Any scalar
x_another = x_min_norm + t * N[:, 0]
print(f"\nAnother solution: {x_another}")
print(f"Verification: A @ x = {A @ x_another}")
print(f"L2 norm: {np.linalg.norm(x_another):.6f} (larger than minimum)")

Ill-Conditioned Systems

Some systems are sensitive to small perturbations. The condition number measures this sensitivity:
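
For the 2-norm, the condition number is the ratio of the largest to smallest singular value,

$$\kappa(\mathbf{A}) = \|\mathbf{A}\|_2 \, \|\mathbf{A}^{-1}\|_2 = \frac{\sigma_{\max}}{\sigma_{\min}},$$

and relative errors in $\mathbf{b}$ can be amplified by up to roughly $\kappa(\mathbf{A})$ in the solution. A quick sketch confirming that np.linalg.cond matches this ratio (random matrix used purely for illustration):

PYTHON
import numpy as np

np.random.seed(0)
A = np.random.randn(5, 5)
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending order

print(f"sigma_max / sigma_min: {s[0] / s[-1]:.6f}")
print(f"np.linalg.cond(A):     {np.linalg.cond(A):.6f}")  # 2-norm by default

The demonstration below contrasts a well-conditioned matrix with the notoriously ill-conditioned Hilbert matrix: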

PYTHON
import numpy as np

# Well-conditioned system
A_good = np.array([
    [1, 0],
    [0, 1]
])

# Ill-conditioned system (Hilbert matrix)
n = 5
A_bad = np.array([[1/(i+j+1) for j in range(n)] for i in range(n)])

print("=== Well-conditioned matrix ===")
print(f"Condition number: {np.linalg.cond(A_good):.2f}")

print("\n=== Ill-conditioned matrix (Hilbert) ===")
print(f"Condition number: {np.linalg.cond(A_bad):.2e}")

# Demonstrate sensitivity
b_true = np.ones(n)
x_true = np.linalg.solve(A_bad, b_true)

# Add small perturbation
perturbation = 1e-10 * np.random.randn(n)
b_perturbed = b_true + perturbation
x_perturbed = np.linalg.solve(A_bad, b_perturbed)

print(f"\nPerturbation magnitude: {np.linalg.norm(perturbation):.2e}")
print(f"Solution change: {np.linalg.norm(x_true - x_perturbed):.2e}")
print(f"Amplification factor: {np.linalg.norm(x_true - x_perturbed) / np.linalg.norm(perturbation):.2e}")

# Regularization helps with ill-conditioned systems
# Ridge regression: (A.T @ A + lambda * I) @ x = A.T @ b
lambda_reg = 1e-6
A_reg = A_bad.T @ A_bad + lambda_reg * np.eye(n)
b_reg = A_bad.T @ b_true
x_regularized = np.linalg.solve(A_reg, b_reg)
print(f"\nRegularized condition number: {np.linalg.cond(A_reg):.2e}")

Iterative Methods

For large sparse systems, iterative methods are more efficient than direct methods:

PYTHON
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg, gmres

# Create a sparse system (e.g., from discretizing a PDE)
n = 1000
# Tridiagonal matrix (sparse)
diag = 4 * np.ones(n)
off_diag = -np.ones(n-1)
A_sparse = np.diag(diag) + np.diag(off_diag, k=1) + np.diag(off_diag, k=-1)
A_csr = csr_matrix(A_sparse)

b = np.ones(n)

print(f"System size: {n}x{n}")
print(f"Non-zero elements: {A_csr.nnz} ({100*A_csr.nnz/n**2:.2f}%)")

# Conjugate Gradient (for symmetric positive definite)
x_cg, info_cg = cg(A_csr, b, tol=1e-10)  # newer SciPy versions use rtol= instead of tol=
print(f"\nCG converged: {info_cg == 0}")
print(f"Residual norm: {np.linalg.norm(A_sparse @ x_cg - b):.2e}")

# GMRES (for general matrices)
x_gmres, info_gmres = gmres(A_csr, b, tol=1e-10)  # newer SciPy versions use rtol= instead of tol=
print(f"\nGMRES converged: {info_gmres == 0}")
print(f"Residual norm: {np.linalg.norm(A_sparse @ x_gmres - b):.2e}")

# Compare with direct solve (slower for large sparse systems)
import time

start = time.time()
x_direct = np.linalg.solve(A_sparse, b)
time_direct = time.time() - start

start = time.time()
x_cg, _ = cg(A_csr, b)
time_cg = time.time() - start

print(f"\nDirect solve time: {time_direct:.4f}s")
print(f"CG solve time: {time_cg:.4f}s")
print(f"Speedup: {time_direct/time_cg:.1f}x")

Summary

Key concepts for solving linear systems in ML:

| Method | When to Use | NumPy Function |
|--------|-------------|----------------|
| Direct solve | Small, well-conditioned systems | np.linalg.solve() |
| LU decomposition | Multiple right-hand sides | scipy.linalg.lu_factor() |
| Least squares | Overdetermined systems | np.linalg.lstsq() |
| Pseudo-inverse | Under/overdetermined | np.linalg.pinv() |
| Iterative (CG, GMRES) | Large sparse systems | scipy.sparse.linalg |

Understanding linear systems is essential for:

  • Implementing linear regression from scratch
  • Understanding regularization techniques
  • Working with optimization algorithms
  • Debugging numerical issues in ML models

1.4 Eigenvalues and Eigenvectors (Enthusiast)

Eigenvalues and eigenvectors reveal the fundamental structure of linear transformations. In machine learning, they power dimensionality reduction (PCA), enable spectral clustering, measure dataset properties, and help analyze neural network dynamics.

The Eigenvalue Problem

For a square matrix $\mathbf{A}$, an eigenvector $\mathbf{v}$ and eigenvalue $\lambda$ satisfy:

$$\mathbf{Av} = \lambda\mathbf{v}$$

This means the matrix $\mathbf{A}$ only stretches (or shrinks) the eigenvector by the factor $\lambda$, without changing its direction.
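
Eigenvalues are the roots of the characteristic polynomial $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$. For the $2 \times 2$ matrix used in the code below,

$$\det\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2),$$

so the eigenvalues are $\lambda = 5$ and $\lambda = 2$, which is what np.linalg.eig finds numerically: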

PYTHON
import numpy as np

# A simple 2x2 matrix
A = np.array([
    [4, 2],
    [1, 3]
])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors (as columns):\n{eigenvectors}")

# Verify: A @ v = lambda * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lambda_v = lam * v
    print(f"\nEigenpair {i+1}:")
    print(f"  λ = {lam:.4f}")
    print(f"  v = {v}")
    print(f"  A @ v = {Av}")
    print(f"  λ * v = {lambda_v}")
    print(f"  Equal: {np.allclose(Av, lambda_v)}")

Geometric Interpretation

Eigenvectors point in directions that are only scaled (not rotated) by the transformation:

PYTHON
import numpy as np
import matplotlib.pyplot as plt

A = np.array([
    [2, 1],
    [1, 2]
])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Create a grid of vectors
theta = np.linspace(0, 2*np.pi, 12, endpoint=False)
vectors = np.array([[np.cos(t), np.sin(t)] for t in theta])

# Transform vectors
transformed = (A @ vectors.T).T

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Original vectors
ax1 = axes[0]
for v in vectors:
    ax1.arrow(0, 0, v[0], v[1], head_width=0.1, head_length=0.05, fc='blue', ec='blue', alpha=0.5)
# Show eigenvectors
for i in range(2):
    v = eigenvectors[:, i]
    ax1.arrow(0, 0, v[0], v[1], head_width=0.15, head_length=0.08, fc='red', ec='red', linewidth=2)
ax1.set_xlim(-2, 2)
ax1.set_ylim(-2, 2)
ax1.set_aspect('equal')
ax1.grid(True, alpha=0.3)
ax1.set_title('Original vectors (eigenvectors in red)')

# Transformed vectors
ax2 = axes[1]
for t in transformed:
    ax2.arrow(0, 0, t[0], t[1], head_width=0.1, head_length=0.05, fc='blue', ec='blue', alpha=0.5)
# Show transformed eigenvectors (same direction, scaled)
for i in range(2):
    v = eigenvectors[:, i] * eigenvalues[i]
    ax2.arrow(0, 0, v[0], v[1], head_width=0.15, head_length=0.08, fc='red', ec='red', linewidth=2)
ax2.set_xlim(-4, 4)
ax2.set_ylim(-4, 4)
ax2.set_aspect('equal')
ax2.grid(True, alpha=0.3)
ax2.set_title(f'Transformed (eigenvalues: {eigenvalues[0]:.1f}, {eigenvalues[1]:.1f})')

plt.tight_layout()
plt.savefig('eigenvectors.png', dpi=100)
plt.show()

print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvector 1: {eigenvectors[:, 0]} -> only scaled by {eigenvalues[0]:.2f}")
print(f"Eigenvector 2: {eigenvectors[:, 1]} -> only scaled by {eigenvalues[1]:.2f}")

Properties of Eigenvalues

PYTHON
import numpy as np

A = np.array([
    [4, 2, 1],
    [2, 5, 3],
    [1, 3, 6]
])

eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)

# Property 1: Sum of eigenvalues = Trace
trace_A = np.trace(A)
sum_eigenvalues = np.sum(eigenvalues)
print(f"\nTrace of A: {trace_A}")
print(f"Sum of eigenvalues: {sum_eigenvalues.real:.6f}")
print(f"Equal: {np.isclose(trace_A, sum_eigenvalues.real)}")

# Property 2: Product of eigenvalues = Determinant
det_A = np.linalg.det(A)
prod_eigenvalues = np.prod(eigenvalues)
print(f"\nDeterminant of A: {det_A:.6f}")
print(f"Product of eigenvalues: {prod_eigenvalues.real:.6f}")
print(f"Equal: {np.isclose(det_A, prod_eigenvalues.real)}")

# Property 3: Symmetric matrices have real eigenvalues
print(f"\nA is symmetric: {np.allclose(A, A.T)}")
print(f"All eigenvalues are real: {np.allclose(eigenvalues.imag, 0)}")

# Property 4: Symmetric matrices have orthogonal eigenvectors
V = eigenvectors
print(f"\nEigenvectors are orthonormal: {np.allclose(V.T @ V, np.eye(3))}")

Spectral Decomposition

A symmetric matrix can be decomposed as:

$$\mathbf{A} = \mathbf{V\Lambda V}^T$$

where $\mathbf{V}$ contains eigenvectors as columns and $\mathbf{\Lambda}$ is diagonal with eigenvalues.

PYTHON
import numpy as np

# Symmetric matrix
A = np.array([
    [4, 2, 1],
    [2, 5, 3],
    [1, 3, 6]
])

eigenvalues, V = np.linalg.eigh(A)  # eigh is preferred for symmetric matrices: real eigenvalues, orthonormal eigenvectors

# Create diagonal matrix of eigenvalues
Lambda = np.diag(eigenvalues)

# Reconstruct A
A_reconstructed = V @ Lambda @ V.T

print("Original A:")
print(A)

print("\nSpectral decomposition A = V @ Lambda @ V.T:")
print(f"\nV (eigenvectors):\n{V}")
print(f"\nLambda (eigenvalues on diagonal):\n{Lambda}")

print("\nReconstructed A:")
print(A_reconstructed.real)

print(f"\nReconstruction matches: {np.allclose(A, A_reconstructed.real)}")

# This is useful for computing matrix powers
# A^k = V @ Lambda^k @ V.T
k = 5
Lambda_k = np.diag(eigenvalues ** k)
A_k = V @ Lambda_k @ V.T

# Verify
A_k_direct = np.linalg.matrix_power(A, k)
print(f"\nA^{k} via eigendecomposition matches direct: {np.allclose(A_k.real, A_k_direct)}")

ML Application: Principal Component Analysis (PCA)

PCA finds the directions of maximum variance using eigenvalues of the covariance matrix:
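
Concretely, for a centered data matrix $\mathbf{X}_c$ with $n$ samples, the covariance matrix is

$$\mathbf{C} = \frac{1}{n-1}\mathbf{X}_c^T\mathbf{X}_c,$$

and the principal components are the eigenvectors of $\mathbf{C}$, with each eigenvalue giving the variance of the data along that direction. The steps below carry this out on the Iris dataset: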

PYTHON
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = iris.data
y = iris.target

print(f"Original data shape: {X.shape}")  # (150, 4)

# Step 1: Center the data
X_centered = X - X.mean(axis=0)

# Step 2: Compute covariance matrix
cov_matrix = np.cov(X_centered.T)
print(f"\nCovariance matrix shape: {cov_matrix.shape}")  # (4, 4)

# Step 3: Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by eigenvalue (descending)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print(f"\nEigenvalues (variance explained):")
for i, ev in enumerate(eigenvalues):
    print(f"  PC{i+1}: {ev:.4f} ({100*ev/eigenvalues.sum():.1f}%)")

print(f"\nCumulative variance explained:")
cumsum = np.cumsum(eigenvalues) / eigenvalues.sum()
for i, cs in enumerate(cumsum):
    print(f"  First {i+1} components: {100*cs:.1f}%")

# Step 4: Project to 2D
n_components = 2
W = eigenvectors[:, :n_components]
X_pca = X_centered @ W

print(f"\nProjected data shape: {X_pca.shape}")  # (150, 2)

# Visualize
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
for i, name in enumerate(iris.target_names):
    mask = y == i
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=name, alpha=0.7)
plt.xlabel(f'PC1 ({100*eigenvalues[0]/eigenvalues.sum():.1f}% variance)')
plt.ylabel(f'PC2 ({100*eigenvalues[1]/eigenvalues.sum():.1f}% variance)')
plt.title('PCA of Iris Dataset (Manual)')
plt.legend()
plt.grid(True, alpha=0.3)

# Scree plot
plt.subplot(1, 2, 2)
plt.bar(range(1, 5), eigenvalues, alpha=0.7, label='Individual')
plt.plot(range(1, 5), np.cumsum(eigenvalues), 'ro-', label='Cumulative')
plt.xlabel('Principal Component')
plt.ylabel('Eigenvalue (Variance)')
plt.title('Scree Plot')
plt.legend()
plt.xticks(range(1, 5))

plt.tight_layout()
plt.savefig('pca_iris.png', dpi=100)
plt.show()

The Power Method

The power method is an iterative algorithm for finding the dominant eigenvalue and its eigenvector; it converges when one eigenvalue strictly dominates in magnitude, at a rate governed by the ratio $|\lambda_2 / \lambda_1|$:

PYTHON
import numpy as np

def power_method(A, num_iterations=100, tol=1e-10):
    """
    Find the largest eigenvalue and corresponding eigenvector
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    eigenvalue_old = 0

    for i in range(num_iterations):
        # Matrix-vector multiplication
        w = A @ v

        # New eigenvalue estimate (Rayleigh quotient)
        eigenvalue = v @ w

        # Normalize
        v = w / np.linalg.norm(w)

        # Check convergence
        if abs(eigenvalue - eigenvalue_old) < tol:
            print(f"Converged after {i+1} iterations")
            break

        eigenvalue_old = eigenvalue

    return eigenvalue, v

# Test on a symmetric matrix
A = np.array([
    [4, 2, 1],
    [2, 5, 3],
    [1, 3, 6]
])

eigenvalue_power, eigenvector_power = power_method(A)
eigenvalues_np, eigenvectors_np = np.linalg.eig(A)

print(f"\nPower method:")
print(f"  Eigenvalue: {eigenvalue_power:.6f}")
print(f"  Eigenvector: {eigenvector_power}")

print(f"\nNumPy (largest):")
idx = np.argmax(eigenvalues_np)
print(f"  Eigenvalue: {eigenvalues_np[idx].real:.6f}")
print(f"  Eigenvector: {eigenvectors_np[:, idx].real}")

Singular Value Decomposition (SVD)

SVD generalizes eigendecomposition to non-square matrices:

$$\mathbf{A} = \mathbf{U\Sigma V}^T$$

PYTHON
import numpy as np
import matplotlib.pyplot as plt

# Any matrix (not necessarily square)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

print(f"Matrix A shape: {A.shape}")  # (4, 3)

# SVD
U, s, Vt = np.linalg.svd(A)

print(f"\nU shape: {U.shape}")   # (4, 4)
print(f"s shape: {s.shape}")     # (3,)
print(f"Vt shape: {Vt.shape}")   # (3, 3)

print(f"\nSingular values: {s}")

# Reconstruct A
Sigma = np.zeros_like(A, dtype=float)
np.fill_diagonal(Sigma, s)
A_reconstructed = U @ Sigma @ Vt

print(f"\nReconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")

# SVD for low-rank approximation
def low_rank_approx(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Image compression example (synthetic image)

# Create a simple grayscale image
np.random.seed(42)
image = np.random.randn(100, 100)
# Add some structure
for i in range(10):
    image += np.outer(np.random.randn(100), np.random.randn(100)) * (10-i)/10

plt.figure(figsize=(15, 4))

ranks = [1, 5, 20, 50, 100]
for i, k in enumerate(ranks):
    if k <= min(image.shape):
        approx = low_rank_approx(image, k)
        error = np.linalg.norm(image - approx) / np.linalg.norm(image)

        plt.subplot(1, len(ranks), i+1)
        plt.imshow(approx, cmap='gray')
        plt.title(f'Rank {k}\nError: {100*error:.1f}%')
        plt.axis('off')

plt.tight_layout()
plt.savefig('svd_compression.png', dpi=100)
plt.show()

Relationship Between SVD and Eigendecomposition
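
The link follows by substitution: if $\mathbf{A} = \mathbf{U\Sigma V}^T$ with $\mathbf{U}$ and $\mathbf{V}$ orthogonal, then

$$\mathbf{A}^T\mathbf{A} = \mathbf{V}(\mathbf{\Sigma}^T\mathbf{\Sigma})\mathbf{V}^T, \qquad \mathbf{A}\mathbf{A}^T = \mathbf{U}(\mathbf{\Sigma}\mathbf{\Sigma}^T)\mathbf{U}^T,$$

so the right singular vectors are eigenvectors of $\mathbf{A}^T\mathbf{A}$, the left singular vectors are eigenvectors of $\mathbf{A}\mathbf{A}^T$, and the non-zero eigenvalues of both equal the squared singular values: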

PYTHON
import numpy as np

A = np.random.randn(4, 3)

# SVD
U, s, Vt = np.linalg.svd(A)

# A.T @ A has eigenvalues = singular values squared
AtA = A.T @ A
eigenvalues_AtA, eigenvectors_AtA = np.linalg.eig(AtA)

# Sort eigenvalues
idx = np.argsort(eigenvalues_AtA)[::-1]
eigenvalues_AtA = eigenvalues_AtA[idx]
eigenvectors_AtA = eigenvectors_AtA[:, idx]

print("Singular values squared:", s**2)
print("Eigenvalues of A.T @ A:", eigenvalues_AtA.real)
print("Match:", np.allclose(s**2, eigenvalues_AtA.real))

# A @ A.T has the same non-zero eigenvalues
AAt = A @ A.T
eigenvalues_AAt, _ = np.linalg.eig(AAt)
eigenvalues_AAt = np.sort(eigenvalues_AAt.real)[::-1]

print("\nEigenvalues of A @ A.T:", eigenvalues_AAt[:3])  # First 3 match

ML Application: Matrix Factorization for Recommendations

PYTHON
import numpy as np

# User-item rating matrix (with missing values as 0)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4]
], dtype=float)

print("Original ratings (0 = missing):")
print(R)

# SVD-based matrix completion
# Replace 0s with row means for initial approximation
R_filled = R.copy()
for i in range(R.shape[0]):
    row = R[i]
    mean = row[row > 0].mean() if (row > 0).any() else 0
    R_filled[i, row == 0] = mean

# Apply SVD
U, s, Vt = np.linalg.svd(R_filled)

# Keep top k factors
k = 2
U_k = U[:, :k]
s_k = np.diag(s[:k])
Vt_k = Vt[:k, :]

# Reconstructed matrix
R_approx = U_k @ s_k @ Vt_k

print(f"\nApproximation with rank {k}:")
print(np.round(R_approx, 1))

# Predicted ratings for missing entries
print("\nPredicted ratings for missing entries:")
for i in range(R.shape[0]):
    for j in range(R.shape[1]):
        if R[i, j] == 0:
            print(f"  User {i}, Item {j}: {R_approx[i, j]:.2f}")

Spectral Clustering

Spectral clustering uses eigenvectors of the graph Laplacian:
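
Given a similarity matrix $\mathbf{W}$ and the diagonal degree matrix $\mathbf{D}$ with $d_{ii} = \sum_j w_{ij}$, the unnormalized and symmetrically normalized Laplacians are

$$\mathbf{L} = \mathbf{D} - \mathbf{W}, \qquad \mathbf{L}_{\text{sym}} = \mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2}.$$

The eigenvectors of $\mathbf{L}_{\text{sym}}$ associated with the smallest non-trivial eigenvalues embed the points so that clusters become nearly linearly separable, which is what the code below exploits on the two-moons data: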

PYTHON
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans

# Generate non-linearly separable data
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

# Build similarity graph (Gaussian kernel)
def gaussian_kernel(X, sigma=0.5):
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i, j] = np.exp(-np.linalg.norm(X[i] - X[j])**2 / (2*sigma**2))
    return W

W = gaussian_kernel(X)

# Compute Laplacian
D = np.diag(W.sum(axis=1))
L = D - W

# Normalized Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt

# Compute eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(L_norm)

# Use first k eigenvectors (after the trivial one)
k = 2
V = eigenvectors[:, 1:k+1]

# Normalize rows
V_normalized = V / np.linalg.norm(V, axis=1, keepdims=True)

# Cluster in embedding space
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(V_normalized)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].scatter(X[:, 0], X[:, 1], c=y_true, cmap='coolwarm', alpha=0.7)
axes[0].set_title('True Labels')

axes[1].scatter(V_normalized[:, 0], V_normalized[:, 1], c=y_true, cmap='coolwarm', alpha=0.7)
axes[1].set_title('Spectral Embedding')

axes[2].scatter(X[:, 0], X[:, 1], c=labels, cmap='coolwarm', alpha=0.7)
axes[2].set_title('Spectral Clustering Result')

for ax in axes:
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('spectral_clustering.png', dpi=100)
plt.show()

# Accuracy
from sklearn.metrics import adjusted_rand_score
print(f"Adjusted Rand Score: {adjusted_rand_score(y_true, labels):.4f}")

Summary

Key eigenvalue concepts for ML:

| Concept | ML Application |
|---------|----------------|
| Eigendecomposition | PCA, spectral methods |
| Largest eigenvalue | Power method, PageRank |
| Singular values | Matrix rank, compression |
| SVD | Dimensionality reduction, recommendations |
| Graph Laplacian | Spectral clustering |
| Condition number | Numerical stability |

Understanding eigenvalues and eigenvectors is essential for:

  • Implementing PCA from scratch
  • Understanding how neural networks learn
  • Working with graph-based algorithms
  • Analyzing covariance structures
  • Building recommendation systems

1.5 Singular Value Decomposition and Matrix Factorizations (Intermediate)

Matrix decompositions are among the most powerful tools in linear algebra for machine learning. They reveal the underlying structure of data, enable dimensionality reduction, and form the mathematical foundation for techniques like Principal Component Analysis (PCA), recommendation systems, and data compression.

Singular Value Decomposition (SVD)

The Singular Value Decomposition factors any $m \times n$ matrix $A$ into three matrices:

$$ A = U \Sigma V^T $$

Where:

  • $U$ is an $m \times m$ orthogonal matrix (left singular vectors)
  • $\Sigma$ is an $m \times n$ diagonal matrix (singular values)
  • $V$ is an $n \times n$ orthogonal matrix (right singular vectors)

PYTHON
import numpy as np
import matplotlib.pyplot as plt

# Create a sample matrix
A = np.array([
    [3, 2, 2],
    [2, 3, -2],
    [2, -2, 3],
    [1, 1, 1]
])

# Compute SVD
U, singular_values, Vt = np.linalg.svd(A, full_matrices=True)

print("Original matrix A:")
print(A)
print(f"\nShape of A: {A.shape}")

print("\nU (left singular vectors):")
print(U)
print(f"Shape: {U.shape}")

print("\nSingular values:")
print(singular_values)

print("\nV^T (right singular vectors transposed):")
print(Vt)
print(f"Shape: {Vt.shape}")

# Verify reconstruction
# Create full Sigma matrix
Sigma = np.zeros((A.shape[0], A.shape[1]))
np.fill_diagonal(Sigma, singular_values)

A_reconstructed = U @ Sigma @ Vt
print("\nReconstruction error:", np.linalg.norm(A - A_reconstructed))

Geometric Interpretation of SVD

SVD decomposes any linear transformation into three operations: rotation, scaling, and another rotation.

PYTHON
def visualize_svd_transformation():
    """
    Visualize how SVD decomposes a 2D transformation.
    """
    # 2x2 transformation matrix
    A = np.array([[2, 1],
                  [1, 2]])

    U, s, Vt = np.linalg.svd(A)

    # Create unit circle points
    theta = np.linspace(0, 2*np.pi, 100)
    circle = np.array([np.cos(theta), np.sin(theta)])

    fig, axes = plt.subplots(1, 4, figsize=(16, 4))

    # Step 1: Original unit circle
    axes[0].plot(circle[0], circle[1], 'b-', linewidth=2)
    axes[0].set_title('1. Unit Circle')
    axes[0].set_xlim(-3, 3)
    axes[0].set_ylim(-3, 3)
    axes[0].set_aspect('equal')
    axes[0].grid(True)

    # Step 2: After V^T rotation
    step1 = Vt @ circle
    axes[1].plot(step1[0], step1[1], 'g-', linewidth=2)
    axes[1].set_title('2. After V^T (rotation)')
    axes[1].set_xlim(-3, 3)
    axes[1].set_ylim(-3, 3)
    axes[1].set_aspect('equal')
    axes[1].grid(True)

    # Step 3: After Sigma scaling
    Sigma = np.diag(s)
    step2 = Sigma @ step1
    axes[2].plot(step2[0], step2[1], 'orange', linewidth=2)
    axes[2].set_title(f'3. After Σ (scale by {s[0]:.2f}, {s[1]:.2f})')
    axes[2].set_xlim(-3, 3)
    axes[2].set_ylim(-3, 3)
    axes[2].set_aspect('equal')
    axes[2].grid(True)

    # Step 4: After U rotation (final result)
    step3 = U @ step2
    axes[3].plot(step3[0], step3[1], 'r-', linewidth=2)
    axes[3].set_title('4. After U (final ellipse)')
    axes[3].set_xlim(-3, 3)
    axes[3].set_ylim(-3, 3)
    axes[3].set_aspect('equal')
    axes[3].grid(True)

    plt.tight_layout()
    plt.savefig('svd_geometric.png', dpi=100)
    print("SVD transforms unit circle through: rotation → scaling → rotation")

visualize_svd_transformation()
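
The singular values also tie back to eigenvalues: the singular values of $A$ are the square roots of the eigenvalues of $A^T A$. A minimal numerical check, using the same $2 \times 2$ matrix as the visualization:

PYTHON
# Sketch: singular values of A equal the square roots of the eigenvalues of A^T A.
A = np.array([[2, 1],
              [1, 2]])

singular_vals = np.linalg.svd(A, compute_uv=False)   # returned in descending order
eig_vals = np.linalg.eigvalsh(A.T @ A)               # returned in ascending order

print("Singular values of A:      ", singular_vals)
print("sqrt(eigenvalues of A^T A):", np.sqrt(eig_vals[::-1]))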

Low-Rank Approximation

The truncated SVD gives the best rank-$k$ approximation in both the Frobenius and spectral norms (Eckart-Young theorem):

$$ A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T $$

PYTHON
def low_rank_approximation(A, k):
    """
    Compute rank-k approximation of matrix A using SVD.

    Args:
        A: Input matrix
        k: Target rank

    Returns:
        Rank-k approximation of A
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only top k components
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]

    # Reconstruct
    A_k = U_k @ np.diag(s_k) @ Vt_k

    return A_k

# Example: Image compression
def compress_image_svd(image, k):
    """
    Compress a grayscale image using SVD.
    """
    U, s, Vt = np.linalg.svd(image, full_matrices=False)

    # Truncated SVD
    compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Calculate compression ratio
    original_size = image.shape[0] * image.shape[1]
    compressed_size = k * (image.shape[0] + image.shape[1] + 1)
    ratio = original_size / compressed_size

    return compressed, ratio

# Create a sample image (gradient + noise)
np.random.seed(42)
image = np.outer(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
image += 0.1 * np.random.randn(100, 100)

print("Image compression with SVD:")
for k in [5, 10, 20, 50]:
    compressed, ratio = compress_image_svd(image, k)
    error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
    print(f"  k={k:2d}: compression ratio={ratio:.1f}x, relative error={error:.4f}")

Principal Component Analysis Foundation

PCA is directly derived from SVD of the centered data matrix:

PYTHON
def pca_via_svd(X, n_components):
    """
    Implement PCA using SVD.

    Args:
        X: Data matrix (n_samples, n_features)
        n_components: Number of principal components

    Returns:
        transformed: Projected data
        components: Principal component directions
        explained_variance_ratio: Variance explained by each component
    """
    # Center the data
    X_centered = X - X.mean(axis=0)

    # SVD of centered data
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Principal components are rows of Vt
    components = Vt[:n_components]

    # Project data onto principal components
    transformed = X_centered @ components.T

    # Explained variance
    explained_variance = (s ** 2) / (X.shape[0] - 1)
    explained_variance_ratio = explained_variance / explained_variance.sum()

    return transformed, components, explained_variance_ratio[:n_components]

# Example with synthetic data
np.random.seed(42)
n_samples = 200

# Create correlated 3D data
t = np.linspace(0, 4*np.pi, n_samples)
X = np.column_stack([
    2*np.cos(t) + 0.5*np.random.randn(n_samples),
    np.sin(t) + 0.3*np.random.randn(n_samples),
    0.5*np.cos(t) + 0.5*np.sin(t) + 0.2*np.random.randn(n_samples)
])

# Apply PCA
transformed, components, var_ratio = pca_via_svd(X, n_components=3)

print("PCA via SVD:")
print(f"Original shape: {X.shape}")
print(f"Transformed shape: {transformed.shape}")
print(f"\nExplained variance ratio:")
for i, ratio in enumerate(var_ratio):
    print(f"  PC{i+1}: {ratio:.4f} ({ratio*100:.1f}%)")
print(f"\nTotal variance explained: {var_ratio.sum()*100:.1f}%")

Other Important Matrix Decompositions

LU Decomposition

Factors a matrix (with row pivoting) into a permutation matrix and lower and upper triangular factors, $A = PLU$:

PYTHON
from scipy.linalg import lu

def lu_decomposition_demo():
    """
    LU decomposition: A = PLU
    Used for solving linear systems efficiently.
    """
    A = np.array([
        [4, 3, 2],
        [2, 1, 3],
        [3, 4, 1]
    ], dtype=float)

    P, L, U = lu(A)

    print("LU Decomposition: A = PLU")
    print("\nOriginal matrix A:")
    print(A)
    print("\nPermutation matrix P:")
    print(P)
    print("\nLower triangular L:")
    print(L)
    print("\nUpper triangular U:")
    print(U)

    # Verify
    print("\nReconstruction P @ L @ U:")
    print(P @ L @ U)
    print(f"\nReconstruction error: {np.linalg.norm(A - P @ L @ U):.2e}")

lu_decomposition_demo()
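
In practice, the factorization is computed once and reused for many right-hand sides. A minimal sketch of solving $Ax = b$ with SciPy's `lu_factor`/`lu_solve`:

PYTHON
# Sketch: solve A x = b via LU. Factor once (O(n^3)), then each solve
# is just a pair of triangular substitutions (O(n^2)).
from scipy.linalg import lu_factor, lu_solve

A = np.array([
    [4, 3, 2],
    [2, 1, 3],
    [3, 4, 1]
], dtype=float)
b = np.array([1.0, 2.0, 3.0])

lu_piv = lu_factor(A)        # factor once
x = lu_solve(lu_piv, b)      # reuse the factorization for each right-hand side

print("Solution x: ", x)
print("Check A @ x:", A @ x)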

QR Decomposition

Factors a matrix into orthogonal and upper triangular matrices:

PYTHON
def qr_decomposition_demo():
    """
    QR decomposition: A = QR
    Foundation for QR algorithm (eigenvalue computation)
    and least squares solutions.
    """
    A = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 10],
        [2, 1, 1]
    ], dtype=float)

    Q, R = np.linalg.qr(A)

    print("QR Decomposition: A = QR")
    print(f"\nOriginal A shape: {A.shape}")
    print("\nOrthogonal matrix Q:")
    print(Q)
    print(f"Q shape: {Q.shape}")

    print("\nUpper triangular R:")
    print(R)
    print(f"R shape: {R.shape}")

    # Verify Q is orthogonal
    print(f"\nQ^T @ Q (should be identity):")
    print(np.round(Q.T @ Q, 10))

    # Verify reconstruction
    print(f"\nReconstruction error: {np.linalg.norm(A - Q @ R):.2e}")

qr_decomposition_demo()
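
One of the uses mentioned above is least squares: with the reduced QR of an overdetermined $A$, minimizing $\|Ax - b\|$ reduces to the triangular system $Rx = Q^T b$. A minimal sketch, compared against NumPy's built-in solver:

PYTHON
# Sketch: least squares via QR, checked against np.linalg.lstsq.
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10],
    [2, 1, 1]
], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0])

Q, R = np.linalg.qr(A)              # reduced QR: Q is 4x3, R is 3x3
x_qr = np.linalg.solve(R, Q.T @ b)  # solve the triangular system R x = Q^T b

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("x via QR:   ", x_qr)
print("x via lstsq:", x_lstsq)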

Cholesky Decomposition

For symmetric positive-definite matrices:

PYTHON
def cholesky_decomposition_demo():
    """
    Cholesky decomposition: A = LL^T
    Efficient for symmetric positive-definite matrices.
    Used in Gaussian processes and optimization.
    """
    # Create symmetric positive-definite matrix
    B = np.random.randn(4, 4)
    A = B @ B.T + 0.1 * np.eye(4)  # Ensure positive definite

    L = np.linalg.cholesky(A)

    print("Cholesky Decomposition: A = LL^T")
    print("\nSymmetric positive-definite matrix A:")
    print(A)
    print("\nLower triangular L:")
    print(L)

    # Verify
    print("\nReconstruction L @ L^T:")
    print(L @ L.T)
    print(f"\nReconstruction error: {np.linalg.norm(A - L @ L.T):.2e}")

    # Application: Efficient linear system solving
    # Solve Ax = b using Cholesky
    b = np.array([1, 2, 3, 4])

    # Forward substitution: Ly = b
    y = np.linalg.solve(L, b)
    # Backward substitution: L^T x = y
    x = np.linalg.solve(L.T, y)

    print(f"\nSolving Ax = b:")
    print(f"Solution x: {x}")
    print(f"Verification Ax: {A @ x}")
    print(f"Original b: {b}")

cholesky_decomposition_demo()
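
One of the applications mentioned in the docstring, sampling from a multivariate Gaussian, follows directly from the factor: if $z \sim \mathcal{N}(0, I)$, then $Lz$ has covariance $LL^T = A$. A minimal sketch:

PYTHON
# Sketch: draw correlated Gaussian samples using the Cholesky factor.
np.random.seed(0)

B = np.random.randn(4, 4)
A = B @ B.T + 0.1 * np.eye(4)     # symmetric positive-definite covariance
L = np.linalg.cholesky(A)

z = np.random.randn(4, 10000)     # independent standard normal samples
samples = L @ z                   # correlated samples with covariance ~ A

print("Target covariance:\n", np.round(A, 3))
print("Empirical covariance:\n", np.round(np.cov(samples), 3))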

Applications in Machine Learning

PYTHON
def ml_applications_summary():
    """
    Summary of matrix decomposition applications in ML.
    """
    applications = {
        "SVD": [
            "Principal Component Analysis (PCA)",
            "Latent Semantic Analysis (LSA) in NLP",
            "Recommender systems (matrix factorization)",
            "Image compression and denoising",
            "Pseudoinverse computation"
        ],
        "Eigendecomposition": [
            "PCA (via covariance matrix)",
            "Spectral clustering",
            "Graph Laplacian methods",
            "Kernel PCA"
        ],
        "QR Decomposition": [
            "Least squares regression (stable solution)",
            "Gram-Schmidt orthogonalization",
            "QR algorithm for eigenvalues"
        ],
        "Cholesky": [
            "Gaussian Process regression",
            "Sampling from multivariate Gaussians",
            "Optimization (Newton's method)",
            "Kalman filtering"
        ],
        "LU Decomposition": [
            "Solving linear systems",
            "Matrix inversion",
            "Determinant computation"
        ]
    }

    print("Matrix Decompositions in Machine Learning")
    print("=" * 50)
    for decomp, apps in applications.items():
        print(f"\n{decomp}:")
        for app in apps:
            print(f"  • {app}")

ml_applications_summary()

Matrix decompositions provide the computational backbone for many machine learning algorithms. Understanding SVD in particular is essential: it underlies dimensionality reduction and data compression, and it reveals the structure hidden within high-dimensional datasets.