Mathematics for Machine Learning
Part I Mathematical Foundations
1. Linear Algebra
1.1 System of Linear Equations
1.2 Matrices
1.3 Solving Systems of Linear Equations
1.4 Vector Spaces
1.5 Linear Independence
1.6 Basis and Rank
1.7 Linear Mappings
1.8 Affine Spaces
1.9 Further Reading
2. Analytic Geometry
2.1 Norms
2.2 Inner Products
2.3 Lengths and Distances
2.4 Angles and Orthogonality
2.5 Orthonormal Basis
2.6 Orthogonal Complement
2.7 Inner Product of Functions
2.8 Rotations
2.9 Further Reading
3. Matrix Decompositions
3.1 Determinant and Trace
3.2 Eigenvalues and Eigenvectors
3.3 Cholesky Decomposition
3.4 Eigendecomposition and Diagonalization
3.5 Singular Value Decomposition
3.6 Matrix Approximation
3.7 Matrix Phylogeny
3.8 Further Reading
4. Vector Calculus
4.1 Differentiation of Univariate Functions
4.2 Partial Differentiation and Gradients
4.3 Gradients of Vector-Valued Functions
4.4 Gradients of Matrices
4.5 Useful Identities for Computing Gradients
4.6 Backpropagation and Automatic Differentiation
4.7 Higher-Order Derivatives
4.8 Linearization and Multivariate Taylor Series
4.9 Further Reading
5. Probability and Distributions
5.1 Construction of a Probability Space
5.2 Discrete and Continuous Probabilities
5.3 Sum Rule, Product Rule, and Bayes’ Theorem
5.4 Summary Statistics and Independence
5.5 Gaussian Distribution
5.6 Conjugacy and the Exponential Family
5.7 Change of Variables/Inverse Transform
5.8 Further Reading
6. Continuous Optimization
6.1 Optimization Using Gradient Descent
6.2 Constrained Optimization and Lagrange Multipliers
6.3 Convex Optimization
6.4 Further Reading
Part II Central Machine Learning Problems
7. When Models Meet Data
7.1 Data, Models, and Learning
7.2 Empirical Risk Minimization
7.3 Parameter Estimation
7.4 Probabilistic Modeling and Inference
7.5 Directed Graphical Models
7.6 Model Selection
8. Linear Regression
8.1 Problem Formulation
8.2 Parameter Estimation
8.3 Bayesian Linear Regression
8.4 Maximum Likelihood as Orthogonal Projection
8.5 Further Reading
9. Dimensionality Reduction with Principal Component Analysis
9.1 Problem Setting
9.2 Maximum Variance Perspective
9.3 Projection Perspective
9.4 Eigenvector Computation and Low-Rank Approximations
9.5 PCA in High Dimensions
9.6 Key Steps of PCA in Practice
9.7 Latent Variable Perspective
9.8 Further Reading
10. Density Estimation with Gaussian Mixture Models
10.1 Gaussian Mixture Model
10.2 Parameter Learning via Maximum Likelihood
10.3 EM Algorithm
10.4 Latent-Variable Perspective
10.5 Further Reading
11. Classification with Support Vector Machines
11.1 Separating Hyperplanes
11.2 Primal Support Vector Machine
11.3 Dual Support Vector Machine
11.4 Kernels
11.5 Numerical Solution
11.6 Further Reading