An important matrix factorisation. TBC
1 What is SVD?
For any matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$, we can write the (thin) singular value decomposition

$$A = U \Sigma V^{\top}$$

where $U \in \mathbb{R}^{m \times r}$ has orthonormal columns ($U^{\top} U = I_r$), $V \in \mathbb{R}^{n \times r}$ has orthonormal columns ($V^{\top} V = I_r$), $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
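In numpy, for instance (the shapes here are arbitrary, and a random matrix like this one is almost surely full rank, so $r = \min(m, n)$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: U is (6, 4), s is (4,), Vt is (4, 4).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(4)))      # orthonormal columns of U
print(np.allclose(Vt @ Vt.T, np.eye(4)))    # orthonormal columns of V
print(np.allclose(U @ np.diag(s) @ Vt, A))  # exact reconstruction
print(np.all(np.diff(s) <= 0))              # descending singular values
```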
You can do SO MUCH with this, but you rapidly run into problems of computational cost and numerical stability if you are naive about it.
2 Randomized methods
There are a few worth mentioning; everyone name-checks Halko, Martinsson, and Tropp (2010) and Bach et al. (2019). I’ve used Halko, Martinsson, and Tropp (2010) a lot because it is included in PyTorch. I am curious about Allen-Zhu and Li (2017), which claims stability at low floating-point precision, which is very desirable in these big-network times.
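The PyTorch version is `torch.svd_lowrank`, which follows the Halko, Martinsson, and Tropp scheme. A minimal sketch (the test matrix, target rank `q`, and iteration count `niter` here are arbitrary choices of mine):

```python
import torch

torch.manual_seed(0)
# A 1000 x 200 matrix of rank 20 with a decaying spectrum.
U0, _ = torch.linalg.qr(torch.randn(1000, 20))
V0, _ = torch.linalg.qr(torch.randn(200, 20))
A = U0 @ torch.diag(torch.logspace(0, -3, 20)) @ V0.T

# Randomized SVD: q is a (slightly overestimated) target rank,
# niter the number of subspace (power) iterations.
U, S, V = torch.svd_lowrank(A, q=20, niter=4)

# Compare the leading singular values against exact ones.
S_ref = torch.linalg.svdvals(A)
print(torch.max(torch.abs(S - S_ref[:20])))
```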
3 Incremental updates / downdates
A rank-1 incremental update corresponds to appending a new column $c \in \mathbb{R}^m$ to $A$. We want the SVD of $\tilde{A} = [A \;\; c]$ given $A = U \Sigma V^{\top}$.

Project $c$ onto $\operatorname{range}(U)$: $w = U^{\top} c$, and compute the residual $p = c - U w$, $\rho = \|p\|_2$.

If $\rho > 0$, set $j = p / \rho$ (else pick any unit $j \perp \operatorname{range}(U)$). Form the small “core” matrix

$$K = \begin{bmatrix} \Sigma & w \\ 0 & \rho \end{bmatrix} \in \mathbb{R}^{(r+1) \times (r+1)}.$$

Compute the SVD of $K$:

$$K = U' \Sigma' V'^{\top}$$

where $U', \Sigma', V' \in \mathbb{R}^{(r+1) \times (r+1)}$. Update the full factors:

$$U \leftarrow [U \;\; j]\, U', \qquad \Sigma \leftarrow \Sigma', \qquad V \leftarrow \begin{bmatrix} V & 0 \\ 0 & 1 \end{bmatrix} V'.$$

Cost: SVD of one small $(r+1) \times (r+1)$ matrix plus a few thin matrix multiplications, rather than a full decomposition from scratch.
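A minimal numpy sketch of this update (the function name and tolerance are my own choices; a production version would also truncate negligible singular values to keep $r$ from creeping upward):

```python
import numpy as np

def svd_append_column(U, s, V, c, tol=1e-12):
    """Given the thin SVD A = U @ np.diag(s) @ V.T, return the thin SVD
    of [A, c] after appending the column c. Rank may grow by one."""
    m, r = U.shape
    w = U.T @ c                    # projection onto range(U)
    p = c - U @ w                  # residual, orthogonal to range(U)
    rho = np.linalg.norm(p)
    if rho > tol:
        j = p / rho
    else:
        rho, j = 0.0, np.zeros(m)  # degenerate: c already in range(U)
    # Small (r+1) x (r+1) core matrix K = [[diag(s), w], [0, rho]].
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(s)
    K[:r, r] = w
    K[r, r] = rho
    Uk, sk, Vkt = np.linalg.svd(K)
    # Rotate the enlarged factors by the core SVD.
    U_new = np.column_stack([U, j]) @ Uk
    V_pad = np.zeros((V.shape[0] + 1, r + 1))
    V_pad[:-1, :r] = V
    V_pad[-1, r] = 1.0
    V_new = V_pad @ Vkt.T
    return U_new, sk, V_new

# Sanity check against a from-scratch SVD.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
c = rng.standard_normal(50)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U2, s2, V2 = svd_append_column(U, s, Vt.T, c)
print(np.allclose(U2 @ np.diag(s2) @ V2.T, np.column_stack([A, c])))
```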
There are also downdates (removing a column), which are pretty similar.
For fancy variations, there are many papers (Brand 2006, 2002; Bunch and Nielsen 1978; Gu and Eisenstat 1995, 1993; Sarwar et al. 2002; Zhang 2022).
4 …as Frobenius minimiser
TODO.
5 …for PCA
This is nearly trivial but needs spelling out: SVD gives us the PCA, among other useful things. Center the data matrix $X$ (subtract the column means); then the right singular vectors of the centred matrix are the principal directions, and the singular values give the explained variances via $\sigma_i^2 / (n - 1)$.
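A numpy sketch of the correspondence (the toy data with anisotropic scaling is my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples in 3 dimensions with very different variances per axis.
X = rng.standard_normal((500, 3)) * np.array([2.0, 1.0, 0.1])

# Center, then take the thin SVD of the data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

principal_axes = Vt             # rows are the principal directions
explained_var = s**2 / (len(X) - 1)
scores = U * s                  # projections onto the principal axes

# Agrees with the eigendecomposition of the sample covariance.
evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(np.allclose(explained_var, evals))
```

The practical advantage over eigendecomposing the covariance is that we never form $X^{\top} X$, which squares the condition number.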