Krylov subspace iteration
Power method, PageRank, Lanczos decomposition, …
August 5, 2014 — October 16, 2024
Assumed audience:
People with undergrad linear algebra
1 Power iteration
The simplest possible variant of Krylov iteration: repeatedly apply the matrix to a vector and renormalise. When the dominant eigenvalue is unique and the starting vector has a component along its eigenvector, the iterates converge to that dominant eigenvector; PageRank is the classic application.
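A minimal sketch in JAX (the function names here are mine, not any library's API): note that we only ever touch the matrix through matrix-vector products.

```python
import jax
import jax.numpy as jnp

def power_iteration(matvec, v0, num_iters=200):
    """Power method: repeatedly apply the operator and renormalise.
    Converges to the dominant eigenvector when the dominant eigenvalue
    is unique and v0 has a component along it."""
    def step(v, _):
        w = matvec(v)
        return w / jnp.linalg.norm(w), None
    v, _ = jax.lax.scan(step, v0 / jnp.linalg.norm(v0), None, length=num_iters)
    # Rayleigh quotient v^T K v estimates the dominant eigenvalue.
    return v @ matvec(v), v

# Usage: dominant eigenpair of a random symmetric matrix.
key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (100, 100))
A = A + A.T  # symmetrise
eigval, eigvec = power_iteration(lambda x: A @ x, jnp.ones(100))
```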
2 Lanczos decomposition
Lanczos decomposition is a handy approximation for matrices that are cheap to multiply, because of some structure, but expensive to store. It can also be used to apply an approximate inverse cheaply.
I learnt this trick from Gaussian process literature in the context of Lanczos Variance Estimates (LOVE) (Pleiss et al. 2018), although I believe it exists elsewhere.
Given some rank \(k\) and an arbitrary starting vector \(\boldsymbol{b}\), the Lanczos algorithm iteratively approximates \(\mathrm{K} \in\mathbb{R}^{n \times n}\) by a low-rank factorisation \(\mathrm{K}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\), where \(\mathrm{T} \in \mathbb{R}^{k \times k}\) is tridiagonal and \(\mathrm{Q} \in \mathbb{R}^{n \times k}\) has orthonormal columns. Crucially, we never need to form \(\mathrm{K}\) explicitly; the algorithm touches it only through matrix-vector products \(\mathrm{K}\boldsymbol{b}\). Moreover, given the Lanczos factors \(\mathrm{Q},\mathrm{T}\) we may estimate \[\begin{align*} \mathrm{K}^{-1}\boldsymbol{c}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\boldsymbol{c}, \end{align*}\] even for \(\boldsymbol{c}\neq\boldsymbol{b}\).
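A minimal, unoptimised sketch of the iteration itself (no reorthogonalisation, so the columns of \(\mathrm{Q}\) drift from orthogonality in floating point for larger \(k\); a real implementation would use a library such as matfree):

```python
import jax.numpy as jnp

def lanczos(matvec, b, k):
    """Plain Lanczos: build Q (n x k, orthonormal columns) and
    tridiagonal T (k x k) with K ~= Q T Q^T, touching K only via matvec."""
    n = b.shape[0]
    Q = jnp.zeros((n, k))
    alphas = jnp.zeros(k)
    betas = jnp.zeros(k - 1)
    q, q_prev, beta = b / jnp.linalg.norm(b), jnp.zeros(n), 0.0
    for j in range(k):
        Q = Q.at[:, j].set(q)
        w = matvec(q) - beta * q_prev   # three-term recurrence
        alpha = q @ w
        w = w - alpha * q
        alphas = alphas.at[j].set(alpha)
        if j < k - 1:
            # beta ~ 0 signals breakdown: an exact invariant subspace.
            beta = jnp.linalg.norm(w)
            betas = betas.at[j].set(beta)
            q_prev, q = q, w / beta
    T = jnp.diag(alphas) + jnp.diag(betas, 1) + jnp.diag(betas, -1)
    return Q, T
```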
Say we wish to calculate \(\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}\), with \(\mathrm{Z}\in\mathbb{R}^{D\times N }\) and \(N\ll D\). We approximate the solution to this linear system using the partial Lanczos decomposition of \(\mathrm{K}=\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\), starting from the probe vector \(\boldsymbol{b}=\overline{\mathrm{B}}\).
This requires \(k\) matrix-vector products of the form \[\begin{align*} \underbrace{\left(\underbrace{\mathrm{Z} \mathrm{Z}^{\top}}_{\mathcal{O}(ND^2)}+\sigma^2 \mathrm{I}\right)\boldsymbol{b}}_{\mathcal{O}(D^2)} =\underbrace{\mathrm{Z} \underbrace{(\mathrm{Z}^{\top}\boldsymbol{b})}_{\mathcal{O}(ND)}}_{\mathcal{O}(ND)} +\sigma^2 \boldsymbol{b}. \end{align*}\] Using the right-hand representation, each matrix-vector product costs \(\mathcal{O}(ND)\) time; space complexity is also \(\mathcal{O}(ND)\). The output of the Lanczos decomposition is \(\mathrm{Q},\mathrm{T}\) such that \(\left(\mathrm{Z}\mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)\boldsymbol{b}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\boldsymbol{b}\). The inverse-matrix-vector product may then be approximated by \(\left(\mathrm{Z} \mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)^{-1} \mathrm{B}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\mathrm{B}\), which requires solving for \(\mathrm{X}\) the much smaller linear system \(\mathrm{X}\mathrm{T}=\mathrm{Q}\). Exploiting the positive-definiteness of \(\mathrm{T}\), we may use its Cholesky decomposition \(\mathrm{T}=\mathrm{L}\mathrm{L}^{\top}\) for a constant-factor speedup over a generic solve. The solve costs \(\mathcal{O}(Dk^2)\), for an overall cost to the matrix inversion of \(\mathcal{O}(NDk+Dk^2)\).
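Putting the pieces together, a hedged sketch reusing the `lanczos` function above; reading the probe vector \(\boldsymbol{b}=\overline{\mathrm{B}}\) as the column mean of \(\mathrm{B}\) is my assumption:

```python
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

def approx_inv_apply(Z, sigma2, B, k=20):
    """Approximate (Z Z^T + sigma^2 I)^{-1} B via rank-k Lanczos."""
    matvec = lambda v: Z @ (Z.T @ v) + sigma2 * v  # O(ND) per product
    b = B.mean(axis=1)  # probe vector: column mean of B (my reading of B-bar)
    Q, T = lanczos(matvec, b, k)
    # T is symmetric positive definite here, so Cholesky applies.
    c_and_low = cho_factor(T)
    return Q @ cho_solve(c_and_low, Q.T @ B)  # Q T^{-1} Q^T B
```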
TODO: make this more precise; we may have missed some multiplies and Lanczos overhead, plus deviation calculations.
Lifehack: find derivatives of Lanczos iterations via pnkraemer/matfree, matrix-free linear algebra in JAX (Krämer et al. 2024).