# Krylov subspace iteration

Power method, PageRank, Lanczos decomposition, …

August 5, 2014 — October 16, 2024

### Assumed audience:

People with undergrad linear algebra

## 1 Power iteration

The simplest possible variant: repeatedly apply the matrix to a vector and renormalise. The iterate converges to the dominant eigenvector (when the dominant eigenvalue is simple and well separated), and the Rayleigh quotient estimates the dominant eigenvalue.
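A minimal numpy sketch of that loop; only the matrix-vector product is needed, never the matrix itself (names here are illustrative, not from any particular library):

```python
import numpy as np

def power_iteration(matvec, dim, iters=500, seed=0):
    """Estimate the dominant eigenpair of a matrix, given only its
    matrix-vector product `matvec`."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=dim)
    b /= np.linalg.norm(b)
    for _ in range(iters):
        b = matvec(b)          # apply the matrix
        b /= np.linalg.norm(b)  # renormalise to avoid over/underflow
    # Rayleigh quotient gives the eigenvalue estimate.
    lam = b @ matvec(b)
    return lam, b
```

For a diagonal matrix with entries \(3, 1, 0.5\), this recovers the dominant eigenvalue \(3\) and (up to sign) the first basis vector.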

## 2 Lanczos decomposition

Lanczos decomposition is a handy approximation for matrices that are cheap to multiply because of some structure, but expensive to store. It can also be used to calculate an approximate inverse cheaply.

I learnt this trick from Gaussian process literature in the context of Lanczos Variance Estimates (LOVE) (Pleiss et al. 2018), although I believe it exists elsewhere.

Given some rank \(k\) and an arbitrary starting vector \(\boldsymbol{b}\), the Lanczos algorithm iteratively approximates \(\mathrm{K} \in\mathbb{R}^{n \times n}\) by a low-rank factorisation \(\mathrm{K}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\), where \(\mathrm{T} \in \mathbb{R}^{k \times k}\) is tridiagonal and \(\mathrm{Q} \in \mathbb{R}^{n \times k}\) has orthonormal columns. Crucially, we never need to form \(\mathrm{K}\) explicitly; we only need to evaluate matrix-vector products \(\mathrm{K}\boldsymbol{b}\) for arbitrary vectors \(\boldsymbol{b}\). Moreover, given the Lanczos factors \(\mathrm{Q},\mathrm{T}\) we may estimate \[\begin{align*} \mathrm{K}^{-1}\boldsymbol{c}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\boldsymbol{c}, \end{align*}\] even for \(\boldsymbol{c}\neq\boldsymbol{b}\). As a running example, say we wish to calculate \(\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}\), with \(\mathrm{Z}\in\mathbb{R}^{D\times N }\) and \(N\ll D\).

We approximate the solution to this linear system using the partial Lanczos decomposition starting with probe vector \(\boldsymbol{b}=\overline{\mathrm{B}}\) and \(\mathrm{K}=\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)\).
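A sketch of the partial Lanczos decomposition itself, in numpy with full reorthogonalisation (a simple but memory-hungry way to control the loss of orthogonality that plagues the textbook three-term recurrence; names are illustrative):

```python
import numpy as np

def lanczos(matvec, b, k):
    """Partial Lanczos decomposition: return Q (n x k, orthonormal
    columns) and tridiagonal T (k x k) with K ~= Q T Q^T, using only
    matrix-vector products with K."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)      # diagonal of T
    beta = np.zeros(k - 1)   # off-diagonal of T
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = matvec(Q[:, j])
        alpha[j] = Q[:, j] @ w
        # full reorthogonalisation against all previous columns
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:  # invariant subspace found; stop early
                Q, alpha, beta = Q[:, : j + 1], alpha[: j + 1], beta[:j]
                break
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T
```

Taking \(k = n\) recovers the matrix exactly (up to floating point); the interesting regime is \(k \ll n\), where \(\mathrm{Q}\mathrm{T}\mathrm{Q}^{\top}\) captures the action of \(\mathrm{K}\) on the Krylov subspace generated by \(\boldsymbol{b}\).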

This requires \(k\) matrix-vector products of the form \[\begin{align*} \underbrace{\left(\underbrace{\mathrm{Z} \mathrm{Z}^{\top}}_{\mathcal{O}(ND^2)}+\sigma^2 \mathrm{I}\right)\boldsymbol{b}}_{\mathcal{O}(D^2)} =\underbrace{\mathrm{Z} \underbrace{(\mathrm{Z}^{\top}\boldsymbol{b})}_{\mathcal{O}(ND)}}_{\mathcal{O}(ND)} +\sigma^2 \boldsymbol{b}. \end{align*}\] Using the latter bracketing, each matrix-vector product costs \(\mathcal{O}(ND)\) time; space complexity is also \(\mathcal{O}(ND)\). The output of the Lanczos decomposition is \(\mathrm{Q},\mathrm{T}\) such that \(\left(\mathrm{Z}\mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)\boldsymbol{b}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\boldsymbol{b}\). The inverse-matrix-vector product may then be approximated by \(\left(\mathrm{Z} \mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)^{-1} \mathrm{B}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\mathrm{B}\), which requires solving for \(\mathrm{X}\) in the much smaller linear system \(\mathrm{X}\mathrm{T}=\mathrm{Q}\). Exploiting the positive-definiteness of \(\mathrm{T}\), we may use its Cholesky decomposition \(\mathrm{T}=\mathrm{L}\mathrm{L}^{\top}\) for a constant-factor speedup over solving an arbitrary linear system. The time cost of this solve is \(\mathcal{O}(Dk^3)\), giving an overall cost of \(\mathcal{O}(NDk+Dk^3)\) for the approximate inversion.
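That bracketing is the whole trick; a numpy sketch of the cheap matrix-vector product (illustrative names, with the dense product shown only for comparison):

```python
import numpy as np

def matvec(Z, sigma2, b):
    """Compute (Z Z^T + sigma^2 I) b in O(ND) time by associating the
    product as Z (Z^T b) + sigma^2 b, never forming the D x D matrix
    Z Z^T (which would cost O(ND^2) to build and O(D^2) to store)."""
    return Z @ (Z.T @ b) + sigma2 * b
```

Feeding this `matvec` to a Lanczos routine yields the \(\mathrm{Q},\mathrm{T}\) factors above while only ever touching \(\mathrm{Z}\) through thin matrix products.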

To do: make this accounting more precise; we may have missed some multiplies and Lanczos overhead, plus deviation calculations.

Lifehack: find derivatives of Lanczos iterations via pnkraemer/matfree, matrix-free linear algebra in JAX (Krämer et al. 2024).

## 3 Incoming

## 4 References
