Specifically \(\mathrm{K}=\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I}\), where \(\mathrm{K}\in\mathbb{R}^{D\times D}\) and \(\mathrm{Z}\in\mathbb{R}^{D\times N}\) with \(N\ll D\). A workhorse. We might get a low-rank decomposition like this from matrix factorisation, but they arise everywhere; Gaussian processes are one example.
Lots of fun tricks, mostly because of the Woodbury Identity. See Ken Tay's intro on that.
Inverting
Specifically, solving \(\mathrm{K}\mathrm{X}=\mathrm{B}\) for \(\mathrm{X}\), where \(\mathrm{B}\in\mathbb{R}^{D\times M}\), and in particular solving it efficiently, in the sense that we
- exploit the low-rank structure of \(\mathrm{K}\) so that computing \(\mathrm{K}^{-1}\mathrm{B}\) costs less than the naive \(\mathcal{O}(D^3 + D^2M)\) of factorising \(\mathrm{K}\) and solving.
- avoid ever forming the explicit inverse matrix \(\mathrm{K}^{-1}\), which would require \(\mathcal{O}(D^2)\) storage.
This is possible using the following useful trick. Applying the Woodbury identity, \[\begin{align*} \mathrm{K}^{-1}=\sigma^{-2}\mathrm{I}-\sigma^{-4} \mathrm{Z}\left(\mathrm{I}+\sigma^{-2}\mathrm{Z}^{\top} \mathrm{Z}\right)^{-1} \mathrm{Z}^{\top}, \end{align*}\] we compute a lower-triangular Cholesky factor \(\mathrm{L}\) such that \(\mathrm{L} \mathrm{L}^{\top}=\left(\mathrm{I}+\sigma^{-2}\mathrm{Z}^{\top} \mathrm{Z}\right)^{-1}\), at a cost of \(\mathcal{O}(N^3)\) once the \(N\times N\) matrix \(\mathrm{Z}^{\top}\mathrm{Z}\) has been formed at \(\mathcal{O}(DN^2)\), and define \(\mathrm{R}=\sigma^{-2}\mathrm{Z} \mathrm{L}\). Substituting, we find \[ \mathrm{K}^{-1}=\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top}, \] and we may thus compute the solution by matrix multiplication: \[\begin{aligned} \mathrm{K}^{-1}\mathrm{B} &=\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}\\ &=\left(\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top}\right)\mathrm{B}\\ &=\underbrace{\sigma^{-2}\mathrm{B}}_{D \times M} - \underbrace{\mathrm{R}}_{D\times N} \underbrace{\mathrm{R}^{\top}\mathrm{B}}_{N\times M}. \end{aligned}\]
The solution of the linear system is thus available at a cost of \(\mathcal{O}\left(DN^2 + NDM +N^3\right)\), rather than the \(\mathcal{O}(D^3 + D^2M)\) of a dense solve. Generalising from \(\sigma^2\mathrm{I}\) to an arbitrary diagonal matrix is easy.
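Here is a minimal PyTorch sketch of the solve above; the function name `lowrank_diag_solve` and the test scaffolding are mine, not any library API, and it takes the Cholesky of the capacitance matrix itself, with triangular solves playing the role of the explicit \(\mathrm{L}\).

```python
import torch

def lowrank_diag_solve(Z, sigma2, B):
    """Solve (Z Z^T + sigma2 * I) X = B without forming the D x D matrix.

    Z: (D, N) low-rank factor; B: (D, M) right-hand side.
    """
    N = Z.shape[1]
    # Small N x N capacitance matrix I + Z^T Z / sigma2.
    C = torch.eye(N, dtype=Z.dtype) + (Z.T @ Z) / sigma2
    # Cholesky of C itself; the triangular solve below plays the role of the
    # factor L with L L^T = C^{-1} in the text.
    chol = torch.linalg.cholesky(C)
    # R = Z chol^{-T} / sigma2, so that R R^T = Z C^{-1} Z^T / sigma2^2.
    R = torch.linalg.solve_triangular(chol, Z.T, upper=False).T / sigma2
    # Woodbury: K^{-1} B = B / sigma2 - R (R^T B).
    return B / sigma2 - R @ (R.T @ B)

# Cross-check against a dense solve on a small random instance.
torch.manual_seed(0)
D, N, M, sigma2 = 50, 5, 3, 0.5
Z = torch.randn(D, N, dtype=torch.float64)
B = torch.randn(D, M, dtype=torch.float64)
K = Z @ Z.T + sigma2 * torch.eye(D, dtype=torch.float64)
assert torch.allclose(lowrank_diag_solve(Z, sigma2, B), torch.linalg.solve(K, B))
```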
TODO: discuss positive-definiteness.
TODO: Centered version (Ameli and Shadden 2023; D. Harville 1976; D. A. Harville 1977; Henderson and Searle 1981).
Eigendecomposition/SVD
Nakatsukasa (2019) observes that
The nonzero eigenvalues of \(\mathrm{Y} \mathrm{Z}\) are equal to those of \(\mathrm{Z} \mathrm{Y}\) : an identity that holds as long as the products are square, even when \(\mathrm{Y}, \mathrm{Z}\) are rectangular. This fact naturally suggests an efficient algorithm for computing eigenvalues and eigenvectors of a low-rank matrix \(\mathrm{K}=\mathrm{Y} \mathrm{Z}\) with \(\mathrm{Y}, \mathrm{Z}^{\top} \in \mathbb{C}^{N \times r}, N \gg r\) : form the small \(r \times r\) matrix \(\mathrm{Z} \mathrm{Y}\) and find its eigenvalues and eigenvectors. For nonzero eigenvalues, the eigenvectors are related by \(\mathrm{Y} \mathrm{Z} v=\lambda v \Leftrightarrow \mathrm{Z} \mathrm{Y} w=\lambda w\) with \(w=\mathrm{Z} v\) […]
Converting that to our setting: the nonzero eigenvalues of \(\mathrm{Z} \mathrm{Z}^{\top}\) are equal to those of \(\mathrm{Z}^{\top} \mathrm{Z}\). So to compute eigenvalues and eigenvectors of the low-rank matrix \(\mathrm{Z} \mathrm{Z}^{\top}\), we form the small \(N \times N\) matrix \(\mathrm{Z}^{\top} \mathrm{Z}\) and find its eigenvalues and eigenvectors. For nonzero eigenvalues, the eigenvectors are related by \(\mathrm{Z} \mathrm{Z}^{\top} v=\lambda v \Leftrightarrow \mathrm{Z}^{\top} \mathrm{Z} w=\lambda w\) with \(w=\mathrm{Z}^{\top} v\), and, going the other way, \(v=\mathrm{Z} w\) up to normalisation.
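A small PyTorch sketch of that recipe (names and dimensions are arbitrary; it assumes the nonzero-eigenvalue case so the lifted eigenvectors can be normalised):

```python
import torch

torch.manual_seed(0)
D, N = 200, 4
Z = torch.randn(D, N, dtype=torch.float64)

# Eigendecomposition of the small N x N Gram matrix Z^T Z.
lam, W = torch.linalg.eigh(Z.T @ Z)   # lam: (N,), W: (N, N), columns are eigenvectors

# Lift to eigenvectors of the big D x D matrix Z Z^T: if Z^T Z w = lam w
# then Z Z^T (Z w) = lam (Z w), so normalise the columns of Z W.
V = Z @ W
V = V / V.norm(dim=0, keepdim=True)   # safe here since all lam are nonzero

# Check the eigenpairs of Z Z^T without ever forming it densely.
assert torch.allclose(Z @ (Z.T @ V), V * lam)
```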
A classic piece of lore is cheap eigendecomposition of \(\mathrm{K}\) by exploiting the low-rank structure via the SVD. I have no idea who invented this, but here goes. First we calculate the thin SVD of \(\mathrm{Z}\) to obtain \(\mathrm{Z}=\mathrm{U}\mathrm{S}\mathrm{V}^{\top}\), where \(\mathrm{U}\in\mathbb{R}^{D\times N}\) has orthonormal columns, \(\mathrm{V}\in\mathbb{R}^{N\times N}\) is orthogonal, and \(\mathrm{S}\in\mathbb{R}^{N\times N}\) is diagonal with entries \(s_n\). Then we may write \[ \begin{aligned} \mathrm{K} &= \mathrm{Z} \mathrm{Z}^{\top} + \sigma^2 \mathrm{I} \\ &= \mathrm{U} \mathrm{S} \mathrm{V}^{\top} \mathrm{V} \mathrm{S} \mathrm{U}^{\top} + \sigma^2 \mathrm{I} \\ &= \mathrm{U} \mathrm{S}^2 \mathrm{U}^{\top} + \sigma^2 \mathrm{I} \end{aligned} \] Thus the top \(N\) eigenvalues of \(\mathrm{K}\) are \(\sigma^2+s_n^2\), and the corresponding eigenvectors are the columns \(\boldsymbol{u}_n\) of \(\mathrm{U}\). The remaining \(D-N\) eigenvalues are all \(\sigma^2\), and the corresponding eigenvectors may be taken to be any orthonormal basis of the orthogonal complement of the column space of \(\mathrm{U}\).
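A hedged PyTorch sketch of this, checking the claimed spectrum against a dense eigendecomposition on a small random instance:

```python
import torch

torch.manual_seed(1)
D, N, sigma2 = 100, 6, 0.3
Z = torch.randn(D, N, dtype=torch.float64)

# Thin SVD of the factor: Z = U diag(s) V^T with U of shape (D, N).
U, s, Vh = torch.linalg.svd(Z, full_matrices=False)

# Claimed spectrum of K = Z Z^T + sigma2 I: top N eigenvalues are s_n^2 + sigma2
# (eigenvectors: columns of U); the remaining D - N eigenvalues are sigma2.
top_eigs = s**2 + sigma2

# Dense cross-check (eigvalsh returns ascending order, so flip to descending).
K = Z @ Z.T + sigma2 * torch.eye(D, dtype=torch.float64)
dense_eigs = torch.linalg.eigvalsh(K).flip(0)
assert torch.allclose(dense_eigs[:N], top_eigs)
assert torch.allclose(dense_eigs[N:], torch.full((D - N,), sigma2, dtype=torch.float64))
```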
Cholesky
Louis Tiao, Efficient Cholesky decomposition of low-rank updates, summarising Seeger (2004).
Roots
See matrix square roots.
Multiplying low-rank-plus-diagonal matrices
Specifically, \((\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I})\). Are products of low-rank-plus-diagonal matrices still cheap?
\[ \begin{aligned} (\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I}) &=\mathrm{Y} \mathrm{Y}^{\top} \mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{Z} \mathrm{Z}^{\top}+\sigma^4\mathrm{I}\\ &=\mathrm{Y} (\mathrm{Y}^{\top} \mathrm{Z} )\mathrm{Z}^{\top}+\sigma^2\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{Z} \mathrm{Z}^{\top}+\sigma^4\mathrm{I} \end{aligned} \] which is still a sum of low-rank terms plus a scaled identity, and in particular cheap to apply to a vector, as in the sketch below. At this point it might be natural to consider a tensor decomposition.
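For instance, a minimal PyTorch sketch of applying the product to a vector purely through the factors; the helper `lrpd_matvec` is my own ad hoc naming, not a library function:

```python
import torch

def lrpd_matvec(F, sigma2, x):
    """Apply (F F^T + sigma2 * I) to x without forming the D x D matrix."""
    return F @ (F.T @ x) + sigma2 * x

torch.manual_seed(2)
D, N, sigma2 = 300, 5, 0.7
Y = torch.randn(D, N, dtype=torch.float64)
Z = torch.randn(D, N, dtype=torch.float64)
x = torch.randn(D, dtype=torch.float64)

# Apply (Y Y^T + sigma2 I)(Z Z^T + sigma2 I) x by composing matvecs, O(DN) each.
out = lrpd_matvec(Y, sigma2, lrpd_matvec(Z, sigma2, x))

# Dense cross-check.
I = torch.eye(D, dtype=torch.float64)
dense = (Y @ Y.T + sigma2 * I) @ (Z @ Z.T + sigma2 * I)
assert torch.allclose(out, dense @ x)
```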
Multiplying inverse low-rank-plus-diagonal matrices
Suppose \(\mathrm{R}\) and \(\mathrm{C}\) are the low-rank inverse factors, constructed as in the Inverting section, of \(\mathrm{Y}\mathrm{Y}^{\top}+\sigma^2\mathrm{I}\) and \(\mathrm{Z}\mathrm{Z}^{\top}+\sigma^2\mathrm{I}\) respectively, so that \((\mathrm{Y}\mathrm{Y}^{\top}+\sigma^2\mathrm{I})^{-1}=\sigma^{-2}\mathrm{I}-\mathrm{R}\mathrm{R}^{\top}\) and \((\mathrm{Z}\mathrm{Z}^{\top}+\sigma^2\mathrm{I})^{-1}=\sigma^{-2}\mathrm{I}-\mathrm{C}\mathrm{C}^{\top}\). Then we have
\[ \begin{aligned} &(\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})^{-1}(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I})^{-1}\\ &=(\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top})(\sigma^{-2}\mathrm{I}-\mathrm{C} \mathrm{C}^{\top})\\ &=\sigma^{-4}\mathrm{I}-\sigma^{-2}\mathrm{R} \mathrm{R}^{\top}-\sigma^{-2}\mathrm{C} \mathrm{C}^{\top}+\mathrm{R} (\mathrm{R}^{\top}\mathrm{C}) \mathrm{C}^{\top}. \end{aligned} \]
Once again, cheap to evaluate, but not so obviously nice.
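A quick numerical sanity check of that expansion, with arbitrary matrices standing in for the inverse factors \(\mathrm{R}\) and \(\mathrm{C}\):

```python
import torch

torch.manual_seed(3)
D, N, sigma2 = 40, 3, 0.5
R = torch.randn(D, N, dtype=torch.float64)
C = torch.randn(D, N, dtype=torch.float64)
I = torch.eye(D, dtype=torch.float64)

# Expand (sigma^{-2} I - R R^T)(sigma^{-2} I - C C^T) term by term.
lhs = (I / sigma2 - R @ R.T) @ (I / sigma2 - C @ C.T)
rhs = (I / sigma2**2
       - (R @ R.T) / sigma2
       - (C @ C.T) / sigma2
       + R @ (R.T @ C) @ C.T)
assert torch.allclose(lhs, rhs)
```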
Distances
Frobenius
Suppose we want to measure the Frobenius distance between \(\mathrm{K}_{\mathrm{U},\sigma^2}:=\mathrm{U}\mathrm{U}^{\top}+\sigma^{2}\mathrm{I}\) and \(\mathrm{K}_{\mathrm{R},\gamma^2}:=\mathrm{R}\mathrm{R}^{\top}+\gamma^{2}\mathrm{I}\). We recall that we might expect things to be nice when the matrices are exactly low rank because, e.g. \[ \begin{aligned} \|\mathrm{U}\mathrm{U}^{\top}\|_F^2 =\operatorname{tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{U}\mathrm{U}^{\top}\right) =\|\mathrm{U}^{\top}\mathrm{U}\|_F^2. \end{aligned} \] How does the distance between two low-rank-plus-diagonal matrices come out? The answer may be found without forming the full matrices. For compactness, we define \(\delta^2=\sigma^2-\gamma^2\) (which may be negative), and we use the trick \(-\mathrm{R}\mathrm{R}^{\top}=(i\mathrm{R})(i\mathrm{R})^{\top}\), where \({}^{\top}\) is the plain (non-conjugating) transpose; note that \(\begin{bmatrix}\mathrm{U} & i\mathrm{R}\end{bmatrix}\begin{bmatrix}\mathrm{U} & i\mathrm{R}\end{bmatrix}^{\top}=\mathrm{U}\mathrm{U}^{\top}-\mathrm{R}\mathrm{R}^{\top}\) is real and symmetric, so the usual real Frobenius inner-product algebra applies. \[ \begin{aligned} &\|\mathrm{U}\mathrm{U}^{\top}+\sigma^{2}\mathrm{I}-\mathrm{R}\mathrm{R}^{\top}-\gamma^{2}\mathrm{I}\|_F^2\\ &=\left\|\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top} + \delta^2\mathrm{I}\right\|_{F}^2\\ &=\left\|\mathrm{U}\mathrm{U}^{\top}+(i\mathrm{R})(i\mathrm{R})^{\top} + \delta^2\mathrm{I} \right\|_{F}^2\\ &=\left\|\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}^{\top} + \delta^2\mathrm{I} \right\|_{F}^2\\ &=\left\langle\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}^{\top},\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}^{\top}\right\rangle_{F} +\left\langle\delta^2\mathrm{I}, \delta^2\mathrm{I} \right\rangle_{F} +2\left\langle\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}^{\top}, \delta^2\mathrm{I} \right\rangle_{F}\\ &=\operatorname{Tr}\left(\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right)\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right)\right) +\delta^4 D +2\delta^2\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right)\\ &=\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{U}\mathrm{U}^{\top}\right) -2\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{R}\mathrm{R}^{\top}\right) + \operatorname{Tr}\left(\mathrm{R}\mathrm{R}^{\top}\mathrm{R}\mathrm{R}^{\top}\right) +\delta^4 D +2\delta^2\left(\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top}\right) -\operatorname{Tr}\left(\mathrm{R}\mathrm{R}^{\top}\right)\right)\\ &=\operatorname{Tr}\left(\mathrm{U}^{\top}\mathrm{U}\mathrm{U}^{\top}\mathrm{U}\right) -2\operatorname{Tr}\left(\mathrm{U}^{\top}\mathrm{R}\mathrm{R}^{\top}\mathrm{U}\right) + \operatorname{Tr}\left(\mathrm{R}^{\top}\mathrm{R}\mathrm{R}^{\top}\mathrm{R}\right) +\delta^4 D +2\delta^2\left(\operatorname{Tr}\left(\mathrm{U}^{\top}\mathrm{U}\right) -\operatorname{Tr}\left(\mathrm{R}^{\top}\mathrm{R}\right)\right)\\ &=\left\|\mathrm{U}^{\top}\mathrm{U}\right\|^2_F -2\left\|\mathrm{U}^{\top}\mathrm{R}\right\|^2_F + \left\|\mathrm{R}^{\top}\mathrm{R}\right\|^2_F +\delta^4 D +2\delta^2\left(\left\|\mathrm{U}\right\|^2_F -\left\|\mathrm{R}\right\|^2_F\right), \end{aligned} \] where \(D\) is the ambient dimension. Every term on the right involves only small Gram matrices or Frobenius norms of the factors, so we never need to form the full \(D\times D\) matrices.
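A PyTorch sketch of the final identity, with a dense computation as a cross-check; `lrpd_frob_dist2` is my own ad hoc name, and the ranks of the two factors need not match:

```python
import torch

def lrpd_frob_dist2(U, sigma2, R, gamma2):
    """Squared Frobenius distance between U U^T + sigma2 I and R R^T + gamma2 I,
    computed from small Gram matrices and factor norms only."""
    D = U.shape[0]
    delta2 = sigma2 - gamma2
    return (torch.linalg.matrix_norm(U.T @ U) ** 2
            - 2 * torch.linalg.matrix_norm(U.T @ R) ** 2
            + torch.linalg.matrix_norm(R.T @ R) ** 2
            + delta2**2 * D
            + 2 * delta2 * (U.norm() ** 2 - R.norm() ** 2))

torch.manual_seed(4)
D, N1, N2 = 80, 4, 6
U = torch.randn(D, N1, dtype=torch.float64)
R = torch.randn(D, N2, dtype=torch.float64)
sigma2, gamma2 = 0.4, 0.9

# Dense cross-check against the full D x D matrices.
I = torch.eye(D, dtype=torch.float64)
K1 = U @ U.T + sigma2 * I
K2 = R @ R.T + gamma2 * I
assert torch.allclose(lrpd_frob_dist2(U, sigma2, R, gamma2),
                      torch.linalg.matrix_norm(K1 - K2) ** 2)
```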
Tools
Mostly I use pytorch's linear algebra.