# Low-rank-plus-diagonal matrix representations

### Assumed audience:

Specifically $$\mathrm{K}=\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I}$$ where $$\mathrm{K}\in\mathbb{R}^{N\times N}$$ and $$\mathrm{Z}\in\mathbb{R}^{N\times D}$$ with $$D\ll N$$. A workhorse. We might get a cool low-rank decomposition like this from matrix factorisation, but they arise everywhere; to pick one example, Gaussian processes.

Lots of fun tricks, mostly because of the Woodbury identity. See Ken Tay's intro on that.

## Inverting

Specifically, solving $$\mathrm{K}\mathrm{X}=\mathrm{B}$$ for $$\mathrm{X}$$, where $$\mathrm{B}\in\mathbb{R}^{N\times M}$$, and in particular solving it efficiently, in the sense that we

1. exploit the low-rank structure of $$\mathrm{K}$$ so that computing $$\mathrm{K}^{-1}\mathrm{B}$$ costs less than the naive $$\mathcal{O}(N^3+N^2M)$$;
2. avoid ever forming the explicit inverse matrix $$\mathrm{K}^{-1}$$, which would require $$\mathcal{O}(N^2)$$ storage.

This is possible using the following useful trick. Applying the Woodbury identity, \begin{align*} \mathrm{K}^{-1}=\sigma^{-2}\mathrm{I}-\sigma^{-4} \mathrm{Z}\left(\mathrm{I}+\sigma^{-2}\mathrm{Z}^{\top} \mathrm{Z}\right)^{-1} \mathrm{Z}^{\top} \end{align*} we compute the lower Cholesky factor $$\mathrm{L} \mathrm{L}^{\top}=\left(\mathrm{I}+\sigma^{-2}\mathrm{Z}^{\top} \mathrm{Z}\right)^{-1}$$ of the inverse of a small $$D\times D$$ matrix, at a cost of $$\mathcal{O}(ND^2+D^3)$$, and define $$\mathrm{R}=\sigma^{-2}\mathrm{Z} \mathrm{L}$$. We use this to discover $\mathrm{K}^{-1}=\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top},$ and we may thus compute the solution by matrix multiplication \begin{aligned} \mathrm{K}^{-1}\mathrm{B} &=\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}\\ &=\left(\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top}\right)\mathrm{B}\\ &=\underbrace{\sigma^{-2}\mathrm{B}}_{N \times M} - \underbrace{\mathrm{R}}_{N\times D} \underbrace{\mathrm{R}^{\top}\mathrm{B}}_{D\times M} \end{aligned}

The solution of the linear system is thus available at cost $$\mathcal{O}\left(ND^2 + D^3 + NDM\right)$$, which beats the naive approach whenever $$D\ll N$$. Generalising from $$\sigma^2\mathrm{I}$$ to an arbitrary diagonal matrix is easy.
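As a concrete sketch of the whole procedure (plain NumPy for portability, though the same runs in PyTorch; the function name is my own):

```python
import numpy as np

def lr_plus_diag_solve(Z, sigma2, B):
    """Solve (Z Z^T + sigma2 * I) X = B via Woodbury, never forming an N x N matrix.

    Z: (N, D) low-rank factor, B: (N, M) right-hand side.
    Cost is roughly O(N D^2 + D^3 + N D M) instead of the naive O(N^3 + N^2 M).
    """
    D = Z.shape[1]
    # Small D x D matrix appearing in the Woodbury identity.
    inner = np.eye(D) + (Z.T @ Z) / sigma2
    # L L^T = inner^{-1}, so that K^{-1} = I / sigma2 - R R^T with R = Z L / sigma2.
    L = np.linalg.cholesky(np.linalg.inv(inner))
    R = (Z @ L) / sigma2
    return B / sigma2 - R @ (R.T @ B)
```

The explicit `inv` of the small matrix is for clarity only; a triangular solve against the Cholesky factor of `inner` would be the tidier route in production.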

TODO: discuss positive-definiteness.

TODO: Centered version.

## Eigendecomposition/SVD

Nakatsukasa (2019) observes that

> The nonzero eigenvalues of $$\mathrm{Y} \mathrm{Z}$$ are equal to those of $$\mathrm{Z} \mathrm{Y}$$ : an identity that holds as long as the products are square, even when $$\mathrm{Y}, \mathrm{Z}$$ are rectangular. This fact naturally suggests an efficient algorithm for computing eigenvalues and eigenvectors of a low-rank matrix $$\mathrm{K}=\mathrm{Y} \mathrm{Z}$$ with $$\mathrm{Y}, \mathrm{Z}^{\top} \in \mathbb{C}^{N \times r}, N \gg r$$ : form the small $$r \times r$$ matrix $$\mathrm{Z} \mathrm{Y}$$ and find its eigenvalues and eigenvectors. For nonzero eigenvalues, the eigenvectors are related by $$\mathrm{Y} \mathrm{Z} v=\lambda v \Leftrightarrow \mathrm{Z} \mathrm{Y} w=\lambda w$$ with $$w=\mathrm{Z} v$$ […]

Converting that to our case, the nonzero eigenvalues of $$\mathrm{Z} \mathrm{Z}^{\top}$$ are equal to those of $$\mathrm{Z}^{\top} \mathrm{Z}$$; to compute eigenvalues and eigenvectors of the low-rank matrix $$\mathrm{Z} \mathrm{Z}^{\top}$$, form the small $$D \times D$$ matrix $$\mathrm{Z}^{\top} \mathrm{Z}$$ and find its eigenvalues and eigenvectors. For nonzero eigenvalues, the eigenvectors are related by $$\mathrm{Z} \mathrm{Z}^{\top} v=\lambda v \Leftrightarrow \mathrm{Z}^{\top} \mathrm{Z} w=\lambda w$$ with $$w=\mathrm{Z}^{\top} v$$.
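A quick numerical sanity check of that claim, as a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 4
Z = rng.standard_normal((N, D))

# Top D eigenvalues of the N x N matrix Z Z^T (eigvalsh returns ascending order)...
big = np.linalg.eigvalsh(Z @ Z.T)[-D:]
# ...match the D eigenvalues of the small D x D matrix Z^T Z.
small = np.linalg.eigvalsh(Z.T @ Z)
match = np.allclose(big, small)
```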

A classic piece of lore is cheap eigendecomposition of $$\mathrm{K}$$ by exploiting the low-rank structure and the SVD. I have no idea who invented this, but here goes. First we calculate the thin SVD of $$\mathrm{Z}$$ to obtain $$\mathrm{Z}=\mathrm{U}\mathrm{S}\mathrm{V}^{\top}$$, where $$\mathrm{U}\in\mathbb{R}^{N\times D}$$ has orthonormal columns, $$\mathrm{V}\in\mathbb{R}^{D\times D}$$ is orthogonal and $$\mathrm{S}\in\mathbb{R}^{D\times D}$$ is diagonal. Then we may write \begin{aligned} \mathrm{K} &= \mathrm{Z} \mathrm{Z}^{\top} + \sigma^2 \mathrm{I} \\ &= \mathrm{U} \mathrm{S} \mathrm{V}^{\top} \mathrm{V} \mathrm{S} \mathrm{U}^{\top} + \sigma^2 \mathrm{I} \\ &= \mathrm{U} \mathrm{S}^2 \mathrm{U}^{\top} + \sigma^2 \mathrm{I} \end{aligned} Thus the top $$D$$ eigenvalues of $$\mathrm{K}$$ are $$\sigma^2+s_d^2$$, and the corresponding eigenvectors are the columns $$\boldsymbol{u}_d$$. The remaining $$N-D$$ eigenvalues are all $$\sigma^2$$, and the corresponding eigenvectors can be any orthonormal basis for the orthogonal complement of the column space of $$\mathrm{U}$$.
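In code, the eigenvalue part of the lore looks something like this (a sketch; `lr_plus_diag_eig` is a made-up name):

```python
import numpy as np

def lr_plus_diag_eig(Z, sigma2):
    """Eigen-decompose K = Z Z^T + sigma2 * I from a thin SVD of Z, in O(N D^2)."""
    N, D = Z.shape
    U, s, _ = np.linalg.svd(Z, full_matrices=False)  # U: (N, D), s: (D,)
    # Top D eigenvalues are sigma2 + s_d^2 (eigenvectors: columns of U);
    # the remaining N - D eigenvalues are all sigma2.
    eigvals = np.concatenate([s**2 + sigma2, np.full(N - D, sigma2)])
    return eigvals, U
```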

### Cholesky

Louis Tiao has a post, *Efficient Cholesky decomposition of low-rank updates*, summarising Seeger (2004).

## Multiplying low-rank-plus-diagonal matrices

Specifically, $$(\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I})$$. Are products of low-rank-plus-diagonal matrices cheap?

\begin{aligned} (\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I}) &=\mathrm{Y} \mathrm{Y}^{\top} \mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{Z} \mathrm{Z}^{\top}+\sigma^4\mathrm{I}\\ &=\mathrm{Y} (\mathrm{Y}^{\top} \mathrm{Z} )\mathrm{Z}^{\top}+\sigma^2\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{Z} \mathrm{Z}^{\top}+\sigma^4\mathrm{I} \end{aligned} which is still a sum of low-rank terms plus a diagonal, so it can be stored and applied cheaply. At this point it might be natural to consider a tensor decomposition.
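In practice we would rarely form the product at all; applying it to a vector touches only skinny matrices. A minimal sketch:

```python
import numpy as np

def product_matvec(Y, Z, sigma2, x):
    """Apply (Y Y^T + sigma2 I)(Z Z^T + sigma2 I) to x in O(N (D_Y + D_Z)) time."""
    Kz_x = Z @ (Z.T @ x) + sigma2 * x        # right factor first
    return Y @ (Y.T @ Kz_x) + sigma2 * Kz_x  # then the left factor
```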

## Multiplying inverse low-rank-plus-diagonal matrices

Suppose the low-rank inverse factors (in the sense of the inversion trick above) of $$\mathrm{Y}\mathrm{Y}^{\top}+\sigma^2\mathrm{I}$$ and $$\mathrm{Z}\mathrm{Z}^{\top}+\sigma^2\mathrm{I}$$ are, respectively, $$\mathrm{R}$$ and $$\mathrm{C}$$. Then we have

\begin{aligned} &(\mathrm{Y} \mathrm{Y}^{\top}+\sigma^2\mathrm{I})^{-1}(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I})^{-1}\\ &=(\sigma^{-2}\mathrm{I}-\mathrm{R} \mathrm{R}^{\top})(\sigma^{-2}\mathrm{I}-\mathrm{C} \mathrm{C}^{\top})\\ &=\sigma^{-4}\mathrm{I}-\sigma^{-2}\mathrm{R} \mathrm{R}^{\top}-\sigma^{-2}\mathrm{C} \mathrm{C}^{\top}+\mathrm{R} (\mathrm{R}^{\top}\mathrm{C}) \mathrm{C}^{\top}\\ \end{aligned}

Once again, cheap to evaluate, but not so obviously nice.
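A numerical sanity check of that expansion, reusing a Woodbury-style inverse factor as in the inversion section (NumPy sketch; `inverse_factor` is a hypothetical helper name):

```python
import numpy as np

def inverse_factor(Z, sigma2):
    """R such that (Z Z^T + sigma2 I)^{-1} = I / sigma2 - R R^T."""
    D = Z.shape[1]
    L = np.linalg.cholesky(np.linalg.inv(np.eye(D) + (Z.T @ Z) / sigma2))
    return (Z @ L) / sigma2

rng = np.random.default_rng(3)
N, s2 = 40, 0.5
Y = rng.standard_normal((N, 3))
Z = rng.standard_normal((N, 5))
R = inverse_factor(Y, s2)
C = inverse_factor(Z, s2)

# Product of the two inverses, dense vs. structured.
lhs = np.linalg.inv(Y @ Y.T + s2 * np.eye(N)) @ np.linalg.inv(Z @ Z.T + s2 * np.eye(N))
rhs = (np.eye(N) / s2**2
       - (R @ R.T) / s2
       - (C @ C.T) / s2
       + R @ (R.T @ C) @ C.T)
agree = np.allclose(lhs, rhs)
```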

## Distances

### Frobenius

Suppose we want to measure the Frobenius distance between $$\mathrm{K}_{\mathrm{U},\sigma^2}=\mathrm{U}\mathrm{U}^{\top}+\sigma^{2}\mathrm{I}$$ and $$\mathrm{K}_{\mathrm{R},\gamma^2}=\mathrm{R}\mathrm{R}^{\top}+\gamma^{2}\mathrm{I}$$. We recall that we might expect things to be nice if the matrices are exactly low rank because, e.g. \begin{aligned} \|\mathrm{U}\mathrm{U}^{\top}\|_F^2 =\operatorname{tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{U}\mathrm{U}^{\top}\right) =\|\mathrm{U}^{\top}\mathrm{U}\|_F^2. \end{aligned} How does it come out as a distance between two low-rank-plus-diagonal matrices? The answer may be found without forming the full $$N\times N$$ matrices. For compactness, we define $$\delta^2=\sigma^2-\gamma^2$$ and the complex concatenation $$\mathrm{W}=\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}$$, so that $$\mathrm{W}\mathrm{W}^{\top}=\mathrm{U}\mathrm{U}^{\top}-\mathrm{R}\mathrm{R}^{\top}$$. (Note the plain transpose rather than the conjugate transpose; since $$\mathrm{U}$$ and $$\mathrm{R}$$ are real, the bilinear form $$\langle\mathrm{A},\mathrm{B}\rangle=\operatorname{Tr}(\mathrm{A}^{\top}\mathrm{B})$$ agrees with the Frobenius inner product on every term that survives, the imaginary units having cancelled.) Then \begin{aligned} &\left\|\mathrm{U}\mathrm{U}^{\top}+\sigma^{2}\mathrm{I}-\left(\mathrm{R}\mathrm{R}^{\top}+\gamma^{2}\mathrm{I}\right)\right\|_F^2\\ &=\left\|\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top} + \delta^2\mathrm{I}\right\|_{F}^2\\ &=\left\|\mathrm{W}\mathrm{W}^{\top} + \delta^2\mathrm{I} \right\|_{F}^2\\ &=\left\langle\mathrm{W}\mathrm{W}^{\top},\mathrm{W}\mathrm{W}^{\top}\right\rangle +2\delta^2\operatorname{Tr}\left(\mathrm{W}\mathrm{W}^{\top}\right) +\delta^4 N\\ &=\operatorname{Tr}\left(\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right)\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right)\right) +2\delta^2\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top} -\mathrm{R}\mathrm{R}^{\top}\right) +\delta^4 N\\ &=\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{U}\mathrm{U}^{\top}\right) -2\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{\top}\mathrm{R}\mathrm{R}^{\top}\right) + \operatorname{Tr}\left(\mathrm{R}\mathrm{R}^{\top}\mathrm{R}\mathrm{R}^{\top}\right) +2\delta^2\left(\operatorname{Tr}\left(\mathrm{U}^{\top}\mathrm{U}\right) -\operatorname{Tr}\left(\mathrm{R}^{\top}\mathrm{R}\right)\right) +\delta^4 N\\ &=\left\|\mathrm{U}^{\top}\mathrm{U}\right\|^2_F -2\left\|\mathrm{U}^{\top}\mathrm{R}\right\|^2_F + \left\|\mathrm{R}^{\top}\mathrm{R}\right\|^2_F +\delta^4 N +2\delta^2\left(\left\|\mathrm{U}\right\|^2_F -\left\|\mathrm{R}\right\|^2_F\right) \end{aligned} Every term involves only small Gram matrices or scalars.
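The final expression only ever touches small Gram matrices, with the $$\delta^4$$ term scaling with the ambient dimension $$N$$. A sketch (`frob_dist2` is my own name):

```python
import numpy as np

def frob_dist2(U, R, sigma2, gamma2):
    """Squared Frobenius distance between U U^T + sigma2 I and R R^T + gamma2 I,
    computed from small Gram matrices only."""
    N = U.shape[0]
    d2 = sigma2 - gamma2  # delta^2 in the derivation
    return (np.sum((U.T @ U) ** 2)
            - 2 * np.sum((U.T @ R) ** 2)
            + np.sum((R.T @ R) ** 2)
            + d2**2 * N
            + 2 * d2 * (np.sum(U**2) - np.sum(R**2)))
```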

## Tools

Mostly I use PyTorch's linear algebra.

## References

Akimoto, Youhei. 2017. arXiv.
Ameli, Siavash, and Shawn C. Shadden. 2023. Applied Mathematics and Computation 452 (September): 128032.
Babacan, S. Derin, Martin Luessi, Rafael Molina, and Aggelos K. Katsaggelos. 2012. IEEE Transactions on Signal Processing 60 (8): 3964–77.
Bach, C, D. Ceglia, L. Song, and F. Duddeck. 2019. International Journal for Numerical Methods in Engineering 118 (4): 209–41.
Bach, Francis R. 2013. In COLT, 30:185–209.
Barbier, Jean, Nicolas Macris, and Léo Miolane. 2017. arXiv:1709.10368 [Cond-Mat, Physics:math-Ph], September.
Bauckhage, Christian. 2015. arXiv:1512.07548 [Stat], December.
Berry, Michael W., Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. 2007. Computational Statistics & Data Analysis 52 (1): 155–73.
Brand, Matthew. 2002. In Computer Vision – ECCV 2002, edited by Anders Heyden, Gunnar Sparr, Mads Nielsen, and Peter Johansen, 2350:707–20. Berlin, Heidelberg: Springer Berlin Heidelberg.
———. 2006. Linear Algebra and Its Applications, Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems, 415 (1): 20–30.
Chen, Yudong, and Yuejie Chi. 2018. IEEE Signal Processing Magazine 35 (4): 14–31.
Chi, Yuejie, Yue M. Lu, and Yuxin Chen. 2019. IEEE Transactions on Signal Processing 67 (20): 5239–69.
Cichocki, A., N. Lee, I. V. Oseledets, A.-H. Phan, Q. Zhao, and D. Mandic. 2016. arXiv:1609.00893 [Cs], September.
Drineas, Petros, and Michael W. Mahoney. 2005. Journal of Machine Learning Research 6 (December): 2153–75.
Fasi, Massimiliano, Nicholas J. Higham, and Xiaobo Liu. 2023. SIAM Journal on Matrix Analysis and Applications 44 (1): 156–74.
Flammia, Steven T., David Gross, Yi-Kai Liu, and Jens Eisert. 2012. New Journal of Physics 14 (9): 095022.
Ghashami, Mina, Edo Liberty, Jeff M. Phillips, and David P. Woodruff. 2015. arXiv:1501.01711 [Cs], January.
Gordon, Geoffrey J. 2002. In Proceedings of the 15th International Conference on Neural Information Processing Systems, 593–600. NIPS'02. Cambridge, MA, USA: MIT Press.
Gross, D. 2011. IEEE Transactions on Information Theory 57 (3): 1548–66.
Gross, David, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. 2010. Physical Review Letters 105 (15).
Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. 2010. arXiv.
Harbrecht, Helmut, Michael Peters, and Reinhold Schneider. 2012. Applied Numerical Mathematics, Third Chilean Workshop on Numerical Analysis of Partial Differential Equations (WONAPDE 2010), 62 (4): 428–40.
Harville, David. 1976. The Annals of Statistics 4 (2): 384–95.
Harville, David A. 1977. Journal of the American Statistical Association 72 (358): 320–38.
Hastie, Trevor, Rahul Mazumder, Jason D. Lee, and Reza Zadeh. 2015. In Journal of Machine Learning Research, 16:3367–3402.
Henderson, H. V., and S. R. Searle. 1981. SIAM Review 23 (1): 53–60.
Hoaglin, David C., and Roy E. Welsch. 1978. The American Statistician 32 (1): 17–22.
Kannan, Ramakrishnan. 2016. April.
Kannan, Ramakrishnan, Grey Ballard, and Haesun Park. 2016. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 9:1–11. PPoPP '16. New York, NY, USA: ACM.
Kim, H., and H. Park. 2008. SIAM Journal on Matrix Analysis and Applications 30 (2): 713–30.
Kumar, N. Kishore, and Jan Shneider. 2016. arXiv:1606.06511 [Cs, Math], June.
Liberty, Edo, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. 2007. Proceedings of the National Academy of Sciences 104 (51): 20167–72.
Lim, Yew Jin, and Yee Whye Teh. 2007. "Variational Bayesian Approach to Movie Rating Prediction." In Proceedings of KDD Cup and Workshop, 7:15–21. Citeseer.
Lin, Zhouchen. 2016.
Liu, T., and D. Tao. 2015. IEEE Transactions on Neural Networks and Learning Systems PP (99): 1–1.
Lu, Jun. 2022. arXiv.
Mahoney, Michael W. 2010. Randomized Algorithms for Matrices and Data. Vol. 3.
Martinsson, Per-Gunnar. 2016. arXiv:1607.01649 [Math], July.
Minka, Thomas P. 2000. Old and new matrix algebra useful for statistics.
Nakajima, Shinichi, and Masashi Sugiyama. 2012. "Theoretical Analysis of Bayesian Matrix Factorization." Journal of Machine Learning Research, 66.
Nakatsukasa, Yuji. 2019. arXiv.
Nowak, W., and A. Litvinenko. 2013. Mathematical Geosciences 45 (4): 411–35.
Petersen, Kaare Brandt, and Michael Syskind Pedersen. 2012. "The Matrix Cookbook."
Rokhlin, Vladimir, Arthur Szlam, and Mark Tygert. 2009. SIAM J. Matrix Anal. Appl. 31 (3): 1100–1124.
Salakhutdinov, Ruslan, and Andriy Mnih. 2008. In Proceedings of the 25th International Conference on Machine Learning, 880–87. ICML '08. New York, NY, USA: ACM.
Saul, Lawrence K. 2023. Transactions on Machine Learning Research, January.
Seeger, Matthias, ed. 2004. Low Rank Updates for the Cholesky Decomposition.
Seshadhri, C., Aneesh Sharma, Andrew Stolman, and Ashish Goel. 2020. Proceedings of the National Academy of Sciences 117 (11): 5631–37.
Shi, Jiarong, Xiuyun Zheng, and Wei Yang. 2017. Entropy 19 (8): 424.
Srebro, Nathan, Jason D. M. Rennie, and Tommi S. Jaakkola. 2004. In Advances in Neural Information Processing Systems, 17:1329–36. NIPS'04. Cambridge, MA, USA: MIT Press.
Sundin, Martin. 2016. "Bayesian Methods for Sparse and Low-Rank Matrix Problems." PhD Thesis, KTH Royal Institute of Technology.
Tropp, Joel A., Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2016. arXiv:1609.00048 [Cs, Math, Stat], August.
———. 2017. SIAM Journal on Matrix Analysis and Applications 38 (4): 1454–85.
Tufts, D. W., and R. Kumaresan. 1982. Proceedings of the IEEE 70 (9): 975–89.
Türkmen, Ali Caner. 2015. arXiv:1507.03194 [Cs, Stat], July.
Udell, M., and A. Townsend. 2019. SIAM Journal on Mathematics of Data Science 1 (1): 144–60.
Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. arXiv:1901.11436 [Cs, Eess, Stat], January.
Woodruff, David P. 2014. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science 1.0. Now Publishers.
Woolfe, Franco, Edo Liberty, Vladimir Rokhlin, and Mark Tygert. 2008. Applied and Computational Harmonic Analysis 25 (3): 335–66.
Xinghao Ding, Lihan He, and L. Carin. 2011. IEEE Transactions on Image Processing 20 (12): 3419–30.
Yang, Linxiao, Jun Fang, Huiping Duan, Hongbin Li, and Bing Zeng. 2018. IEEE Transactions on Signal Processing 66 (11): 2804–17.
Yin, M., J. Gao, and Z. Lin. 2016. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (3): 504–17.
Yu, Chenhan D., William B. March, and George Biros. 2017. In arXiv:1701.02324 [Cs].
Yu, Hsiang-Fu, Cho-Jui Hsieh, Si Si, and Inderjit S. Dhillon. 2012. In IEEE International Conference of Data Mining, 765–74.
———. 2014. Knowledge and Information Systems 41 (3): 793–819.
Zhang, Kai, Chuanren Liu, Jie Zhang, Hui Xiong, Eric Xing, and Jieping Ye. 2017. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 615–23. KDD '17. New York, NY, USA: ACM.
Zhang, Xiao, Lingxiao Wang, and Quanquan Gu. 2017. arXiv:1701.00481 [Stat], January.
Zhou, Tianyi, and Dacheng Tao. 2011.
———. 2012. Journal of Machine Learning Research.
