Low-rank matrices



Assumed audience:

People with undergrad linear algebra

Here is a useful form that a matrix might possess: \[\mathrm{K}= \mathrm{Z} \mathrm{Z}^{H}\] where \(\mathrm{K}\in\mathbb{R}^{N\times N}\), \(\mathrm{Z}\in\mathbb{R}^{N\times D}\) with \(D\ll N\). Such matrices are clearly Hermitian, and arise in, e.g., covariance estimation. I write \(\mathrm{Z}^{H}\) for the conjugate transpose of \(\mathrm{Z}\), and use that notation here because I sometimes want to think of \(\mathrm{Z}\) as a complex matrix, and last I checked, most of what follows generalised to that case easily under conjugate transposition. YMMV.
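
To make this concrete, here is a minimal numpy sketch of such a matrix; the sizes and the random factor are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 5                      # D << N

Z = rng.standard_normal((N, D))    # real here; a complex Z works the same way
K = Z @ Z.conj().T                 # N x N, but rank at most D

print(np.linalg.matrix_rank(K))    # 5
print(np.allclose(K, K.conj().T))  # True: Hermitian (here, symmetric)
```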

We call matrices of this form low-rank. Their cousins, the low-rank-plus-diagonal matrices, are also useful.

Here are some minor results about them that I needed to write down somewhere.

Pseudo-inverses

Since \(D\ll N\), the matrices in question here are trivially singular (rank at most \(D\)) and so have no inverses. But they might have a pseudo-inverse or some other kind of generalised inverse.

Consider the Moore-Penrose pseudo-inverse of \(\mathrm{K}\), which we write \(\mathrm{K}^+\). The famous way of constructing it for general \(\mathrm{K}\) is via an SVD, \(\mathrm{K}=\mathrm{U}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}\mathrm{V}_{\mathrm{K}}^{H}\), where \(\mathrm{U}_{\mathrm{K}}\) and \(\mathrm{V}_{\mathrm{K}}\) are unitary and \(\mathrm{S}_{\mathrm{K}}\) is diagonal. We then define \(\mathrm{S}_{\mathrm{K}}^+\), the pseudo-inverse of the diagonal matrix of singular values, by replacing each non-zero entry with its reciprocal and leaving the zero entries at 0. We have sneakily decided that the pseudo-inverse of a diagonal matrix is easy. This turns out to do the right thing, if you check it, and it does not even sound crazy, but neither is it, to me at least, totally obvious.
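
A tiny numpy sanity check of that claim (the values are arbitrary); it confirms that the reciprocate-the-non-zeros construction behaves as an inverse should on the non-degenerate part:

```python
import numpy as np

s = np.array([3.0, 1.5, 0.0, 0.0])   # diagonal of S, some entries zero
s_pinv = np.zeros_like(s)
nz = s != 0                           # in floating point you would use a tolerance here
s_pinv[nz] = 1.0 / s[nz]              # reciprocate the non-zero entries, leave zeros alone

S, S_pinv = np.diag(s), np.diag(s_pinv)
print(np.allclose(S @ S_pinv @ S, S))            # True
print(np.allclose(S_pinv @ S @ S_pinv, S_pinv))  # True
```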

Next, we claim that the pseudo-inverse of the whole thing is \(\mathrm{K}^+=\mathrm{V}_{\mathrm{K}}\mathrm{S}_{\mathrm{K}}^+\mathrm{U}_{\mathrm{K}}^{H}\). If we check the object constructed this way, we discover that it satisfies the defining properties (listed below).
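
For reference, this is what the general SVD route looks like in numpy; note that it works on the full \(N\times N\) matrix, which is exactly the cost we would like to avoid later. A minimal sketch with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 200, 4
Z = rng.standard_normal((N, D))
K = Z @ Z.conj().T

# Pseudo-inverse of K via its own (full, N x N) SVD.
U, s, Vh = np.linalg.svd(K)
tol = s.max() * N * np.finfo(s.dtype).eps      # below this we call a singular value zero
s_pinv = np.zeros_like(s)
s_pinv[s > tol] = 1.0 / s[s > tol]
K_pinv = Vh.conj().T @ np.diag(s_pinv) @ U.conj().T

print(np.allclose(K_pinv, np.linalg.pinv(K)))  # True: agrees with the library routine
```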

Can we construct this pseudo-inverse specifically for a low-rank matrix? Let’s try taking the SVD of the low-rank factor and see what happens. Let \(\mathrm{Z}=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{V}_{\mathrm{Z}}^{H}\) be the SVD of \(\mathrm{Z}\), so that \(\mathrm{U}_{\mathrm{Z}}\) and \(\mathrm{V}_{\mathrm{Z}}\) are unitary and \(\mathrm{S}_{\mathrm{Z}}\) is diagonal. Then \(\mathrm{K}=\mathrm{Z}\mathrm{Z}^{H}=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{V}_{\mathrm{Z}}^{H}\mathrm{V}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{U}_{\mathrm{Z}}^{H}=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{U}_{\mathrm{Z}}^{H}\).

Next, the pseudo-inverse of \(\mathrm{Z}\) is \(\mathrm{Z}^+=\mathrm{V}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H}\) (the same SVD construction as above).

Checking the Matrix Cookbook (Petersen and Pedersen 2012), we see that \[\begin{aligned} (\mathrm{Z}\mathrm{Z}^{H})^+ &=\mathrm{Z}^{+H}\mathrm{Z}^{+} \end{aligned}\] so \[\begin{aligned} \mathrm{Z}^{+H}\mathrm{Z}^+ &=(\mathrm{V}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H})^{H}(\mathrm{V}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H})\\ &=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{V}_{\mathrm{Z}}^{H}\mathrm{V}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H}\\ &=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H}. \end{aligned}\] It looks like we should take \(\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\) to be the low-rank factor of the pseudo-inverse of \(\mathrm{K}\), right?
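
A quick numerical check of that claim, in numpy. The name `W` is just my shorthand for \(\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\), and I assume \(\mathrm{Z}\) has full column rank so that every singular value is non-zero:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 200, 4
Z = rng.standard_normal((N, D))
K = Z @ Z.conj().T

# Thin SVD of the factor: Uz is N x D, sz holds the D singular values.
Uz, sz, _ = np.linalg.svd(Z, full_matrices=False)
W = Uz / sz                        # = Uz @ diag(1/sz); assumes Z has full column rank
K_pinv = W @ W.conj().T            # the claimed pseudo-inverse, built from an N x D factor

print(np.allclose(K_pinv, np.linalg.pinv(K)))  # True
```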

We might want to check that the desired pseudo-inverse properties hold. Recall, the Moore-Penrose pseudo inverse of a matrix \(\mathrm{K}\) is the matrix \(\mathrm{K}^{+}\) that fulfils

  1. \(\mathrm{K}\mathrm{K}^{+} \mathrm{K}=\mathrm{K}\)
  2. \(\mathrm{K}^{+} \mathrm{K} \mathrm{K}^{+}=\mathrm{K}^{+}\)
  3. \(\mathrm{K}\mathrm{K}^{+}\) Hermitian
  4. \(\mathrm{K}^{+} \mathrm{K}\) Hermitian

The last two are immediate. We might want to check the first two. Let us consider the first one, by way of example. Grinding through the various pseudo-inverse rules is tedious, so instead let us use the constructive form \(\mathrm{K}= \mathrm{Z} \mathrm{Z}^{H}\) and the SVD factors from above: \[\begin{aligned} \mathrm{K}\mathrm{K}^{+} \mathrm{K} &=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{U}_{\mathrm{Z}}^{H} \mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H} \mathrm{K}\\ &=\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}} \mathrm{S}_{\mathrm{Z}}^+\mathrm{S}_{\mathrm{Z}}^+\mathrm{U}_{\mathrm{Z}}^{H} \mathrm{K}\\ &=\mathrm{U}_{\mathrm{Z}}\mathrm{I}_{\mathrm{Z}}\mathrm{U}_{\mathrm{Z}}^{H} \mathrm{K}\\ &=\mathrm{K}. \end{aligned}\] Here \(\mathrm{I}_{\mathrm{Z}}=\mathrm{S}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\mathrm{S}_{\mathrm{Z}}^+\) is a diagonal matrix with a 1 wherever \(\mathrm{Z}\) has a non-zero singular value and 0 elsewhere, so \(\mathrm{U}_{\mathrm{Z}}\mathrm{I}_{\mathrm{Z}}\mathrm{U}_{\mathrm{Z}}^{H}\) is the orthogonal projector onto the column space of \(\mathrm{K}\). Projecting \(\mathrm{K}\) onto its own column space does not poison the equality, since \(\mathrm{K}\) has only that much rank anyway.

Anyway, this sloppy reasoning should encourage us to believe we have done nothing too silly here. I presume a proof for property 2 would be similar, but I have not actually done it. (Homework problem).
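
Failing a proper proof, a cheap numerical check of all four properties for the low-rank construction (a sketch, sizes arbitrary, full column rank assumed as before):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 150, 3
Z = rng.standard_normal((N, D))
K = Z @ Z.conj().T

Uz, sz, _ = np.linalg.svd(Z, full_matrices=False)
W = Uz / sz                        # low-rank factor of K^+, as above
Kp = W @ W.conj().T

print(np.allclose(K @ Kp @ K, K))                      # 1. K K+ K = K
print(np.allclose(Kp @ K @ Kp, Kp))                    # 2. K+ K K+ = K+
print(np.allclose(K @ Kp, (K @ Kp).conj().T))          # 3. K K+ Hermitian
print(np.allclose(Kp @ K, (Kp @ K).conj().T))          # 4. K+ K Hermitian
```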

Is this pseudo-inverse low rank, though? Looks like it. In particular, we know that \(\mathrm{Z}\) has only \(D\) columns, and so \(\mathrm{S}_{\mathrm{Z}}\) has at most \(D\) non-zero entries. So we can take the thin SVD, knowing that it preserves at most \(D\) columns of \(\mathrm{S}_{\mathrm{Z}}\), which is to say we may as well take \(\mathrm{S}_{\mathrm{Z}}^+\) to be \(D\times D\), \(\mathrm{U}_{\mathrm{Z}}\) to be \(N\times D\), and the low-rank factor \(\mathrm{U}_{\mathrm{Z}}\mathrm{S}_{\mathrm{Z}}^+\) to be \(N\times D\).
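
In numpy terms, the thin SVD hands us exactly those shapes:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 1000, 6
Z = rng.standard_normal((N, D))

Uz, sz, Vzh = np.linalg.svd(Z, full_matrices=False)  # thin SVD keeps only D columns
print(Uz.shape, sz.shape, Vzh.shape)                 # (1000, 6) (6,) (6, 6)
print((Uz / sz).shape)                               # (1000, 6): the N x D factor of K^+
```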

Bonus detail: The SVD is not necessarily unique, even the reduced SVD, if there are repeated singular values. I think that for my purposes this is OK to ignore, but I note it here in anticipation of weird failure modes in the future.

tl;dr: pseudo-inverses of low-rank matrices are themselves low-rank, and may be found via the SVD of the factor.

Was that not exhausting? Let us state the following pithy facts from Searle (2014):

The matrix \(\mathrm{X}^{H} \mathrm{X}\) plays an important role in statistics, usually involving a generalized inverse thereof, which has several useful properties. Thus, for \(\mathrm{G}\) satisfying \[ \mathrm{X}^{H} \mathrm{X G X}^{H} \mathrm{X}=\mathrm{X}^{H} \mathrm{X}, \] \(\mathrm{G}^{H}\) is also a generalized inverse of \(\mathrm{X}^{H} \mathrm{X}\) (and \(\mathrm{G}\) is not necessarily symmetric). Also,

  1. \(\mathrm{XGX}^{H} \mathrm{X}=\mathrm{X}\);
  2. \(\mathrm{X G X}^{H}\) is invariant to \(\mathrm{G}\);
  3. \(\mathrm{XGX}^{H}\) is symmetric, whether or not \(\mathrm{G}\) is;
  4. \(\mathrm{X G X}^{H}=\mathrm{X X}^{+}\) for \(\mathrm{X}^{+}\) being the Moore-Penrose inverse of \(\mathrm{X}\).

Further, Searle constructs the low-rank pseudo-inverse (assuming \(\mathrm{X}\) has full column rank, so that \(\mathrm{X}^{H}\mathrm{X}\) is invertible) as \[ (\mathrm{X} \mathrm{X}^{H})^+ =\mathrm{X}\left(\mathrm{X}^{H} \mathrm{X}\right)^{-2} \mathrm{X}^{H}=\mathrm{Y} \mathrm{Y}^{H} \] for \(\mathrm{Y}:=\mathrm{X}\left(\mathrm{X}^{H} \mathrm{X}\right)^{-1}\).
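
Searle’s construction is also easy to check numerically; note that only a \(D\times D\) matrix ever gets inverted (full column rank assumed, which holds almost surely for a random \(\mathrm{X}\)):

```python
import numpy as np

rng = np.random.default_rng(5)
N, D = 300, 5
X = rng.standard_normal((N, D))          # full column rank, almost surely

# Searle's construction: Y = X (X^H X)^{-1}, so only a D x D inverse is needed.
Y = X @ np.linalg.inv(X.conj().T @ X)
print(np.allclose(Y @ Y.conj().T, np.linalg.pinv(X @ X.conj().T)))  # True
```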

Distances

Frobenius

Suppose we want to measure the Frobenius distance between \(\mathrm{U}\mathrm{U}^{H}\) and \(\mathrm{R}\mathrm{R}^{H}\). We recall that we might expect things to be nice when the matrices are exactly low rank because, e.g., \[ \begin{aligned} \|\mathrm{U}\mathrm{U}^{H}\|_F^2 =\operatorname{tr}\left(\mathrm{U}\mathrm{U}^{H}\mathrm{U}\mathrm{U}^{H}\right) =\|\mathrm{U}^{H}\mathrm{U}\|_F^2. \end{aligned} \] Indeed things are nice, and the answer may be found without forming the full \(N\times N\) matrices. Since \(\mathrm{U}\mathrm{U}^{H}-\mathrm{R}\mathrm{R}^{H}\) is Hermitian, its squared Frobenius norm is the trace of its square, and the cyclic property of the trace collapses every term into a small Gram matrix: \[ \begin{aligned} \|\mathrm{U}\mathrm{U}^{H}-\mathrm{R}\mathrm{R}^{H}\|_F^2 &=\operatorname{Tr}\left(\left(\mathrm{U}\mathrm{U}^{H} -\mathrm{R}\mathrm{R}^{H}\right)\left(\mathrm{U}\mathrm{U}^{H} -\mathrm{R}\mathrm{R}^{H}\right)\right)\\ &=\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{H}\mathrm{U}\mathrm{U}^{H}\right) -2\operatorname{Tr}\left(\mathrm{U}\mathrm{U}^{H}\mathrm{R}\mathrm{R}^{H}\right) + \operatorname{Tr}\left(\mathrm{R}\mathrm{R}^{H}\mathrm{R}\mathrm{R}^{H}\right)\\ &=\left\|\mathrm{U}^{H}\mathrm{U}\right\|^2_F -2\left\|\mathrm{U}^{H}\mathrm{R}\right\|^2_F + \left\|\mathrm{R}^{H}\mathrm{R}\right\|^2_F. \end{aligned} \] (In the real case the same thing can be seen via the block factorisation \(\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}\begin{bmatrix} \mathrm{U} &i\mathrm{R}\end{bmatrix}^{\top}=\mathrm{U}\mathrm{U}^{\top}-\mathrm{R}\mathrm{R}^{\top}\); note that with a conjugate transpose the \(i\)s cancel and that trick fails.)
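
Here is that identity as a numpy sketch; `lowrank_frob_dist2` is just a hypothetical helper name, and the dense computation is only there as a check:

```python
import numpy as np

rng = np.random.default_rng(6)
N, D1, D2 = 2000, 4, 7
U = rng.standard_normal((N, D1))
R = rng.standard_normal((N, D2))

def lowrank_frob_dist2(U, R):
    """||U U^H - R R^H||_F^2 via small Gram matrices only."""
    uu = np.linalg.norm(U.conj().T @ U, 'fro') ** 2
    ur = np.linalg.norm(U.conj().T @ R, 'fro') ** 2
    rr = np.linalg.norm(R.conj().T @ R, 'fro') ** 2
    return uu - 2.0 * ur + rr

# Compare against the naive dense computation (feasible here, wasteful for big N).
dense = np.linalg.norm(U @ U.conj().T - R @ R.conj().T, 'fro') ** 2
print(np.allclose(lowrank_frob_dist2(U, R), dense))   # True
```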

References

Akimoto, Youhei. 2017. “Fast Eigen Decomposition for Low-Rank Matrix Approximation.” arXiv.
Babacan, S. Derin, Martin Luessi, Rafael Molina, and Aggelos K. Katsaggelos. 2012. “Sparse Bayesian Methods for Low-Rank Matrix Estimation.” IEEE Transactions on Signal Processing 60 (8): 3964–77.
Bach, C., D. Ceglia, L. Song, and F. Duddeck. 2019. “Randomized Low-Rank Approximation Methods for Projection-Based Model Order Reduction of Large Nonlinear Dynamical Problems.” International Journal for Numerical Methods in Engineering 118 (4): 209–41.
Bach, Francis R. 2013. “Sharp Analysis of Low-Rank Kernel Matrix Approximations.” In COLT, 30:185–209.
Brand, Matthew. 2006. “Fast Low-Rank Modifications of the Thin Singular Value Decomposition.” Linear Algebra and Its Applications, Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems, 415 (1): 20–30.
Chen, Yudong, and Yuejie Chi. 2018. “Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation: Recent Theory and Fast Algorithms via Convex and Nonconvex Optimization.” IEEE Signal Processing Magazine 35 (4): 14–31.
Chi, Yuejie, Yue M. Lu, and Yuxin Chen. 2019. “Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview.” IEEE Transactions on Signal Processing 67 (20): 5239–69.
Chow, Edmond, and Yousef Saad. 1997. “Approximate Inverse Techniques for Block-Partitioned Matrices.” SIAM Journal on Scientific Computing 18 (6): 1657–75.
Chung, Julianne, and Matthias Chung. 2014. “An Efficient Approach for Computing Optimal Low-Rank Regularized Inverse Matrices.” Inverse Problems 30 (11): 114009.
Cichocki, A., N. Lee, I. V. Oseledets, A.-H. Phan, Q. Zhao, and D. Mandic. 2016. “Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1.” arXiv:1609.00893 [Cs], September.
Fasi, Massimiliano, Nicholas J. Higham, and Xiaobo Liu. 2023. “Computing the Square Root of a Low-Rank Perturbation of the Scaled Identity Matrix.” SIAM Journal on Matrix Analysis and Applications 44 (1): 156–74.
Gross, D. 2011. “Recovering Low-Rank Matrices From Few Coefficients in Any Basis.” IEEE Transactions on Information Theory 57 (3): 1548–66.
Hager, William W. 1989. “Updating the Inverse of a Matrix.” SIAM Review 31 (2): 221–39.
Harbrecht, Helmut, Michael Peters, and Reinhold Schneider. 2012. “On the Low-Rank Approximation by the Pivoted Cholesky Decomposition.” Applied Numerical Mathematics, Third Chilean Workshop on Numerical Analysis of Partial Differential Equations (WONAPDE 2010), 62 (4): 428–40.
Hastie, Trevor, Rahul Mazumder, Jason D. Lee, and Reza Zadeh. 2015. “Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.” In Journal of Machine Learning Research, 16:3367–3402.
Kannan, Ramakrishnan. 2016. “Scalable and Distributed Constrained Low Rank Approximations,” April.
Khan, Emtiyaz. 2008. “Updating Inverse of a Matrix When a Column Is Added/Removed.”
Kumar, N. Kishore, and Jan Shneider. 2016. “Literature Survey on Low Rank Approximation of Matrices.” arXiv:1606.06511 [Cs, Math], June.
Liberty, Edo, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. 2007. “Randomized Algorithms for the Low-Rank Approximation of Matrices.” Proceedings of the National Academy of Sciences 104 (51): 20167–72.
Lin, Zhouchen. 2016. “A Review on Low-Rank Models in Signal and Data Analysis.”
Nakatsukasa, Yuji. 2019. “The Low-Rank Eigenvalue Problem.” arXiv.
Nowak, W., and A. Litvinenko. 2013. “Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques.” Mathematical Geosciences 45 (4): 411–35.
Petersen, Kaare Brandt, and Michael Syskind Pedersen. 2012. “The Matrix Cookbook.”
Saad, Yousef. 2003. Iterative Methods for Sparse Linear Systems: Second Edition. 2nd ed. SIAM.
Saul, Lawrence K. 2023. “A Geometrical Connection Between Sparse and Low-Rank Matrices and Its Application to Manifold Learning.” Transactions on Machine Learning Research, January.
Searle, Shayle R. 2014. “Matrix Algebra.” In Wiley StatsRef: Statistics Reference Online. American Cancer Society.
Searle, Shayle R., and Andre I. Khuri. 2017. Matrix Algebra Useful for Statistics. John Wiley & Sons.
Seeger, Matthias, ed. 2004. Low Rank Updates for the Cholesky Decomposition.
Seshadhri, C., Aneesh Sharma, Andrew Stolman, and Ashish Goel. 2020. “The Impossibility of Low-Rank Representations for Triangle-Rich Complex Networks.” Proceedings of the National Academy of Sciences 117 (11): 5631–37.
Shi, Jiarong, Xiuyun Zheng, and Wei Yang. 2017. “Survey on Probabilistic Models of Low-Rank Matrix Factorizations.” Entropy 19 (8): 424.
Spantini, Alessio, Tiangang Cui, Karen Willcox, Luis Tenorio, and Youssef Marzouk. 2017. “Goal-Oriented Optimal Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing 39 (5): S167–96.
Spantini, Alessio, Antti Solonen, Tiangang Cui, James Martin, Luis Tenorio, and Youssef Marzouk. 2015. “Optimal Low-Rank Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing 37 (6): A2451–87.
Sundin, Martin. 2016. “Bayesian Methods for Sparse and Low-Rank Matrix Problems.” PhD Thesis, KTH Royal Institute of Technology.
Tropp, Joel A., Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2016. “Randomized Single-View Algorithms for Low-Rank Matrix Approximation.” arXiv:1609.00048 [Cs, Math, Stat], August.
———. 2017. “Practical Sketching Algorithms for Low-Rank Matrix Approximation.” SIAM Journal on Matrix Analysis and Applications 38 (4): 1454–85.
Udell, M., and A. Townsend. 2019. “Why Are Big Data Matrices Approximately Low Rank?” SIAM Journal on Mathematics of Data Science 1 (1): 144–60.
Woolfe, Franco, Edo Liberty, Vladimir Rokhlin, and Mark Tygert. 2008. “A Fast Randomized Algorithm for the Approximation of Matrices.” Applied and Computational Harmonic Analysis 25 (3): 335–66.
Yang, Linxiao, Jun Fang, Huiping Duan, Hongbin Li, and Bing Zeng. 2018. “Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models.” IEEE Transactions on Signal Processing 66 (11): 2804–17.
Yin, M., J. Gao, and Z. Lin. 2016. “Laplacian Regularized Low-Rank Representation and Its Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (3): 504–17.
Zhang, Xiao, Lingxiao Wang, and Quanquan Gu. 2017. “Stochastic Variance-Reduced Gradient Descent for Low-Rank Matrix Recovery from Linear Measurements.” arXiv:1701.00481 [Stat], January.
Zhou, Tianyi, and Dacheng Tao. 2011. “GoDec: Randomized Low-Rank & Sparse Matrix Decomposition in Noisy Case.” In Proceedings of the 28th International Conference on International Conference on Machine Learning, 33–40. ICML’11. Madison, WI, USA: Omnipress.
