# Approximate matrix factorisations and decompositions

Sometimes even exact

August 5, 2014 — August 23, 2023

feature construction
functional analysis
high d
linear algebra
networks
probability
signal processing
sparser than thou
statistics

## 1 The classics

The big six exact matrix decompositions are

• Cholesky decomposition
• pivoted LU decomposition
• QR decomposition
• spectral decomposition
• Schur decomposition; and
• singular value decomposition.

See Nick Higham’s summary of those.

## 2 Approximate decompositions

Mastered QR and LU decompositions? There are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them, especially if we suspect our matrix is sparse, or could be made sparse because of some underlying constraint, or probably could, if squinted at in the right fashion, be seen as a graph transition matrix, or a Laplacian, or a noisy transform of some smooth object, or at least would be close to sparse if we chose the right metric, or…

A big matrix is close to, in some sense, the (tensor/matrix) product (or sum, or…) of some matrices that are in some way simple (small-rank, small dimension, sparse), possibly with additional constraints. Can we find those simple matrices?

Ethan Epperly’s introduction to Low-rank Matrices puts many ideas clearly.

Here’s an example: GoDec, a decomposition into low-rank and sparse components which, loosely speaking, combines multidimensional factorisation and outlier detection.

GoDec is one of the most efficient algorithms for low-rank and sparse decomposition, thanks to bilateral random projections (BRP), a fast approximation of SVD/PCA.
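To make that concrete, here is a minimal numpy sketch of the alternating-projection idea behind GoDec. For simplicity it uses a truncated SVD where the actual algorithm substitutes bilateral random projections, and all variable names and sizes here are my own illustrative choices, not the paper’s.

```python
import numpy as np

def godec_sketch(A, rank, card, iters=50):
    """Alternating projections for A ≈ L + S: L is the best rank-`rank`
    approximation of A - S (here via truncated SVD; GoDec proper accelerates
    this step with bilateral random projections), and S keeps the `card`
    largest-magnitude entries of A - L."""
    S = np.zeros_like(A)
    for _ in range(iters):
        # Low-rank projection of the residual A - S.
        U, s, Vt = np.linalg.svd(A - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse projection: keep the `card` largest-magnitude entries of A - L.
        R = A - L
        thresh = np.sort(np.abs(R), axis=None)[-card]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

# Synthetic data: an exactly rank-2 component plus 15 large sparse spikes.
rng = np.random.default_rng(0)
low = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
sparse = np.zeros(30 * 20)
sparse[rng.choice(30 * 20, size=15, replace=False)] = 10.0
sparse = sparse.reshape(30, 20)
L, S = godec_sketch(low + sparse, rank=2, card=15)
```

On data that actually is low-rank-plus-sparse, the alternation typically locks onto the spike support within a few iterations.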

There are so many more of these things, depending on our preferred choice of metric, constraints, free variables and such.

Keywords: Matrix sketching, low-rank approximation, traditional dimensionality reduction.

Matrix concentration inequalities turn out to be a useful tool to prove that a given matrix decomposition is not too bad, in a PAC sense.

Igor Carron’s Matrix Factorization Jungle classifies the following problems as matrix-factorisation type.

• Kernel Factorizations
• Spectral clustering: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for sparse $$\mathrm{X}$$ with $$\mathrm{X}_i = 0$$ or $$1$$
• K-Means / K-Median clustering: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for $$\mathrm{X} \mathrm{X}^{\top} = \mathrm{I}$$ and $$\mathrm{X}_i = 0$$ or $$1$$
• Subspace clustering: $$\mathrm{A} = \mathrm{A} \mathrm{X}$$ with unknown $$\mathrm{X}$$; solve for sparse/other conditions on $$\mathrm{X}$$
• Graph Matching: $$\mathrm{A} = \mathrm{X} \mathrm{B} \mathrm{X}^{\top}$$ with unknown $$\mathrm{X}$$, $$\mathrm{B}$$; solve for $$\mathrm{B}$$ and $$\mathrm{X}$$ a permutation
• NMF: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for elements of $$\mathrm{D}$$, $$\mathrm{X}$$ positive
• Generalized Matrix Factorization: $$\mathrm{W} .* \mathrm{L} = \mathrm{W} .* (\mathrm{U} \mathrm{V}^{\top})$$ with $$\mathrm{W}$$ a known mask, $$\mathrm{U}$$, $$\mathrm{V}$$ unknown; solve for $$\mathrm{U}$$, $$\mathrm{V}$$ with $$\mathrm{L}$$ of lowest possible rank
• Matrix Completion: $$\mathrm{A} = \mathrm{H} .* \mathrm{L}$$ with $$\mathrm{H}$$ a known mask, $$\mathrm{L}$$ unknown; solve for $$\mathrm{L}$$ of lowest possible rank
• Stable Principal Component Pursuit (SPCP) / Noisy Robust PCA: $$\mathrm{A} = \mathrm{L} + \mathrm{S} + \mathrm{N}$$ with $$\mathrm{L}$$, $$\mathrm{S}$$, $$\mathrm{N}$$ unknown; solve for $$\mathrm{L}$$ low rank, $$\mathrm{S}$$ sparse, $$\mathrm{N}$$ noise
• Robust PCA: $$\mathrm{A} = \mathrm{L} + \mathrm{S}$$ with $$\mathrm{L}$$, $$\mathrm{S}$$ unknown; solve for $$\mathrm{L}$$ low rank, $$\mathrm{S}$$ sparse
• Sparse PCA: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for sparse $$\mathrm{D}$$
• Dictionary Learning: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for sparse $$\mathrm{X}$$
• Archetypal Analysis: $$\mathrm{A} = \mathrm{D} \mathrm{X}$$ with unknown $$\mathrm{D}$$ and $$\mathrm{X}$$; solve for $$\mathrm{D} = \mathrm{A} \mathrm{B}$$ with $$\mathrm{D}$$, $$\mathrm{B}$$ positive
• Matrix Compressive Sensing (MCS): find a rank-$$r$$ matrix $$\mathrm{L}$$ such that $$\mathrm{A}(\mathrm{L}) \approx b$$, or $$\mathrm{A}(\mathrm{L} + \mathrm{S}) = b$$
• Multiple Measurement Vector (MMV): $$\mathrm{Y} = \mathrm{A} \mathrm{X}$$ with unknown $$\mathrm{X}$$ whose rows are sparse
• Compressed sensing: $$\mathrm{Y} = \mathrm{A} \mathrm{X}$$ with unknown $$\mathrm{X}$$ whose rows are sparse, and $$\mathrm{X}$$ a single column
• Blind Source Separation (BSS): $$\mathrm{Y} = \mathrm{A} \mathrm{X}$$ with unknown $$\mathrm{A}$$ and $$\mathrm{X}$$, and statistical independence between columns of $$\mathrm{X}$$, or subspaces of columns of $$\mathrm{X}$$
• Partial and Online SVD/PCA
• Tensor Decomposition: many, many options; see tensor decompositions for some tractable ones.
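To pick one concrete entry from that list: NMF fits $$\mathrm{A} \approx \mathrm{D}\mathrm{X}$$ with all entries non-negative, classically via the multiplicative updates of Lee and Seung (2001). A minimal numpy sketch; the random initialisation and iteration count here are arbitrary choices of mine, not prescribed by the algorithm.

```python
import numpy as np

def nmf(A, rank, iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for A ≈ D @ X with D, X >= 0.
    Updates are elementwise multiplications by nonnegative ratios, so
    factors initialised positive stay nonnegative throughout."""
    rng = rng_init = np.random.default_rng(0)
    m, n = A.shape
    D = rng.uniform(size=(m, rank))
    X = rng.uniform(size=(rank, n))
    for _ in range(iters):
        X *= (D.T @ A) / (D.T @ D @ X + eps)   # update activations
        D *= (A @ X.T) / (D @ X @ X.T + eps)   # update dictionary
    return D, X

# An exactly nonnegative-rank-3 matrix, which NMF should fit closely.
rng = np.random.default_rng(1)
A = rng.uniform(size=(20, 3)) @ rng.uniform(size=(3, 30))
D, X = nmf(A, rank=3)
```

Multiplicative updates monotonically decrease the Frobenius loss, though convergence can be slow near the boundary of the positive orthant.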

Truncated Classic PCA is clearly also an example, but it is excluded from the list for some reason. Boringness? The fact that it’s a special case of Sparse PCA?

## 4 Square root

$$\mathrm{Y} = \mathrm{X}^{\top}\mathrm{X}$$ for $$\mathrm{Y}\in\mathbb{R}^{N\times N}$$, $$\mathrm{X}\in\mathbb{R}^{n\times N}$$, with (typically) $$n<N$$.

That is a whole thing; see matrix square root.
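As a quick illustration (my own sketch, not drawn from any particular reference): for a positive semi-definite $$\mathrm{Y}$$ of low rank, a wide square-root factor drops straight out of a truncated eigendecomposition.

```python
import numpy as np

def lowrank_sqrt(Y, n):
    """Return X of shape (n, N) with X.T @ X ≈ Y, for symmetric PSD Y,
    by keeping the n largest eigenpairs of Y."""
    lam, V = np.linalg.eigh(Y)                    # eigenvalues ascending
    lam, V = lam[-n:], V[:, -n:]                  # keep the n largest
    return np.sqrt(np.clip(lam, 0.0, None))[:, None] * V.T

rng = np.random.default_rng(0)
B = rng.normal(size=(50, 4))
Y = B @ B.T                                       # 50 x 50 PSD, rank 4
X = lowrank_sqrt(Y, 4)                            # 4 x 50 square-root factor
```

When $$\mathrm{Y}$$ is exactly rank $$n$$, the reconstruction $$\mathrm{X}^{\top}\mathrm{X}$$ is exact up to floating-point error; otherwise it is the best rank-$$n$$ PSD approximation.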

## 5 Why is approximate factorisation ever useful?

For certain types of data matrix, here is a suggestive observation: Udell and Townsend (2019) ask “Why Are Big Data Matrices Approximately Low Rank?”

Matrices of (approximate) low rank are pervasive in data science, appearing in movie preferences, text documents, survey data, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention paid to explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an $$m\times n$$ matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as $$\mathcal{O}(\log(m+n))$$. Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.

Ethan Epperly argues from a function approximation perspective (e.g.) that we can deduce this property from smoothness of functions.

Saul (2023) connects non-negative matrix factorisation to geometric algebra and linear algebra via deep learning and kernels. That sounds like fun.

## 6 As regression

Total Least Squares (a.k.a. orthogonal distance regression, or errors-in-variables least-squares linear regression) is a low-rank matrix approximation that minimises the Frobenius distance to the data matrix. Who knew?
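A sketch of that connection in numpy: the TLS solution of $$\mathrm{A}\boldsymbol{x}\approx\boldsymbol{b}$$ is read off the right singular vector of the augmented matrix $$[\mathrm{A} \mid \boldsymbol{b}]$$ belonging to its smallest singular value (the standard SVD construction; no handling of degenerate cases here).

```python
import numpy as np

def tls(A, b):
    """Total least squares for A x ≈ b, perturbing both A and b minimally
    in Frobenius norm: equivalently, replace [A | b] by its nearest
    rank-deficient matrix and read the solution off the lost direction."""
    n = A.shape[1]
    C = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                      # right singular vector of smallest sigma
    return -v[:n] / v[n]            # assumes v[n] != 0 (generic case)

rng = np.random.default_rng(0)
x_true = np.array([2.0, -1.0])
A = rng.normal(size=(100, 2))
b = A @ x_true
# Noise in both A and b: the errors-in-variables setting TLS is built for.
x_hat = tls(A + 0.01 * rng.normal(size=A.shape),
            b + 0.01 * rng.normal(size=b.shape))
```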

Various other dimensionality reduction techniques can be put in a regression framing, notably Exponential-family PCA.

## 7 Sketching

“Sketching” is a common term to describe a certain type of low-rank factorisation, although I am not sure which types. 🏗

The sketching literature mentions CUR and interpolative decompositions. What are those, now?

## 8 Randomization

Rather than find an optimal solution, why not just choose a random one which might be good enough? There are indeed randomised versions of many of these factorisations, and many algorithms are implemented using randomness, in particular random low-dimensional projections.
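For instance, the randomised SVD in the style of Halko, Martinsson, and Tropp (2010): sketch the range of the matrix with a random Gaussian test matrix, then do exact dense linear algebra in the small sketched subspace. A rough numpy rendition; the oversampling and power-iteration counts are conventional defaults, not tuned.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, power_iters=2):
    """Randomised SVD sketch: random range finder + small exact SVD."""
    rng = np.random.default_rng(0)
    m, n = A.shape
    # Sketch the range of A with a Gaussian test matrix.
    Q = A @ rng.normal(size=(n, rank + oversample))
    # Power iterations sharpen the sketch when singular values decay slowly.
    for _ in range(power_iters):
        Q = A @ (A.T @ Q)
    Q, _ = np.linalg.qr(Q)
    # Project into the sketched subspace and take an exact small SVD there.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 300))  # exactly rank 5
U, s, Vt = randomized_svd(A, rank=5)
```

On an exactly low-rank matrix the sketch captures the range almost surely and the reconstruction is exact to machine precision; on merely approximately low-rank matrices the error is controlled in expectation.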

## 9 Connections to kernel learning

See Grosse et al. (2012) for a mind-melting compositional matrix factorisation diagram, constructing a search over hierarchical kernel decompositions that also turn out to have matrix factorisation interpretations.

## 10 Bayesian

Nakajima and Sugiyama (2012):

Mnih and Salakhutdinov (2008) proposed a Bayesian maximum a posteriori (MAP) method based on the Gaussian noise model and Gaussian priors on the decomposed matrices. This method actually corresponds to minimizing the squared-loss with the trace-norm penalty. Recently, the variational Bayesian (VB) approach has been applied to MF, which we refer to as VBMF. The VBMF method was shown to perform very well in experiments. However, its good performance was not completely understood beyond its experimental success.

☜ Insert further developments here. Possibly Brouwer’s thesis or Shakir Mohamed’s would be a good start, or Benjamin Drave’s tutorial, Probabilistic Matrix Factorization, or Xinghao Ding, Lihan He, and Carin (2011).

I am currently sitting in a seminar by He Zhao on Bayesian matrix factorisation, wherein he is building up this tool for discrete data, which is an interesting case. He starts from M. Zhou et al. (2012) and builds up to Zhao et al. (2018), introducing some hierarchical descriptions along the way. His methods seem to be sampling-based rather than variational (?).

Generalized² Linear² models unify nonlinear matrix factorisations with Generalized Linear Models. I had not heard of that until recently; I wonder how common it is?

## 11 Lanczos decomposition

Lanczos decomposition is a handy approximation for matrices which are cheap to multiply, because of some structure, but expensive to store. It can also be used to calculate an approximate inverse cheaply.

I learnt this trick from the Gaussian process literature, in the context of Lanczos Variance Estimates (LOVE), although I believe it exists elsewhere.

Given some rank $$k$$ and an arbitrary starting vector $$\boldsymbol{b}$$, the Lanczos algorithm iteratively approximates $$\mathrm{K} \in\mathbb{R}^{n \times n}$$ by a low rank factorisation $$\mathrm{K}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}$$, where $$\mathrm{T} \in \mathbb{R}^{k \times k}$$ is tridiagonal and $$\mathrm{Q} \in \mathbb{R}^{n \times k}$$ has orthonormal columns. Crucially, we do not need to form $$\mathrm{K}$$ to evaluate matrix-vector products $$\mathrm{K}\boldsymbol{b}$$ for an arbitrary vector $$\boldsymbol{b}$$. Moreover, given the Lanczos factors $$\mathrm{Q},\mathrm{T}$$ we may estimate \begin{align*} \mathrm{K}^{-1}\boldsymbol{c}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\boldsymbol{c} \end{align*} even for $$\boldsymbol{b}\neq\boldsymbol{c}$$. Say we wish to calculate $$\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}$$, with $$\mathrm{Z}\in\mathbb{R}^{D\times N }$$ and $$N\ll D$$.

We approximate the solution to this linear system using the partial Lanczos decomposition starting with probe vector $$\boldsymbol{b}=\overline{\mathrm{B}}$$ and $$\mathrm{K}=\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)$$.

This requires $$k$$ matrix-vector products of the form \begin{align*} \underbrace{\left(\underbrace{\mathrm{Z} \mathrm{Z}^{\top}}_{\mathcal{O}(ND^2)}+\sigma^2 \mathrm{I}\right)\boldsymbol{b}}_{\mathcal{O}(D^2)} =\underbrace{\mathrm{Z} \underbrace{(\mathrm{Z}^{\top}\boldsymbol{b})}_{\mathcal{O}(ND)}}_{\mathcal{O}(ND)} +\sigma^2 \boldsymbol{b}. \end{align*} Using the latter representation, each required matrix-vector product may be found with a time complexity cost of $$\mathcal{O}(ND)$$. Space complexity is also $$\mathcal{O}(ND)$$. The output of the Lanczos decomposition is $$\mathrm{Q},\mathrm{T}$$ such that $$\left(\mathrm{Z}\mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)\boldsymbol{b}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\boldsymbol{b}$$. Then the solution to the inverse-matrix-vector product may be approximated by $$\left(\mathrm{Z} \mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)^{-1} \mathrm{B}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\mathrm{B}$$, requiring the solution in $$\mathrm{X}$$ of the much smaller linear system $$\mathrm{X}\mathrm{T}=\mathrm{Q}$$. Exploiting the positive-definiteness of $$\mathrm{T}$$ we may use its Cholesky decomposition $$\mathrm{T}=\mathrm{L}^{\top}\mathrm{L}$$ for a constant-factor speedup over solving an arbitrary linear system. The time cost of this solve is $$\mathcal{O}(Dk^2+k^3)$$, for an overall cost of the matrix inversion of $$\mathcal{O}(NDk+Dk^2+k^3)$$.
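A minimal numpy version of the procedure above (full reorthogonalisation, no breakdown handling; a sketch rather than production code, and the sizes below are illustrative choices of mine):

```python
import numpy as np

def lanczos(matvec, b, k):
    """k Lanczos steps from starting vector b: returns Q (orthonormal
    columns) and tridiagonal T with matvec(x) ≈ Q @ T @ Q.T @ x on the
    Krylov subspace. Full reorthogonalisation; no breakdown handling."""
    dim = b.size
    Q = np.zeros((dim, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q = b / np.linalg.norm(b)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalisation
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

# K = Z Z^T + sigma^2 I is never formed; we only use matrix-vector products.
rng = np.random.default_rng(1)
D, N, sigma = 500, 10, 0.1
Z = rng.normal(size=(D, N))
matvec = lambda v: Z @ (Z.T @ v) + sigma**2 * v
c = rng.normal(size=D)
# K has N = 10 well-separated eigenvalues plus the repeated sigma^2
# eigenvalue, so ~11 Lanczos steps already capture the solve.
Q, T = lanczos(matvec, c, k=11)
x = Q @ np.linalg.solve(T, Q.T @ c)   # approximates K^{-1} c
```

Note that the starting vector here is the probe $$\boldsymbol{c}$$ itself, matching the recipe in the text; reusing the factors for a very different right-hand side is a rougher approximation.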

Lifehack: find derivatives of Lanczos iterations via pnkraemer/matfree: Matrix-free linear algebra in JAX.

## 12 Estimating rank

From the overview in the imate manual, under “Eigencount and Numerical Rank”:

If $$f: \lambda \mapsto \mathbf{1}_{[a, b]}(\lambda)$$ is the indicator (step) function on the interval $$[a, b]$$, then $$\operatorname{trace}(\mathbf{1}_{[a,b]}(\mathbf{A}))$$ estimates the number of eigenvalues of $$\mathbf{A}$$ in that interval, which is an inexpensive method to estimate the rank of a large matrix. Eigencount is closely related to Principal Component Analysis (PCA) and low-rank approximations in machine learning.
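Sketching that idea in numpy: here $$f(\mathbf{A})$$ is formed from a full eigendecomposition purely for clarity (at scale one would instead approximate $$f(\mathbf{A})\boldsymbol{v}$$ with Chebyshev or Lanczos matvecs, as tools like imate do), and the trace is estimated with Hutchinson-style Rademacher probes.

```python
import numpy as np

def eigencount(A, a, b, n_probes=200, rng=None):
    """Estimate the number of eigenvalues of symmetric A in [a, b] as
    trace(f(A)), with f the indicator of [a, b], via Hutchinson's
    stochastic trace estimator with Rademacher probe vectors."""
    rng = rng or np.random.default_rng(0)
    lam, V = np.linalg.eigh(A)
    # f(A) = V diag(1_[a,b](lam)) V^T -- a spectral projector.
    fA = (V * ((lam >= a) & (lam <= b))) @ V.T
    probes = rng.choice([-1.0, 1.0], size=(A.shape[0], n_probes))
    # Average the quadratic forms z^T f(A) z over the probes.
    return np.mean(np.einsum("ij,ij->j", probes, fA @ probes))

rng = np.random.default_rng(1)
B = rng.normal(size=(100, 7))
A = B @ B.T                        # rank 7: exactly 7 eigenvalues above zero
est = eigencount(A, a=1e-6, b=np.inf)   # ≈ 7
```

For a spectral projector the exact trace is the eigencount, so the only error is Monte Carlo noise, which shrinks like $$1/\sqrt{\text{probes}}$$.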

## 13 Incremental decompositions

### 13.1 SVD

See incremental SVD.
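The core move, in the style of Brand (2002), is that appending a column to a matrix with known thin SVD only requires diagonalising a small bordered core matrix. A numpy sketch, with only crude handling of the degenerate case where the new column already lies in the current column span:

```python
import numpy as np

def svd_append_column(U, s, Vt, c):
    """Given the thin SVD A = U diag(s) Vt, return the thin SVD of [A | c]
    by diagonalising a small (k+1) x (k+1) core, Brand-style."""
    k = s.size
    p = U.T @ c                      # component of c inside span(U)
    r = c - U @ p                    # residual orthogonal to span(U)
    rho = np.linalg.norm(r)
    j = r / rho if rho > 1e-12 else np.zeros_like(r)
    # Core matrix [[diag(s), p], [0, rho]] so that [A | c] = [U, j] K blkdiag(Vt, 1).
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(s)
    K[:k, -1] = p
    K[-1, -1] = rho
    Uk, sk, Vtk = np.linalg.svd(K)
    U_new = np.column_stack([U, j]) @ Uk
    V = np.zeros((Vt.shape[1] + 1, k + 1))
    V[:-1, :k] = Vt.T
    V[-1, -1] = 1.0
    return U_new, sk, (V @ Vtk.T).T

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
c = rng.normal(size=40)
U2, s2, Vt2 = svd_append_column(U, s, Vt, c)   # thin SVD of [A | c]
```

The expensive step is the $$(k+1)\times(k+1)$$ SVD, so streaming a long matrix column-by-column never touches a full-size decomposition (in practice one also truncates back to rank $$k$$ after each update).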

## 14 Low rank plus diagonal

Specifically $$\mathrm{K}=\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2\mathrm{I}$$ where $$\mathrm{K}\in\mathbb{R}^{N\times N}$$ and $$\mathrm{Z}\in\mathbb{R}^{N\times D}$$ with $$D\ll N$$. A workhorse.

Lots of fun tricks.
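One of those fun tricks is the Woodbury identity, which reduces solves with $$\mathrm{K}=\mathrm{Z}\mathrm{Z}^{\top}+\sigma^2\mathrm{I}$$ to solves with a small $$D\times D$$ matrix. A numpy sketch:

```python
import numpy as np

def solve_lowrank_plus_identity(Z, sigma2, B):
    """Solve (Z Z^T + sigma2 I) X = B via the Woodbury identity:
    (Z Z^T + s I)^{-1} = (I - Z (Z^T Z + s I)^{-1} Z^T) / s.
    For Z of shape (N, D), D << N, this costs O(N D^2 + D^3)
    rather than the O(N^3) of a dense solve."""
    D = Z.shape[1]
    inner = Z.T @ Z + sigma2 * np.eye(D)          # small D x D system
    return (B - Z @ np.linalg.solve(inner, Z.T @ B)) / sigma2

rng = np.random.default_rng(0)
N, D, sigma2 = 300, 8, 0.5
Z = rng.normal(size=(N, D))
B = rng.normal(size=(N, 3))
X = solve_lowrank_plus_identity(Z, sigma2, B)     # ≈ K^{-1} B
```

The same structure gives cheap determinants (via the matrix determinant lemma) and cheap square roots, which is much of why this parameterisation is such a workhorse.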

## 16 As an optimisation problem

There are some generalised optimisation problems which look useful for matrix factorisation, e.g. Bhardwaj, Klep, and Magron (2021):

Polynomial optimization problems (POP) are prevalent in many areas of modern science and engineering. The goal of POP is to minimize a given polynomial over a set defined by finitely many polynomial inequalities, a semialgebraic set. This problem is well known to be NP-hard, and has motivated research for more practical methods to obtain approximate solutions with high accuracy.[…]

One can naturally extend the ideas of positivity and sums of squares to the noncommutative (nc) setting by replacing the commutative variables $$z_1, \dots , z_n$$ with noncommuting letters $$x_1, \dots , x_n$$. The extension to the noncommutative setting is an inevitable consequence of the many areas of science which regularly optimize functions with noncommuting variables, such as matrices or operators. For instance in control theory, matrix completion, quantum information theory, or quantum chemistry

Matrix calculus can help sometimes.

## 17 $$\mathcal{H}$$-matrix methods

It seems like low-rank matrix factorisation could be related to $$\mathcal{H}$$-matrix methods, but I do not know enough to say more.

See hmatrix.org for one lab’s backgrounder and their implementation, h2lib; see hlibpro for a black-box closed-source one.

## 18 Tools

In pytorch, various operations are made easier with cornellius-gp/linear_operator.

Ameli’s tools, e.g. imate.

NMF Toolbox (MATLAB and Python):

Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants—in particular, this paper gives an overview for two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.

Vowpal Wabbit factors matrices, e.g. for recommender systems. It seems the --lrq version is more favoured.

HPC for MATLAB, R, Python, C++: libpmf:

LIBPMF implements the CCD++ algorithm, which aims to solve large-scale matrix factorization problems such as the low-rank factorization problems for recommender systems.

Spams (C++/MATLAB/python) includes some matrix factorisations in its sparse approx toolbox. (see optimisation)

scikit-learn (Python) does a few matrix factorisations in its inimitable batteries-in-the-kitchen-sink way.

nimfa

…is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representations are supported.

Tapkee (C++). Pro-tip — even without coding C++, tapkee does a long list of dimensionality reduction from the CLI.

• PCA and randomized PCA
• Kernel PCA (kPCA)
• Random projection
• Factor analysis

tensorly supports some interesting tensor decompositions.

## 19 References

Aarabi, and Peeters. 2018. In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion. AM’18.
Abdallah, and Plumbley. 2004. In.
Achlioptas. 2003. Journal of Computer and System Sciences, Special Issue on PODS 2001,.
Aghasi, Nguyen, and Romberg. 2016. arXiv:1611.05162 [Cs, Stat].
Akimoto. 2017.
Ameli, and Shadden. 2023. Applied Mathematics and Computation.
Ang, and Gillis. 2018. Neural Computation.
Arora, Ge, Halpern, et al. 2012. arXiv:1212.4777 [Cs, Stat].
Attias. 1999. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. UAI’99.
Babacan, Luessi, Molina, et al. 2012. IEEE Transactions on Signal Processing.
Bach, Francis R. 2013. In COLT.
Bach, Francis. 2013. arXiv:1309.3117 [Cs, Math].
Bach, C, Ceglia, Song, et al. 2019. International Journal for Numerical Methods in Engineering.
Bach, Francis, Jenatton, and Mairal. 2011. Optimization With Sparsity-Inducing Penalties. Foundations and Trends(r) in Machine Learning 1.0.
Bach, Francis R, and Jordan. 2002. “Kernel Independent Component Analysis.” Journal of Machine Learning Research.
Bagge Carlson. 2018.
Barbier, Macris, and Miolane. 2017. arXiv:1709.10368 [Cond-Mat, Physics:math-Ph].
Bardenet, and Maillard. 2015.
Batson, Spielman, and Srivastava. 2008. arXiv:0808.0163 [Cs].
Bauckhage. 2015. arXiv:1512.07548 [Stat].
Berry, Browne, Langville, et al. 2007. Computational Statistics & Data Analysis.
Bertin, Badeau, and Vincent. 2010. IEEE Transactions on Audio, Speech, and Language Processing.
Bhardwaj, Klep, and Magron. 2021.
Brand. 2002. In Computer Vision — ECCV 2002.
———. 2006. Linear Algebra and Its Applications, Special Issue on Large Scale Linear and Nonlinear Eigenvalue Problems,.
Bruckstein, Elad, and Zibulevsky. 2008a. In 3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008.
———. 2008b. IEEE Transactions on Information Theory.
Buch, Quinton, and Sturm. 2017. “NichtnegativeMatrixFaktorisierungnutzendesKlangsynthesenSystem (NiMFKS): Extensions of NMF-Based Concatenative Sound Synthesis.” In Proceedings of the 20th International Conference on Digital Audio Effects.
Bunch, and Nielsen. 1978. Numerische Mathematik.
Caetano, and Rodet. 2013. IEEE Transactions on Audio, Speech, and Language Processing.
Cao, Shen, Sun, et al. 2007. In Proceedings of the 20th International Joint Conference on Artifical Intelligence. IJCAI’07.
Carabias-Orti, Virtanen, Vera-Candeas, et al. 2011. IEEE Journal of Selected Topics in Signal Processing.
Chen, and Chi. 2018. IEEE Signal Processing Magazine.
Chi, Lu, and Chen. 2019. IEEE Transactions on Signal Processing.
Cichocki, Lee, Oseledets, et al. 2016. arXiv:1609.00893 [Cs].
Cichocki, Zdunek, and Amari. 2006. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
Cohen, Daubechies, and Feauveau. 1992. Communications on Pure and Applied Mathematics.
Combettes, and Pesquet. 2008. Inverse Problems.
Dasarathy, Shah, Bhaskar, et al. 2013. arXiv:1303.6544 [Cs, Math].
Dasgupta, and Gupta. 2003. Random Structures & Algorithms.
Davis. 1963. Journal of Mathematical Analysis and Applications.
Defferrard, Bresson, and Vandergheynst. 2016. In Advances In Neural Information Processing Systems.
Del Moral, and Niclas. 2018.
Desai, Ghashami, and Phillips. 2016. IEEE Transactions on Knowledge and Data Engineering.
Devarajan. 2008. PLoS Comput Biol.
Ding, He, and Simon. 2005. In Proceedings of the 2005 SIAM International Conference on Data Mining. Proceedings.
Ding, Li, and Jordan. 2010. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Dokmanić, and Gribonval. 2017. arXiv:1706.08701 [Cs, Math].
Dolcetti, and Pertici. 2020.
Driedger, and Pratzlich. 2015. In Proceedings of ISMIR.
Drineas, Kannan, and Mahoney. 2006a. SIAM Journal on Computing.
———. 2006b. SIAM Journal on Computing.
Drineas, and Mahoney. 2005. Journal of Machine Learning Research.
Dueck, Morris, and Frey. 2005. Bioinformatics.
Eaton. 2007a. In Institute of Mathematical Statistics Lecture Notes - Monograph Series.
———. 2007b. Multivariate statistics: a vector space approach. Lecture notes-monograph series / Institute of Mathematical Statistics 53.
Ellis, and Lay. 1992. Linear Algebra and Its Applications.
Fairbanks, Kannan, Park, et al. 2015. Parallel Computing, Graph analysis for scientific discovery,.
Fasi, Higham, and Liu. 2023. SIAM Journal on Matrix Analysis and Applications.
Févotte, Bertin, and Durrieu. 2008. Neural Computation.
Flammia, Gross, Liu, et al. 2012. New Journal of Physics.
Fung, Hariharan, Harvey, et al. 2011. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing. STOC ’11.
Gemulla, Nijkamp, Haas, et al. 2011. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’11.
Ghashami, Liberty, Phillips, et al. 2015. arXiv:1501.01711 [Cs].
Gordon. 2002. In Proceedings of the 15th International Conference on Neural Information Processing Systems. NIPS’02.
Gross, D. 2011. IEEE Transactions on Information Theory.
Grosse, Salakhutdinov, Freeman, et al. 2012. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Gross, David, Liu, Flammia, et al. 2010. Physical Review Letters.
Guan, Naiyang, Tao, Luo, et al. 2012. IEEE Transactions on Signal Processing.
Guan, N., Tao, Luo, et al. 2012. IEEE Transactions on Neural Networks and Learning Systems.
Gu, and Eisenstat. 1993. “A Stable and Fast Algorithm for Updating the Singular Value Decomposition.”
———. 1995. SIAM Journal on Matrix Analysis and Applications.
Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis. Springer Series in Computational Mathematics 49.
Halko, Martinsson, Shkolnisky, et al. 2011. SIAM Journal on Scientific Computing.
Halko, Martinsson, and Tropp. 2010.
Harbrecht, Peters, and Schneider. 2012. Applied Numerical Mathematics, Third Chilean Workshop on Numerical Analysis of Partial Differential Equations (WONAPDE 2010),.
Harville. 1976. The Annals of Statistics.
———. 1977. Journal of the American Statistical Association.
Hassanieh, Haitham, Indyk, Katabi, et al. 2012. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing. STOC ’12.
Hassanieh, H., Indyk, Katabi, et al. 2012. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. Proceedings.
Hastie, Mazumder, Lee, et al. 2015. In Journal of Machine Learning Research.
Heinig, and Rost. 2011. Linear Algebra and Its Applications.
Henderson, and Searle. 1981. SIAM Review.
Hoffman, Matthew, Bach, and Blei. 2010. In Advances in Neural Information Processing Systems.
Hoffman, Matthew D, Blei, and Cook. 2010. In International Conference on Machine Learning.
Hoyer. 2002. In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.
Hsieh, and Dhillon. 2011. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’11.
Huang, Kaess, and Leonard. 2013. In 2013 European Conference on Mobile Robots (ECMR).
Hu, Pehlevan, and Chklovskii. 2014. In 2014 48th Asilomar Conference on Signals, Systems and Computers.
Huynh, Zhao, and Phung. 2020. In Advances in Neural Information Processing Systems.
Iliev, Stanev, Vesselinov, et al. 2018. PLOS ONE.
Ji, Dunson, and Carin. 2009. IEEE Transactions on Signal Processing.
Kannan. 2016.
Kannan, Ballard, and Park. 2016. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP ’16.
Keriven, Bourrier, Gribonval, et al. 2016. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Keshava. 2003. Lincoln Laboratory Journal.
Khoromskij, Litvinenko, and Matthies. 2009. Computing.
Kim, and Park. 2008. SIAM Journal on Matrix Analysis and Applications.
Koren, Bell, and Volinsky. 2009. Computer.
Koutis, Miller, and Peng. 2012. Communications of the ACM.
Krämer, Moreno-Muñoz, Roy, et al. 2024.
Kruskal. 1964. Psychometrika.
Kumar, and Shneider. 2016. arXiv:1606.06511 [Cs, Math].
Lahiri, Gao, and Ganguli. 2016. arXiv:1607.04331 [Cs, q-Bio, Stat].
Lawrence, and Urtasun. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
Lee, and Seung. 1999. Nature.
———. 2001. In Advances in Neural Information Processing Systems 13.
Liberty. 2013. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’13.
Liberty, Woolfe, Martinsson, et al. 2007. Proceedings of the National Academy of Sciences.
Li, S.Z., Hou, Zhang, et al. 2001. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001.
Lim, and Teh. 2007. “Variational Bayesian Approach to Movie Rating Prediction.” In Proceedings of KDD Cup and Workshop.
Lin, Chih-Jen. 2007. Neural Computation.
Lin, Zhouchen. 2016.
Li, Chi-Kwong, and Poon. 2002. Linear and Multilinear Algebra.
Liu, T., and Tao. 2015. IEEE Transactions on Neural Networks and Learning Systems.
Liu, Tongliang, Tao, and Xu. 2016. arXiv:1601.00238 [Cs, Stat].
López-Serrano, Dittmar, Özer, et al. 2019. “NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization.” In.
Mahoney. 2010. Randomized Algorithms for Matrices and Data.
Mailhé, Gribonval, Vandergheynst, et al. 2011. Signal Processing, Advances in Multirate Filter Bank Structures and Multiscale Representations,.
Mairal, Bach, Ponce, et al. 2009. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
———, et al. 2010. The Journal of Machine Learning Research.
Mairal, Bach, and Ponce. 2014. Sparse Modeling for Image and Vision Processing.
Martinsson. 2016. arXiv:1607.01649 [Math].
Martinsson, Rockhlin, and Tygert. 2006.
Mensch, Mairal, Thirion, et al. 2017. arXiv:1701.05363 [Math, q-Bio, Stat].
Minka. 2000. Old and new matrix algebra useful for statistics.
Mnih, and Salakhutdinov. 2008. “Probabilistic Matrix Factorization.” Advances in Neural Information Processing Systems.
Mohamed. 2011.
Nakajima, and Sugiyama. 2012. “Theoretical Analysis of Bayesian Matrix Factorization.” Journal of Machine Learning Research.
Nakatsukasa. 2019.
Needell, and Vershynin. 2009. Foundations of Computational Mathematics.
Nowak, and Litvinenko. 2013. Mathematical Geosciences.
Okajima, and Kabashima. 2021.
Oymak, and Tropp. 2015. arXiv:1511.09433 [Cs, Math, Stat].
Paatero, and Tapper. 1994. Environmetrics.
Pan, Zhang, Wu, et al. 2014. PLoS ONE.
Petersen, and Pedersen. 2012.
Pleiss, Gardner, Weinberger, et al. 2018. In.
Pleiss, Jankowiak, Eriksson, et al. 2020. Advances in Neural Information Processing Systems.
Rabani, and Toledo. 2001. In PPSC.
Raiko, Ilin, and Karhunen. 2007. In Machine Learning: ECML 2007.
Rokhlin, Szlam, and Tygert. 2009. SIAM J. Matrix Anal. Appl.
Rokhlin, and Tygert. 2008. Proceedings of the National Academy of Sciences.
Ryabko, and Ryabko. 2010. IEEE Transactions on Information Theory.
Sachdeva, Sushant. 2013. Foundations and Trends® in Theoretical Computer Science.
Sachdeva, Noveen, Dhaliwal, Wu, et al. 2022.
Salakhutdinov, and Mnih. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Sarwar, Karypis, Konstan, et al. 2002. “Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems.”
Saul. 2023. Transactions on Machine Learning Research.
Schmidt, Larsen, and Hsiao. 2007. In 2007 IEEE Workshop on Machine Learning for Signal Processing.
Seeger, ed. 2004. Low Rank Updates for the Cholesky Decomposition.
Seshadhri, Sharma, Stolman, et al. 2020. Proceedings of the National Academy of Sciences.
Shi, Zheng, and Yang. 2017. Entropy.
Singh, and Gordon. 2008. In Machine Learning and Knowledge Discovery in Databases.
Smaragdis. 2004. In Independent Component Analysis and Blind Signal Separation. Lecture Notes in Computer Science.
Soh, and Chandrasekaran. 2017. arXiv:1701.01207 [Cs, Math, Stat].
Song, Sebe, and Wang. 2022. In.
Sorzano, Vargas, and Montano. 2014. arXiv:1403.2877 [Cs, q-Bio, Stat].
Spielman, D., and Srivastava. 2011. SIAM Journal on Computing.
Spielman, Daniel A., and Teng. 2004. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing. STOC ’04.
———. 2006. arXiv:cs/0607105.
———. 2008a. arXiv:0808.4134 [Cs].
———. 2008b. arXiv:0809.3232 [Cs].
Sra, and Dhillon. 2006. In Advances in Neural Information Processing Systems 18.
Srebro, Rennie, and Jaakkola. 2004. In Advances in Neural Information Processing Systems. NIPS’04.
Stewart. 2000. Computing in Science Engineering.
Sundin. 2016. “Bayesian Methods for Sparse and Low-Rank Matrix Problems.”
Sun, and Stein. 2016. Journal of Computational and Graphical Statistics.
Tropp, Yurtsever, Udell, et al. 2016. arXiv:1609.00048 [Cs, Math, Stat].
———, et al. 2017. SIAM Journal on Matrix Analysis and Applications.
Tufts, and Kumaresan. 1982. Proceedings of the IEEE.
Tung, and Little. n.d.
Türkmen. 2015. arXiv:1507.03194 [Cs, Stat].
Turner, and Sahani. 2014. IEEE Transactions on Signal Processing.
Udell, and Townsend. 2019. SIAM Journal on Mathematics of Data Science.
Vaz, Toutios, and Narayanan. 2016. In.
Vincent, Bertin, and Badeau. 2008. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Virtanen. 2007. IEEE Transactions on Audio, Speech, and Language Processing.
Vishnoi. 2013. “Lx = b.” Foundations and Trends® in Theoretical Computer Science.
Vo, Vo, and Hoang. 2017. arXiv:1606.08350 [Stat].
Wager, Chen, Kim, et al. 2017. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Wang, Shusen, Gittens, and Mahoney. 2017. arXiv:1702.04837 [Cs, Stat].
Wang, Boyue, Hu, Gao, et al. 2017. In PRoceedings of IJCAI, 2017.
Wang, Yuan, and Jia. 2004. “Fisher Non-Negative Matrix Factorization for Learning Local Features.” In In Proc. Asian Conf. On Comp. Vision.
Wang, Sinong, Li, Khabsa, et al. 2020.
Wang, Y. X., and Zhang. 2013. IEEE Transactions on Knowledge and Data Engineering.
Wilkinson, Andersen, Reiss, et al. 2019. arXiv:1901.11436 [Cs, Eess, Stat].
Woodruff. 2014. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science 1.0.
Woolfe, Liberty, Rokhlin, et al. 2008. Applied and Computational Harmonic Analysis.
Xinghao Ding, Lihan He, and Carin. 2011. IEEE Transactions on Image Processing.
Yang, Linxiao, Fang, Duan, et al. 2018. IEEE Transactions on Signal Processing.
Yang, Jiyan, Meng, and Mahoney. 2015. arXiv:1502.03032 [Cs, Math, Stat].
Yang, Wenzhuo, and Xu. 2015. In Journal of Machine Learning Research.
Ye, and Lim. 2016. Foundations of Computational Mathematics.
Yin, Gao, and Lin. 2016. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yoshii. 2013. “Beyond NMF: Time-Domain Audio Source Separation Without Phase Reconstruction.”
Yu, Hsiang-Fu, Hsieh, Si, et al. 2012. In IEEE International Conference of Data Mining.
———, et al. 2014. Knowledge and Information Systems.
Yu, Chenhan D., March, and Biros. 2017. In arXiv:1701.02324 [Cs].
Yurtsever, Tropp, Fercoq, et al. 2021. SIAM Journal on Mathematics of Data Science.
Zass, and Shashua. 2005. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01. ICCV ’05.
Zhang, Yangwen. 2022.
Zhang, Zhongyuan, Ding, Li, et al. 2007. In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007.
Zhang, Kai, Liu, Zhang, et al. 2017. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17.
Zhang, Xiao, Wang, and Gu. 2017. arXiv:1701.00481 [Stat].
Zhao, Du, Buntine, et al. 2018. In Proceedings of the 35th International Conference on Machine Learning.
Zhao, Phung, Huynh, et al. 2020. In.
Zhou, Mingyuan, Chen, Paisley, et al. 2009. In Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS’09.
Zhou, Mingyuan, Hannah, Dunson, et al. 2012. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics.
Zhou, Tianyi, and Tao. 2011. In Proceedings of the 28th International Conference on International Conference on Machine Learning. ICML’11.
———. 2012. Journal of Machine Learning Research.
Zitnik, and Zupan. 2018. arXiv:1808.01743 [Cs, q-Bio, Stat].