## The classics

The *big six* exact matrix decompositions are (Stewart 2000): Cholesky decomposition; pivoted LU decomposition; QR decomposition; spectral decomposition; Schur decomposition; and singular value decomposition.

See Nick Higham’s summary of those.
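For concreteness, all six are one call away in `scipy.linalg`. A quick sketch verifying each factorisation on a small random matrix (the symmetric-only decompositions get a symmetric positive definite input):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
S = A @ A.T + 5 * np.eye(5)  # symmetric positive definite

# Cholesky: S = L L^T with L lower triangular
L = linalg.cholesky(S, lower=True)
assert np.allclose(L @ L.T, S)

# Pivoted LU: A = P L U
P, Lu, U = linalg.lu(A)
assert np.allclose(P @ Lu @ U, A)

# QR: A = Q R with Q orthogonal
Q, R = linalg.qr(A)
assert np.allclose(Q @ R, A)

# Spectral (eigen-)decomposition of symmetric S: S = V diag(w) V^T
w, V = linalg.eigh(S)
assert np.allclose(V @ np.diag(w) @ V.T, S)

# Schur: A = Z T Z^T with T (quasi-)triangular, Z orthogonal
T, Z = linalg.schur(A)
assert np.allclose(Z @ T @ Z.T, A)

# SVD: A = U diag(s) V^H
U2, s, Vh = linalg.svd(A)
assert np.allclose(U2 @ np.diag(s) @ Vh, A)
```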

## Approximate decompositions

Mastered QR and LU decompositions? There are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them, especially if you suspect your matrix is sparse, or could be made sparse because of some underlying constraint, or probably could, if squinted at in the right fashion, be seen as a graph transition matrix, or a Laplacian, or a noisy transform of some smooth object, or at least would be close to sparse if you chose the right metric, or…

Your big matrix is close, in some sense, to the (tensor/matrix) product (or sum, or…) of some matrices that are in some way simple (low-rank, small dimension, sparse), possibly with additional constraints. Can you find these simple matrices?
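For the simplest version of this game, where “simple” means low-rank, the Eckart–Young theorem tells us a truncated SVD gives the best approximation in Frobenius norm. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
# a noisy observation of a rank-3 matrix
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))
A = X + 0.01 * rng.standard_normal((100, 80))

# truncated SVD yields the closest rank-r matrix in Frobenius norm
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = 3
A_r = U[:, :r] * s[:r] @ Vh[:r]

rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
assert rel_err < 0.05  # three simple factors capture almost everything
```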

Here’s an example:
GoDec – a decomposition into low-rank *and* sparse components which, loosely speaking, combines multidimensional factorisation and outlier detection.
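A naive sketch of the GoDec idea in numpy, alternating a truncated SVD for the low-rank term with hard-thresholding for the sparse term. (The real GoDec algorithm accelerates the SVD step with bilateral random projections; the function name and hyperparameters here are my own illustrative choices.)

```python
import numpy as np

def lowrank_plus_sparse(X, rank, card, iters=30):
    """Naive GoDec-style split X ~ L + S: alternate a truncated SVD for
    the low-rank term with hard-thresholding for the sparse term."""
    S = np.zeros_like(X)
    for _ in range(iters):
        # L-step: best rank-r fit to X - S
        U, s, Vh = np.linalg.svd(X - S, full_matrices=False)
        L = U[:, :rank] * s[:rank] @ Vh[:rank]
        # S-step: keep only the `card` largest-magnitude residuals
        R = X - L
        S = np.zeros_like(X)
        idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-card:], R.shape)
        S[idx] = R[idx]
    return L, S

rng = np.random.default_rng(2)
L0 = rng.standard_normal((40, 4)) @ rng.standard_normal((4, 30))
S0 = np.zeros((40, 30))
S0[rng.integers(0, 40, 25), rng.integers(0, 30, 25)] = 10.0  # gross outliers
L, S = lowrank_plus_sparse(L0 + S0, rank=4, card=25)
assert np.linalg.norm(L - L0) / np.linalg.norm(L0) < 0.15
```

The alternation is the “outlier detection” part: the sparse term soaks up the entries that the low-rank model cannot explain.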

There are so many more of these things, depending on your preferred choice of loss function, free variables and such.

Keywords: Matrix sketching, low-rank approximation, traditional dimensionality reduction.

Matrix concentration inequalities turn out to be a useful related tool here.

I would like to learn more about

- sparse or low-rank matrix approximation as clustering for density estimation, which is how I imagine high-dimensional mixture models would need to work, and thereby also
- Mercer kernel approximation.
- Connection to manifold learning is also probably worth examining.

Igor Carron’s Matrix Factorization Jungle classifies the following problems as matrix-factorisation type.

- Kernel Factorizations
- …
- Spectral clustering
- \([A = DX]\) with unknown D and X, solve for sparse X and X_i = 0 or 1
- K-Means / K-Median clustering
- \([A = DX]\) with unknown D and X, solve for \(XX^{\top} = I\) and X_i = 0 or 1
- Subspace clustering
- \([A = AX]\) with unknown X, solve for sparse/other conditions on X
- Graph Matching
- \([A = XBX^{\top}]\) with unknown X, B, solve for B and X as a permutation
- NMF
- \([A = DX]\) with unknown D and X, solve for elements of D,X positive
- Generalized Matrix Factorization
- \([W.*L \approx W.*UV']\) with W a known mask, U,V unknowns, solve for U,V and L lowest rank possible
- Matrix Completion
- \([A = H.*L]\) with H a known mask, L unknown solve for L lowest rank possible
- Stable Principal Component Pursuit (SPCP) / Noisy Robust PCA
- \([A = L + S + N]\) with L, S, N unknown, solve for L low rank, S sparse, N noise
- Robust PCA
- \([A = L + S]\) with L, S unknown, solve for L low rank, S sparse
- Sparse PCA
- \([A = DX]\) with unknown D and X, solve for sparse D
- Dictionary Learning
- \([A = DX]\) with unknown D and X, solve for sparse X
- Archetypal Analysis
- \([A = DX]\) with unknown D and X, solve for D = AB with D, B positive
- Matrix Compressive Sensing (MCS)
- find a rank-r matrix L such that \([A(L) \approx b]\) or \([A(L+S) \approx b]\)
- Multiple Measurement Vector (MMV)
- \([Y = A X]\) with unknown X and rows of X are sparse
- Compressed sensing
- \([Y = A X]\) with unknown X, where X is a single sparse column.
- Blind Source Separation (BSS)
- \([Y = A X]\) with unknown A and X and statistical independence between columns of X or subspaces of columns of X
- Partial and Online SVD/PCA
- …
- Tensor Decomposition
- Many, many options; see tensor decompositions for some tractable ones.
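To make one entry of the list above concrete, the Matrix Completion problem admits a very short soft-impute-style solver: iterated singular-value soft-thresholding, which chases a low-(nuclear-norm) L whose observed entries match the data. A minimal sketch, not a competitive implementation; `soft_impute` and its parameters are my own illustrative choices:

```python
import numpy as np

def soft_impute(A, mask, lam=1.0, iters=100):
    """Fill the unobserved entries of A (mask True where observed) by
    iterated singular-value soft-thresholding."""
    Z = np.where(mask, A, 0.0)
    for _ in range(iters):
        U, s, Vh = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)   # shrink singular values
        L = U * s @ Vh
        Z = np.where(mask, A, L)       # keep observed entries fixed
    return L

rng = np.random.default_rng(3)
truth = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 40))
mask = rng.random((50, 40)) < 0.6      # observe 60% of entries
L = soft_impute(truth, mask, lam=0.5, iters=200)
err = np.linalg.norm((L - truth)[~mask]) / np.linalg.norm(truth[~mask])
assert err < 0.2  # the unobserved entries are recovered well
```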

Truncated classic PCA is clearly also an example, but is excluded from the list for some reason. Boringness? The fact it’s a special case of Sparse PCA?

I also add

- Square root
- \(Y = XX^{\top}\) for \(Y\in\mathbb{R}^{N\times N}, X\in\mathbb{R}^{N\times n}\), with (typically) \(n<N\).
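For a positive semi-definite \(Y\) of rank \(n\) this is easy via a truncated eigendecomposition; note \(X\) is only determined up to right-multiplication by an orthogonal matrix. A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 30, 5
X0 = rng.standard_normal((N, n))
Y = X0 @ X0.T                      # rank-n PSD "square" of X0

# one square root: keep the n leading eigenpairs of Y
w, V = np.linalg.eigh(Y)           # eigenvalues in ascending order
X = V[:, -n:] * np.sqrt(np.maximum(w[-n:], 0.0))

# X need not equal X0, but X X^T reproduces Y
assert np.allclose(X @ X.T, Y)
```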

See also learning on manifolds, compressed sensing, optimisation, randomised linear algebra, clustering, sparse regression…

## Why is approximate factorisation ever useful?

For certain types of data matrix, here is a suggestive observation: Udell and Townsend (2019) ask “Why Are Big Data Matrices Approximately Low Rank?”

> Matrices of (approximate) low rank are pervasive in data science, appearing in movie preferences, text documents, survey data, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention paid to explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an \(m\times n\) matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as \(\mathcal{O}(\log(m+n))\). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.
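This is easy to check numerically. Below is a toy version of their generative model (my own simplification: one bounded scalar latent per row and column, with an analytic link function). The resulting matrix is full rank in exact arithmetic, yet a small-rank truncation already matches every entry closely:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 500, 400
u = rng.uniform(-1, 1, (m, 1))   # one bounded latent per row
v = rng.uniform(-1, 1, (n, 1))   # one bounded latent per column
A = np.exp(u @ v.T)              # analytic function of the latents

U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = 10
A_10 = U[:, :r] * s[:r] @ Vh[:r]
# rank 10 matches every entry of the 500x400 matrix to high accuracy
assert np.abs(A - A_10).max() < 1e-3
```

The mechanism is essentially a Taylor expansion: each polynomial term in the latents is a low-rank matrix, and the analytic remainder decays fast enough that logarithmically many terms suffice.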

## Tutorials

- Data mining seminar: Matrix sketching
- Kumar and Schneider have a literature survey on low rank approximation of matrices (Kumar and Schneider 2016)
- Preconditioning tutorial by Erica Klarreich
- Andrew McGregor’s ICML Tutorial Streaming, sampling, sketching
- more at signals and graph.
- Another one that makes the link to clustering is Chris Ding’s Principal Component Analysis and Matrix Factorizations for Learning
- Igor Carron’s Advanced Matrix Factorization Jungle.

## Non-negative matrix factorisations
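The classic recipe here is the Lee–Seung multiplicative update: keep factors nonnegative by multiplying them by ratios of nonnegative quantities. A minimal numpy sketch (the function name and iteration budget are my own; the projected-gradient and ANLS tools under Implementations below converge faster):

```python
import numpy as np

def nmf_mu(V, r, iters=2000, eps=1e-9, seed=0):
    """Lee-Seung-style multiplicative updates for V ~ W H with
    W, H >= 0, minimising squared error."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update dictionary
    return W, H

rng = np.random.default_rng(6)
V = rng.random((30, 5)) @ rng.random((5, 25))  # exactly rank-5, nonnegative
W, H = nmf_mu(V, 5)
assert np.linalg.norm(V - W @ H) / np.linalg.norm(V) < 0.05
```

Because the updates only ever multiply by nonnegative ratios, nonnegativity of W and H is preserved for free.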

## As regression

Total Least Squares (a.k.a. orthogonal distance regression, or errors-in-variables least-squares linear regression) is a low-rank matrix approximation that minimises the Frobenius distance to the data matrix. Who knew?
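The classic SVD recipe makes the low-rank connection explicit: append b to A and find the nearest rank-deficient matrix, whose null direction yields the coefficients. A sketch (`tls` is my own wrapper name):

```python
import numpy as np

def tls(A, b):
    """Total least squares fit b ~ A x when both A and b are noisy:
    the smallest right singular vector of the augmented matrix [A b]
    spans the null space of its nearest rank-deficient neighbour."""
    n = A.shape[1]
    _, _, Vh = np.linalg.svd(np.column_stack([A, b]))
    v = Vh[-1]                  # right singular vector of smallest sigma
    return -v[:n] / v[n]

rng = np.random.default_rng(7)
x_true = np.array([2.0, -1.0])
A0 = rng.standard_normal((200, 2))
A = A0 + 0.01 * rng.standard_normal(A0.shape)   # errors in the variables
b = A0 @ x_true + 0.01 * rng.standard_normal(200)
x = tls(A, b)
assert np.allclose(x, x_true, atol=0.05)
```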

Various other dimensionality reduction techniques can be put in a regression framing, notably Exponential-family PCA.

## Sketching

“Sketching” is a common term for a certain type of low-rank factorisation, although I am not sure exactly which types.

Martinsson (2016) mentions CUR and interpolative decompositions. Does preconditioning fit?
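One concrete sense of the term, from the randomised linear algebra literature: compress the matrix by multiplying it with a thin random test matrix, which captures its range with high probability. A minimal sketch of the randomised range finder:

```python
import numpy as np

rng = np.random.default_rng(8)
# a 1000x800 matrix of exact rank 10
A = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 800))

# sketch: hit A with a thin random matrix, then orthonormalise
k, p = 10, 5                        # target rank plus oversampling
Omega = rng.standard_normal((800, k + p))
Q, _ = np.linalg.qr(A @ Omega)      # Q spans (w.h.p.) the range of A

# Q Q^T A ~ A: the matrix is recoverable from the small sketch
assert np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A) < 1e-8
```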

## \([\mathcal{H}]\)-matrix methods

It seems like low-rank matrix factorisation could be related to \([\mathcal{H}]\)-matrix methods, as seen in, e.g., covariance matrices, but I do not know enough to say more.

See hmatrix.org for one lab’s backgrounder and their implementation, h2lib; hlibpro is a black-box closed-source one.

## Randomized methods

Rather than find an optimal solution, why not just choose a random one which might be good enough? There are indeed randomised versions.
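A minimal randomised SVD in that spirit: a random range finder followed by an exact SVD on the compressed matrix (the function name and the oversampling parameter `p` are standard but arbitrary choices of mine):

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Randomised SVD sketch: project onto a random low-dimensional
    range, then take the exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k + p)))
    Ub, s, Vh = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vh[:k]

rng = np.random.default_rng(9)
A = rng.standard_normal((600, 12)) @ rng.standard_normal((12, 500))
U, s, Vh = randomized_svd(A, 12)
assert np.linalg.norm(A - U * s @ Vh) / np.linalg.norm(A) < 1e-8
```

For a matrix of exact low rank the random choice is essentially always good enough; for approximately low-rank matrices the oversampling and a few power iterations control the error.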

## Connections to kernel learning

See Grosse et al. (2012) for a mind-melting compositional matrix factorisation diagram, constructing a search over hierarchical kernel decompositions that also turn out to have some matrix factorisation interpretations.

## Bayesian

Nakajima and Sugiyama (2012):

> Mnih and Salakhutdinov (2008) proposed a Bayesian maximum a posteriori (MAP) method based on the Gaussian noise model and Gaussian priors on the decomposed matrices. This method actually corresponds to minimizing the squared-loss with the trace-norm penalty (Srebro, Rennie, and Jaakkola 2004). Recently, the variational Bayesian (VB) approach (Attias 1999) has been applied to MF (Lim and Teh 2007; Raiko, Ilin, and Karhunen 2007), which we refer to as VBMF. The VBMF method was shown to perform very well in experiments. However, its good performance was not completely understood beyond its experimental success.

TODO: Insert further developments here. Possibly Brouwer’s thesis (Brouwer 2017) or Shakir Mohamed’s (Mohamed 2011) would be a good start, or Benjamin Drave’s tutorial, Probabilistic Matrix Factorization, and Xinghao Ding, Lihan He, and Carin (2011).

I am currently sitting in a seminar by He Zhao on Bayesian matrix factorisation, wherein he is building up this tool for discrete data, which is an interesting case. He starts from M. Zhou et al. (2012) and builds up to Zhao et al. (2018), introducing some hierarchical descriptions along the way. His methods seem to be sampling-based rather than variational (?).

Generalized² Linear² models (Gordon 2002) unify nonlinear matrix factorisations with Generalized Linear Models. I had not heard of that until recently; I wonder how common it is?

## Implementations

“Enough theory! Plug this algorithm into my data!”

OK.

NMF Toolbox (MATLAB and Python)

> Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants – in particular, this paper gives an overview for two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.

Vowpal Wabbit does this, e.g. for recommender systems. It seems the `--qr` version is more favoured.

HPC for MATLAB, R, Python, C++: libpmf:

> LIBPMF implements the CCD++ algorithm, which aims to solve large-scale matrix factorization problems such as the low-rank factorization problems for recommender systems.

NMF (R)

MATLAB: Chih-Jen Lin’s nmf.m – “This tool solves NMF by alternative non-negative least squares using projected gradients. It converges faster than the popular multiplicative update approach.”

> In this repository, we offer both MPI and OPENMP implementation for MU, HALS and ANLS/BPP based NMF algorithms. This can run off the shelf as well easy to integrate in other source code. These are very highly tuned NMF algorithms to work on super computers. We have tested this software in NERSC as well OLCF cluster. The openmp implementation is tested on many different Linux variants with intel processors. The library works well for both sparse and dense matrix. (Fairbanks et al. 2015; Kannan, Ballard, and Park 2016; Kannan 2016)

SPAMS (C++/MATLAB/Python) includes some matrix factorisations in its sparse approximation toolbox (see optimisation).

`scikit-learn` (Python) does a few matrix factorisations in its inimitable batteries-in-the-kitchen-sink way.

> nimfa is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representation are supported.

Tapkee (C++). Pro-tip – even without coding C++, tapkee does a long list of dimensionality reductions from the CLI.

- PCA and randomized PCA
- Kernel PCA (kPCA)
- Random projection
- Factor analysis

tensorly supports some interesting tensor decompositions.

## References

*Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion*, 27:1–7. AM’18. New York, NY, USA: ACM.

*Journal of Computer and System Sciences*, Special Issue on PODS 2001, 66 (4): 671–87.

*arXiv:1611.05162 [Cs, Stat]*, November.

*Neural Computation* 31 (2): 417–39.

*arXiv:1212.4777 [Cs, Stat]*, December.

*Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence*, 21–30. UAI’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

*IEEE Transactions on Signal Processing* 60 (8): 3964–77.

*arXiv:1309.3117 [Cs, Math]*, September.

*COLT*, 30:185–209.

*Journal of Machine Learning Research* 3 (July): 48.

*Optimization With Sparsity-Inducing Penalties*. Foundations and Trends® in Machine Learning 1.0. Now Publishers Inc.

*arXiv:1709.10368 [Cond-Mat, Physics:math-Ph]*, September.

*arXiv:0808.0163 [Cs]*, August.

*arXiv:1512.07548 [Stat]*, December.

*Computational Statistics & Data Analysis* 52 (1): 155–73.

*IEEE Transactions on Audio, Speech, and Language Processing* 18 (3): 538–49.

*3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008*, 762–67.

*IEEE Transactions on Information Theory* 54 (11): 4813–20.

*Proceedings of the 20th International Conference on Digital Audio Effects*, 7. Edinburgh.

*IEEE Transactions on Audio, Speech, and Language Processing* 21 (8): 1666–75.

*IEEE Journal of Selected Topics in Signal Processing* 5 (6): 1144–58.

*IEEE Transactions on Signal Processing* 67 (20): 5239–69.

*arXiv:1609.00893 [Cs]*, September.

*2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings*, 5:V–.

*Communications on Pure and Applied Mathematics* 45 (5): 485–560.

*Inverse Problems* 24 (6): 065014.

*arXiv:1303.6544 [Cs, Math]*, March.

*Random Structures & Algorithms* 22 (1): 60–65.

*Advances In Neural Information Processing Systems*.

*IEEE Transactions on Knowledge and Data Engineering* 28 (7): 1678–90.

*PLoS Comput Biol* 4 (7): e1000029.

*Proceedings of the 2005 SIAM International Conference on Data Mining*, 606–10. Proceedings. Society for Industrial and Applied Mathematics.

*IEEE Transactions on Pattern Analysis and Machine Intelligence* 32 (1): 45–55.

*arXiv:1706.08701 [Cs, Math]*, June.

*Proceedings of ISMIR*, 7. Malaga.

*Journal of Machine Learning Research* 6 (December): 2153–75.

*Bioinformatics* 21 (suppl 1): i144–51.

*Linear Algebra and Its Applications* 173 (August): 19–38.

*Parallel Computing*, Graph analysis for scientific discovery, 47 (August): 38–50.

*Neural Computation* 21 (3): 793–830.

*New Journal of Physics* 14 (9): 095022.

*Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing*, 71–80. STOC ’11. New York, NY, USA: ACM.

*Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 69–77. KDD ’11. New York, NY, USA: ACM.

*arXiv:1501.01711 [Cs]*, January.

*Proceedings of the 15th International Conference on Neural Information Processing Systems*, 593–600. NIPS’02. Cambridge, MA, USA: MIT Press.

*IEEE Transactions on Information Theory* 57 (3): 1548–66.

*Physical Review Letters* 105 (15).

*Proceedings of the Conference on Uncertainty in Artificial Intelligence*.

*IEEE Transactions on Signal Processing* 60 (6): 2882–98.

*IEEE Transactions on Neural Networks and Learning Systems* 23 (7): 1087–99.

*Hierarchical Matrices: Algorithms and Analysis*. 1st ed. Springer Series in Computational Mathematics 49. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated.

*Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing*, 563–78. STOC ’12. New York, NY, USA: ACM.

*Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms*, 1183–94. Proceedings. Kyoto, Japan: Society for Industrial and Applied Mathematics.

*Linear Algebra and Its Applications* 435 (1): 1–59.

*International Conference on Machine Learning*, 8.

*Advances in Neural Information Processing Systems*, 856–64.

*Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002*, 557–65.

*Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 1064–72. KDD ’11. New York, NY, USA: ACM.

*2014 48th Asilomar Conference on Signals, Systems and Computers*.

*2013 European Conference on Mobile Robots (ECMR)*, 150–57.

*PLOS ONE* 13 (3): e0193974.

*IEEE Transactions on Signal Processing* 57 (1): 92–106.

*Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*, 9:1–11. PPoPP ’16. New York, NY, USA: ACM.

*2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 6190–94.

*Lincoln Laboratory Journal* 14 (1): 55–78.

*Computing* 84 (1-2): 49–67.

*SIAM Journal on Matrix Analysis and Applications* 30 (2): 713–30.

*Computer* 42 (8): 30–37.

*Communications of the ACM* 55 (10): 99–107.

*Psychometrika* 29 (2): 115–29.

*arXiv:1606.06511 [Cs, Math]*, June.

*arXiv:1607.04331 [Cs, q-Bio, Stat]*, July.

*Proceedings of the 26th Annual International Conference on Machine Learning*, 601–8. ICML ’09. New York, NY, USA: ACM.

*Nature* 401 (6755): 788–91.

*Advances in Neural Information Processing Systems 13*, edited by T. K. Leen, T. G. Dietterich, and V. Tresp, 556–62. MIT Press.

*Linear and Multilinear Algebra* 50 (4): 321–26.

*Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001*, 1:I-207-I-212 vol.1.

*Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 581–88. KDD ’13. New York, NY, USA: ACM.

*Proceedings of the National Academy of Sciences* 104 (51): 20167–72.

*Proceedings of KDD Cup and Workshop*, 7:15–21. Citeseer.

*Neural Computation* 19 (10): 2756–79.

*arXiv:1601.00238 [Cs, Stat]*, January.

*IEEE Transactions on Neural Networks and Learning Systems* PP (99): 1–1.

*Randomized Algorithms for Matrices and Data*. Vol. 3.

*Signal Processing*, Advances in Multirate Filter Bank Structures and Multiscale Representations, 91 (12): 2822–35.

*Sparse Modeling for Image and Vision Processing*. Vol. 8.

*Proceedings of the 26th Annual International Conference on Machine Learning*, 689–96. ICML ’09. New York, NY, USA: ACM.

*The Journal of Machine Learning Research* 11: 19–60.

*arXiv:1607.01649 [Math]*, July.

*arXiv:1701.05363 [Math, q-Bio, Stat]*, January.

*Advances in Neural Information Processing Systems*, 8.

*Journal of Machine Learning Research*, 66.

*Foundations of Computational Mathematics* 9 (3): 317–34.

*Mathematical Geosciences* 45 (4): 411–35.

*arXiv:1511.09433 [Cs, Math, Stat]*, November.

*Environmetrics* 5 (2): 111–26.

*PLoS ONE* 9 (7): e102799.

*Machine Learning: ECML 2007*, edited by Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenić, and Andrzej Skowron, 4701:691–98. Berlin, Heidelberg: Springer Berlin Heidelberg.

*SIAM J. Matrix Anal. Appl.* 31 (3): 1100–1124.

*Proceedings of the National Academy of Sciences* 105 (36): 13212–17.

*IEEE Transactions on Information Theory* 56 (3): 1430–35.

*Proceedings of the 25th International Conference on Machine Learning*, 880–87. ICML ’08. New York, NY, USA: ACM.

*2007 IEEE Workshop on Machine Learning for Signal Processing*, 431–36.

*Proceedings of the National Academy of Sciences* 117 (11): 5631–37.

*Entropy* 19 (8): 424.

*Machine Learning and Knowledge Discovery in Databases*, 358–73. Springer.

*Independent Component Analysis and Blind Signal Separation*, edited by Carlos G. Puntonet and Alberto Prieto, 494–99. Lecture Notes in Computer Science. Granada, Spain: Springer Berlin Heidelberg.

*arXiv:1701.01207 [Cs, Math, Stat]*, January.

*arXiv:1403.2877 [Cs, q-Bio, Stat]*, March.

*Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing*, 81–90. STOC ’04. New York, NY, USA: ACM.

*arXiv:cs/0607105*, July.

*arXiv:0808.4134 [Cs]*, August.

*arXiv:0809.3232 [Cs]*, September.

*SIAM Journal on Computing* 40 (6): 1913–26.

*Advances in Neural Information Processing Systems 18*, edited by Y. Weiss, B. Schölkopf, and J. C. Platt, 283–90. MIT Press.

*Advances in Neural Information Processing Systems*, 17:1329–36. NIPS’04. Cambridge, MA, USA: MIT Press.

*Computing in Science Engineering* 2 (1): 50–59.

*Journal of Computational and Graphical Statistics* 25 (1): 187–208.

*arXiv:1609.00048 [Cs, Math, Stat]*, August.

*SIAM Journal on Matrix Analysis and Applications* 38 (4): 1454–85.

*Proceedings of the IEEE* 70 (9): 975–89.

*arXiv:1507.03194 [Cs, Stat]*, July.

*IEEE Transactions on Signal Processing* 62 (23): 6171–83.

*SIAM Journal on Mathematics of Data Science* 1 (1): 144–60.

*2008 IEEE International Conference on Acoustics, Speech and Signal Processing*, 109–12.

*IEEE Transactions on Audio, Speech, and Language Processing* 15 (3): 1066–74.

*Foundations and Trends® in Theoretical Computer Science* 8 (1-2): 1–141.

*arXiv:1606.08350 [Stat]*, February.

*2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 391–95.

*Proceedings of IJCAI, 2017*.

*arXiv:1702.04837 [Cs, Stat]*, February.

*IEEE Transactions on Knowledge and Data Engineering* 25 (6): 1336–53.

*Proc. Asian Conf. on Comp. Vision*, 27–30.

*arXiv:1901.11436 [Cs, Eess, Stat]*, January.

*Sketching as a Tool for Numerical Linear Algebra*. Foundations and Trends in Theoretical Computer Science 1.0. Now Publishers.

*Applied and Computational Harmonic Analysis* 25 (3): 335–66.

*High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications*. S.l.: Cambridge University Press.

*IEEE Transactions on Image Processing* 20 (12): 3419–30.

*arXiv:1502.03032 [Cs, Math, Stat]*, February.

*IEEE Transactions on Signal Processing* 66 (11): 2804–17.

*Journal of Machine Learning Research*, 494–503.

*Foundations of Computational Mathematics* 16 (3): 577–98.

*IEEE Transactions on Pattern Analysis and Machine Intelligence* 38 (3): 504–17.

*arXiv:1701.02324 [Cs]*.

*IEEE International Conference of Data Mining*, 765–74.

*Knowledge and Information Systems* 41 (3): 793–819.

*Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01*, 294–301. ICCV ’05. Washington, DC, USA: IEEE Computer Society.

*Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 615–23. KDD ’17. New York, NY, USA: ACM.

*arXiv:1701.00481 [Stat]*, January.

*Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007*, 391–400. IEEE.

*Proceedings of the 35th International Conference on Machine Learning*, 5892–5901. PMLR.

*Proceedings of the 22nd International Conference on Neural Information Processing Systems*, 22:2295–2303. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.

*Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics*, 1462–71. PMLR.

*Journal of Machine Learning Research*.

*arXiv:1808.01743 [Cs, q-Bio, Stat]*, August.
