(Approximate) matrix factorisation

Forget QR and LU decompositions, there are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them, especially if you suspect your matrix is sparse, or could be made sparse because of some underlying constraint, or probably could, if squinted at in the right fashion, be such as a graph transition matrix, or Laplacian, or noisy transform of some smooth object, or at least would be very close to sparse if you chose the right metric, or…

Your big matrix is close to, in some sense, the (tensor/matrix) product (or sum, or…) of some matrices that are in some way simple (small-rank, small dimension, sparse), possibly with additional constraints. Can you find these simple matrices?

Here’s an example. Godec --- A decomposition into low-rank and sparse components. (looks combined multidimensional factorisation and outlier detection).

There are so many more of these things, depending on your preferred choice of loss function, free variables and such.

Keywords: Matrix sketching, low-rank approximation, traditional dimensionality reduction.

Matrix concentration inequalities turn out to be useful in making this work.

I would like to learn more about

  • sparse or low-rank matrix approximation as clustering for density estimation, which is how I imagine high-dimensional mixture models would need to work, and thereby also
  • Mercer kernel approximation.
  • Connection to manifold learning is also probably worth examining.

Igor Carron’s Matrix Factorization Jungle classifies the following problems as matrix-factorisation type.

Kernel Factorizations
Spectral clustering
\([A = DX]\) with unknown D and X, solve for sparse X and X_i = 0 or 1
K-Means / K-Median clustering
\([A = DX]\) with unknown D and X, solve for XX^T = I and X_i = 0 or 1
Subspace clustering
\([A = AX]\) with unknown X, solve for sparse/other conditions on X
Graph Matching
\([A = XBX^T]\) with unknown X, B solve for B and X as a permutation
\([A = DX]\) with unknown D and X, solve for elements of D,X positive
Generalized Matrix Factorization
\([W.*L − W.*UV']\) with W a known mask, U,V unknowns solve for U,V and L lowest rank possible
Matrix Completion
\([A = H.*L]\) with H a known mask, L unknown solve for L lowest rank possible
Stable Principle Component Pursuit (SPCP)/ Noisy Robust PCA
\([A = L + S + N]\) with L, S, N unknown, solve for L low rank, S sparse, N noise
Robust PCA
\([A = L + S]\) with L, S unknown, solve for L low rank, S sparse
Sparse PCA
\([A = DX]\) with unknown D and X, solve for sparse D
Dictionary Learning
\([A = DX]\) with unknown D and X, solve for sparse X
Archetypal Analysis
\([A = DX]\) with unknown D and X, solve for D = AB with D, B positive
Matrix Compressive Sensing (MCS)
find a rank-r matrix L such that \([A(L) ~= b]\) / or \([A(L+S) = b]\)
Multiple Measurement Vector (MMV)
\([Y = A X]\) with unknown X and rows of X are sparse
compressed sensing
\([Y = A X]\) with unknown X and rows of X are sparse, X is one column.
Blind Source Separation (BSS)
\([Y = A X]\) with unknown A and X and statistical independence between columns of X or subspaces of columns of X
Partial and Online SVD/PCA
Tensor Decomposition
… **Not sure about this one, but see orthogonally decomposable tensors

Truncated Classic PCA is clearly also an example of this, but is excluded from the list for some reason. Boringness? the fact it’s a special case of Sparse PCA?

See also learning on manifolds, compressed sensing, optimisation random linear algebra and clustering, sparse regression

Why does it ever work

For certain types of data matrix, here is a possibly plausible explanation:

Udell and Townsend (2019) ask “Why Are Big Data Matrices Approximately Low Rank?”

Matrices of (approximate) low rank are pervasive in data science, appearing in movie preferences, text documents, survey data, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention paid to explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an \(m\times n\) matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as \(\mathcal{O}(\log(m+n))\). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.


Non-negative matrix factorisations

See non-negative matrix factorisations.

As regression

Total Least Squares (a.k.a. orthogonal distance regression, or error-in-variables least-squares linear regression) is a low-rank matrix approximation that minimises the Frobenius divergence from the data matrix. Who knew?


“Sketching” is a common term to describe a certain type of low-rank factorisation, although I am not sure which types. 🏗

(Martinsson 2016) mentions CUR and interpolative decompositions. Does preconditioning fit here?

\([\mathcal{H}]\)-matrix methods

It seems like low-rank matrix factorisation could related to \([\mathcal{H}]\)-matrix methods, as seen in, e.g. covariance matrices, but I do not know enough to say more.

See hmatrix.org for one lab’s backgrounder and their implementation, h2lib, hlibpro for a black-box closed-source one.

Randomized methods

Rather than find an optimal solution, why not just choose a random one which might be good enough? There are indeed randomised versions.

Connections to kernel learning

See (Grosse et al. 2012) for a mind-melting compositional matrix factorization diagram, constructing a search over hierarchical kernel decompositions with some matrix factorisation interpretation.

Exploiting compositionality to explore a large space of model structures


“Enough theory! Plug the hip new toy into my algorithm!”


NMF Toolbox (MATLAB and Python)

Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants—in particular, this paper gives an overview for two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.

Vowpal Wabbit does this.

HPC for matlab, R, python, c++: libpmf:

LIBPMF implements the CCD++ algorithm, which aims to solve large-scale matrix factorization problems such as the low-rank factorization problems for recommender systems.

NMF (R): 🏗

Laplacians.jl (Julia):

Laplacians is a package containing graph algorithms, with an emphsasis on tasks related to spectral and algebraic graph theory. It contains (and will contain more) code for solving systems of linear equations in graph Laplacians, low stretch spanning trees, sparsifiation, clustering, local clustering, and optimization on graphs.

All graphs are represented by sparse adjacency matrices. This is both for speed, and because our main concerns are algebraic tasks. It does not handle dynamic graphs. It would be very slow to implement dynamic graphs this way.

Matlab: Chih-Jen Lin’s nmf.m - “This tool solves NMF by alternative non-negative least squares using projected gradients. It converges faster than the popular multiplicative update approach. ”

Distributed NMF: In this repository, we offer both MPI and OPENMP implementation for MU, HALS and ANLS/BPP based NMF algorithms. This can run off the shelf as well easy to integrate in other source code. These are very highly tuned NMF algorithms to work on super computers. We have tested this software in NERSC as well OLCF cluster. The openmp implementation is tested on many different Linux variants with intel processors. The library works well for both sparse and dense matrix. (KaBP16_, Kann16_, FKPB15_)

Spams (C++/MATLAB/python) includes some matrix factorisations in its sparse approx toolbox. (see optimisation)

scikit-learn (python) does a few matrix factorisation in its inimitable batteries-in-the-kitchen-sink way.

(python) -

nimfa is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representation are supported.”

Tapkee (C++). Pro-tip — even without coding C++, tapkee does a long list of dimensionality reduction from the CLI.

  • PCA and randomized PCA
  • Kernel PCA (kPCA)
  • Random projection
  • Factor analysis

tensorly supports some interesting tesnor decompositions.

Aarabi, Hadrien Foroughmand, and Geoffroy Peeters. 2018. “Music Retiler: Using NMF2D Source Separation for Audio Mosaicing.” In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, 27:1–27:7. AM’18. New York, NY, USA: ACM. https://doi.org/10.1145/3243274.3243299.

Abdallah, Samer A., and Mark D. Plumbley. 2004. “Polyphonic Music Transcription by Non-Negative Sparse Coding of Power Spectra.” In. http://ismir2004.ismir.net/proceedings/p058-page-318-paper216.pdf.

Achlioptas, Dimitris. 2003. “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, Special Issue on PODS 2001, 66 (4): 671–87. https://doi.org/10.1016/S0022-0000(03)00025-4.

Aghasi, Alireza, Nam Nguyen, and Justin Romberg. 2016. “Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks,” November. http://arxiv.org/abs/1611.05162.

Ang, Andersen Man Shun, and Nicolas Gillis. 2018. “Accelerating Nonnegative Matrix Factorization Algorithms Using Extrapolation.” Neural Computation 31 (2): 417–39. https://doi.org/10.1162/neco_a_01157.

Arora, Sanjeev, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. 2012. “A Practical Algorithm for Topic Modeling with Provable Guarantees,” December. http://arxiv.org/abs/1212.4777.

Bach, Francis. 2013. “Convex Relaxations of Structured Matrix Factorizations,” September. http://arxiv.org/abs/1309.3117.

Bach, Francis, Rodolphe Jenatton, and Julien Mairal. 2011. Optimization with Sparsity-Inducing Penalties. Foundations and Trends(r) in Machine Learning 1.0. Now Publishers Inc. https://doi.org/10.1561/2200000015.

Bach, Francis R. 2013. “Sharp Analysis of Low-Rank Kernel Matrix Approximations.” In COLT, 30:185–209. http://www.jmlr.org/proceedings/papers/v30/Bach13.pdf.

Bagge Carlson, Fredrik. 2018. “Machine Learning and System Identification for Estimation in Physical Systems.” PhD Thesis TFRT-1122. Thesis/docmono, Lund University. http://lup.lub.lu.se/record/ffb8dc85-ce12-4f75-8f2b-0881e492f6c0.

Barbier, Jean, Nicolas Macris, and Léo Miolane. 2017. “The Layered Structure of Tensor Estimation and Its Mutual Information,” September. http://arxiv.org/abs/1709.10368.

Batson, Joshua, Daniel A. Spielman, and Nikhil Srivastava. 2008. “Twice-Ramanujan Sparsifiers,” August. http://arxiv.org/abs/0808.0163.

Bauckhage, Christian. 2015. “K-Means Clustering Is Matrix Factorization,” December. http://arxiv.org/abs/1512.07548.

Berry, Michael W., Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. 2007. “Algorithms and Applications for Approximate Nonnegative Matrix Factorization.” Computational Statistics & Data Analysis 52 (1): 155–73. https://doi.org/10.1016/j.csda.2006.11.006.

Bertin, N., R. Badeau, and E. Vincent. 2010. “Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription.” IEEE Transactions on Audio, Speech, and Language Processing 18 (3): 538–49. https://doi.org/10.1109/TASL.2010.2041381.

Bruckstein, A. M., Michael Elad, and M. Zibulevsky. 2008a. “Sparse Non-Negative Solution of a Linear System of Equations Is Unique.” In 3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008, 762–67. https://doi.org/10.1109/ISCCSP.2008.4537325.

———. 2008b. “On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations.” IEEE Transactions on Information Theory 54 (11): 4813–20. https://doi.org/10.1109/TIT.2008.929920.

Buch, Michael, Elio Quinton, and Bob L Sturm. 2017. “NichtnegativeMatrixFaktorisierungnutzendesKlangsynthesenSystem (NiMFKS): Extensions of NMF-Based Concatenative Sound Synthesis.” In Proceedings of the 20th International Conference on Digital Audio Effects, 7. Edinburgh.

Caetano, Marcelo, and Xavier Rodet. 2013. “Musical Instrument Sound Morphing Guided by Perceptually Motivated Features.” IEEE Transactions on Audio, Speech, and Language Processing 21 (8, 8): 1666–75. https://doi.org/10.1109/TASL.2013.2260154.

Cao, Bin, Dou Shen, Jian-Tao Sun, Xuanhui Wang, Qiang Yang, and Zheng Chen. n.d. “Detect and Track Latent Factors with Online Nonnegative Matrix Factorization.” In. Accessed May 31, 2015. https://er2004.cse.ust.hk/~qyang/Docs/2007/ijcai07.Cao.pdf.

Carabias-Orti, J. J., T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1144–58. https://doi.org/10.1109/JSTSP.2011.2159700.

Chi, Yuejie, Yue M. Lu, and Yuxin Chen. 2019. “Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview.” IEEE Transactions on Signal Processing 67 (20): 5239–69. https://doi.org/10.1109/TSP.2019.2937282.

Cichocki, A., N. Lee, I. V. Oseledets, A.-H. Phan, Q. Zhao, and D. Mandic. 2016. “Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1,” September. http://arxiv.org/abs/1609.00893.

Cichocki, A., R. Zdunek, and S. Amari. 2006. “New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation.” In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 5:V–V. https://doi.org/10.1109/ICASSP.2006.1661352.

Cohen, Albert, Ingrid Daubechies, and Jean-Christophe Feauveau. 1992. “Biorthogonal Bases of Compactly Supported Wavelets.” Communications on Pure and Applied Mathematics 45 (5): 485–560. http://www.math.duke.edu/~ingrid/publications/CPAM_1992_p485.pdf.

Combettes, Patrick L., and Jean-Christophe Pesquet. 2008. “A Proximal Decomposition Method for Solving Convex Variational.” Inverse Problems 24 (6): 065014. https://doi.org/10.1088/0266-5611/24/6/065014.

Dasarathy, Gautam, Parikshit Shah, Badri Narayan Bhaskar, and Robert Nowak. 2013. “Sketching Sparse Matrices,” March. http://arxiv.org/abs/1303.6544.

Dasgupta, Sanjoy, and Anupam Gupta. 2003. “An Elementary Proof of a Theorem of Johnson and Lindenstrauss.” Random Structures & Algorithms 22 (1): 60–65. https://doi.org/10.1002/rsa.10073.

Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. 2016. “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.” In Advances in Neural Information Processing Systems. http://arxiv.org/abs/1606.09375.

Desai, A., M. Ghashami, and J. M. Phillips. 2016. “Improved Practical Matrix Sketching with Guarantees.” IEEE Transactions on Knowledge and Data Engineering 28 (7): 1678–90. https://doi.org/10.1109/TKDE.2016.2539943.

Devarajan, Karthik. 2008. “Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology.” PLoS Comput Biol 4 (7): e1000029. https://doi.org/10.1371/journal.pcbi.1000029.

Ding, C., X. He, and H. Simon. 2005. “On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering.” In Proceedings of the 2005 SIAM International Conference on Data Mining, 606–10. Proceedings. Society for Industrial and Applied Mathematics. http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf.

Ding, C., Tao Li, and M. I. Jordan. 2010. “Convex and Semi-Nonnegative Matrix Factorizations.” IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (1): 45–55. https://doi.org/10.1109/TPAMI.2008.277.

Dokmanić, Ivan, and Rémi Gribonval. 2017. “Beyond Moore-Penrose Part II: The Sparse Pseudoinverse,” June. http://arxiv.org/abs/1706.08701.

Driedger, Jonathan, and Thomas Pratzlich. 2015. “Let It Bee – Towards NMF-Inspired Audio Mosaicing.” In Proceedings of ISMIR, 7. Malaga. http://ismir2015.uma.es/articles/13_Paper.pdf.

Drineas, Petros, and Michael W. Mahoney. 2005. “On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning.” Journal of Machine Learning Research 6 (December): 2153–75. http://jmlr.org/papers/volume6/drineas05a/drineas05a.pdf.

Dueck, Delbert, Quaid D. Morris, and Brendan J. Frey. 2005. “Multi-Way Clustering of Microarray Data Using Probabilistic Sparse Matrix Factorization.” Bioinformatics 21 (suppl 1): i144–i151. https://doi.org/10.1093/bioinformatics/bti1041.

Ellis, Robert L., and David C. Lay. 1992. “Factorization of Finite Rank Hankel and Toeplitz Matrices.” Linear Algebra and Its Applications 173 (August): 19–38. https://doi.org/10.1016/0024-3795(92)90420-F.

Fairbanks, James P., Ramakrishnan Kannan, Haesun Park, and David A. Bader. 2015. “Behavioral Clusters in Dynamic Graphs.” Parallel Computing, Graph analysis for scientific discovery, 47 (August): 38–50. https://doi.org/10.1016/j.parco.2015.03.002.

Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. 2008. “Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis.” Neural Computation 21 (3): 793–830. https://doi.org/10.1162/neco.2008.04-08-771.

Flammia, Steven T., David Gross, Yi-Kai Liu, and Jens Eisert. 2012. “Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators.” New Journal of Physics 14 (9): 095022. https://doi.org/10.1088/1367-2630/14/9/095022.

Fung, Wai Shing, Ramesh Hariharan, Nicholas J. A. Harvey, and Debmalya Panigrahi. 2011. “A General Framework for Graph Sparsification.” In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, 71–80. STOC ’11. New York, NY, USA: ACM. https://doi.org/10.1145/1993636.1993647.

Gemulla, Rainer, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. 2011. “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent.” In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 69–77. KDD ’11. New York, NY, USA: ACM. https://doi.org/10.1145/2020408.2020426.

Ghashami, Mina, Edo Liberty, Jeff M. Phillips, and David P. Woodruff. 2015. “Frequent Directions : Simple and Deterministic Matrix Sketching,” January. http://arxiv.org/abs/1501.01711.

Gross, D. 2011. “Recovering Low-Rank Matrices from Few Coefficients in Any Basis.” IEEE Transactions on Information Theory 57 (3): 1548–66. https://doi.org/10.1109/TIT.2011.2104999.

Gross, David, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. 2010. “Quantum State Tomography via Compressed Sensing.” Physical Review Letters 105 (15). https://doi.org/10.1103/PhysRevLett.105.150401.

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence. http://arxiv.org/abs/1210.4856.

Guan, Naiyang, Dacheng Tao, Zhigang Luo, and Bo Yuan. 2012. “NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization.” IEEE Transactions on Signal Processing 60 (6): 2882–98. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6166359.

Guan, N., D. Tao, Z. Luo, and B. Yuan. 2012. “Online Nonnegative Matrix Factorization with Robust Stochastic Approximation.” IEEE Transactions on Neural Networks and Learning Systems 23 (7): 1087–99. https://doi.org/10.1109/TNNLS.2012.2197827.

Hackbusch, Wolfgang. 2015. Hierarchical Matrices: Algorithms and Analysis. 1st ed. Springer Series in Computational Mathematics 49. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated.

Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. 2009. “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” September. http://arxiv.org/abs/0909.4061.

Hassanieh, Haitham, Piotr Indyk, Dina Katabi, and Eric Price. 2012. “Nearly Optimal Sparse Fourier Transform.” In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, 563–78. STOC ’12. New York, NY, USA: ACM. https://doi.org/10.1145/2213977.2214029.

Hassanieh, H., P. Indyk, D. Katabi, and E. Price. 2012. “Simple and Practical Algorithm for Sparse Fourier Transform.” In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, 1183–94. Proceedings. Kyoto, Japan: Society for Industrial and Applied Mathematics. http://groups.csail.mit.edu/netmit/sFFT/soda_paper.pdf.

Heinig, Georg, and Karla Rost. 2011. “Fast Algorithms for Toeplitz and Hankel Matrices.” Linear Algebra and Its Applications 435 (1): 1–59. https://doi.org/10.1016/j.laa.2010.12.001.

Hoffman, Matthew, Francis R. Bach, and David M. Blei. 2010. “Online Learning for Latent Dirichlet Allocation.” In Advances in Neural Information Processing Systems, 856–64. http://papers.nips.cc/paper/3902-online-learning-for-latentdirichlet-allocation!

Hoffman, Matthew D, David M Blei, and Perry R Cook. 2010. “Bayesian Nonparametric Matrix Factorization for Recorded Music.” In International Conference on Machine Learning, 8. http://soundlab.cs.princeton.edu/publications/2010_icml_gapnmf.pdf.

Hoyer, P. O. 2002. “Non-Negative Sparse Coding.” In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002, 557–65. https://doi.org/10.1109/NNSP.2002.1030067.

Hsieh, Cho-Jui, and Inderjit S. Dhillon. 2011. “Fast Coordinate Descent Methods with Variable Selection for Non-Negative Matrix Factorization.” In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1064–72. KDD ’11. New York, NY, USA: ACM. https://doi.org/10.1145/2020408.2020577.

Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers. https://doi.org/10.1109/ACSSC.2014.7094519.

Huang, G., M. Kaess, and J. J. Leonard. 2013. “Consistent Sparsification for Graph Optimization.” In 2013 European Conference on Mobile Robots (ECMR), 150–57. https://doi.org/10.1109/ECMR.2013.6698835.

Iliev, Filip L., Valentin G. Stanev, Velimir V. Vesselinov, and Boian S. Alexandrov. 2018. “Nonnegative Matrix Factorization for Identification of Unknown Number of Sources Emitting Delayed Signals.” PLOS ONE 13 (3): e0193974. https://doi.org/10.1371/journal.pone.0193974.

Kannan, Ramakrishnan. 2016. “Scalable and Distributed Constrained Low Rank Approximations,” April. https://smartech.gatech.edu/handle/1853/54962.

Kannan, Ramakrishnan, Grey Ballard, and Haesun Park. 2016. “A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization.” In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 9:1–9:11. PPoPP ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2851141.2851152.

Keriven, Nicolas, Anthony Bourrier, Rémi Gribonval, and Patrick Pérez. 2016. “Sketching for Large-Scale Learning of Mixture Models.” In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6190–4. https://doi.org/10.1109/ICASSP.2016.7472867.

Keshava, Nirmal. 2003. “A Survey of Spectral Unmixing Algorithms.” Lincoln Laboratory Journal 14 (1): 55–78. https://www.ll.mit.edu/publications/journal/pdf/vol14_no1/14_1survey.pdf.

Khoromskij, B. N., A. Litvinenko, and H. G. Matthies. 2009. “Application of Hierarchical Matrices for Computing the Karhunen–Loève Expansion.” Computing 84 (1-2): 49–67. https://doi.org/10.1007/s00607-008-0018-3.

Kim, H., and H. Park. 2008. “Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method.” SIAM Journal on Matrix Analysis and Applications 30 (2): 713–30. https://doi.org/10.1137/07069239X.

Koren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix Factorization Techniques for Recommender Systems.” Computer 42 (8): 30–37. https://doi.org/10.1109/MC.2009.263.

Koutis, Ioannis, Gary L. Miller, and Richard Peng. 2012. “A Fast Solver for a Class of Linear Systems.” Communications of the ACM 55 (10): 99–107. https://doi.org/10.1145/2347736.2347759.

Kruskal, J. B. 1964. “Nonmetric Multidimensional Scaling: A Numerical Method.” Psychometrika 29 (2): 115–29. https://doi.org/10.1007/BF02289694.

Kumar, N. Kishore, and Jan Shneider. 2016. “Literature Survey on Low Rank Approximation of Matrices,” June. http://arxiv.org/abs/1606.06511.

Lahiri, Subhaneil, Peiran Gao, and Surya Ganguli. 2016. “Random Projections of Random Manifolds,” July. http://arxiv.org/abs/1607.04331.

Lawrence, Neil D., and Raquel Urtasun. 2009. “Non-Linear Matrix Factorization with Gaussian Processes.” In Proceedings of the 26th Annual International Conference on Machine Learning, 601–8. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553452.

Lee, Daniel D., and H. Sebastian Seung. 2001. “Algorithms for Non-Negative Matrix Factorization.” In Advances in Neural Information Processing Systems 13, edited by T. K. Leen, T. G. Dietterich, and V. Tresp, 556–62. MIT Press. http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf.

———. 1999. “Learning the Parts of Objects by Non-Negative Matrix Factorization.” Nature 401 (6755): 788–91. https://doi.org/10.1038/44565.

Li, Chi-Kwong, and Edward Poon. 2002. “Additive Decomposition of Real Matrices.” Linear and Multilinear Algebra 50 (4): 321–26. https://doi.org/10.1080/03081080290025507.

Li, S. Z., XinWen Hou, HongJiang Zhang, and Qiansheng Cheng. 2001. “Learning Spatially Localized, Parts-Based Representation.” In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001, 1:I–207–I–212vol.1. https://doi.org/10.1109/CVPR.2001.990477.

Liberty, Edo. 2013. “Simple and Deterministic Matrix Sketching.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 581–88. KDD ’13. New York, NY, USA: ACM. https://doi.org/10.1145/2487575.2487623.

Liberty, Edo, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. 2007. “Randomized Algorithms for the Low-Rank Approximation of Matrices.” Proceedings of the National Academy of Sciences 104 (51): 20167–72. https://doi.org/10.1073/pnas.0709640104.

Lin, Chih-Jen. 2007. “Projected Gradient Methods for Nonnegative Matrix Factorization.” Neural Computation 19 (10): 2756–79. https://doi.org/10.1162/neco.2007.19.10.2756.

Lin, Zhouchen. n.d. “A Review on Low-Rank Models in Signal and Data Analysis.” Accessed March 29, 2016. http://www.cis.pku.edu.cn/faculty/vision/zlin/Publications/2016-BDIA-LR_review.pdf.

Liu, Tongliang, Dacheng Tao, and Dong Xu. 2016. “Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes,” January. http://arxiv.org/abs/1601.00238.

Liu, T., and D. Tao. 2015. “On the Performance of Manhattan Nonnegative Matrix Factorization.” IEEE Transactions on Neural Networks and Learning Systems PP (99): 1–1. https://doi.org/10.1109/TNNLS.2015.2458986.

López-Serrano, Patricio, Christian Dittmar, Yigitcan Özer, and Meinard Müller. 2019. “NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization.” In.

Mailhé, Boris, Rémi Gribonval, Pierre Vandergheynst, and Frédéric Bimbot. 2011. “Fast Orthogonal Sparse Approximation Algorithms over Local Dictionaries.” Signal Processing, Advances in Multirate Filter Bank Structures and Multiscale Representations, 91 (12): 2822–35. https://doi.org/10.1016/j.sigpro.2011.01.004.

Mairal, Julien, Francis Bach, and Jean Ponce. 2014. Sparse Modeling for Image and Vision Processing. Vol. 8. https://doi.org/10.1561/0600000058.

Mairal, Julien, Francis Bach, Jean Ponce, and Guillermo Sapiro. 2009. “Online Dictionary Learning for Sparse Coding.” In Proceedings of the 26th Annual International Conference on Machine Learning, 689–96. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553463.

———. 2010. “Online Learning for Matrix Factorization and Sparse Coding.” The Journal of Machine Learning Research 11: 19–60. http://arxiv.org/abs/0908.0050.

Martinsson, Per-Gunnar. 2016. “Randomized Methods for Matrix Computations and Analysis of High Dimensional Data,” July. http://arxiv.org/abs/1607.01649.

Martinsson, Per-Gunnar, Vladimir Rockhlin, and Mark Tygert. 2006. “A Randomized Algorithm for the Approximation of Matrices.” DTIC Document. http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA458932.

Mensch, Arthur, Julien Mairal, Bertrand Thirion, and Gael Varoquaux. 2017. “Stochastic Subsampling for Factorizing Huge Matrices,” January. http://arxiv.org/abs/1701.05363.

Needell, Deanna, and Roman Vershynin. 2009. “Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit.” Foundations of Computational Mathematics 9 (3): 317–34. https://doi.org/10.1007/s10208-008-9031-3.

Nowak, W., and A. Litvinenko. 2013. “Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques.” Mathematical Geosciences 45 (4): 411–35. https://doi.org/10.1007/s11004-013-9453-6.

Oymak, Samet, and Joel A. Tropp. 2015. “Universality Laws for Randomized Dimension Reduction, with Applications,” November. http://arxiv.org/abs/1511.09433.

Paatero, Pentti, and Unto Tapper. 1994. “Positive Matrix Factorization: A Non-Negative Factor Model with Optimal Utilization of Error Estimates of Data Values.” Environmetrics 5 (2): 111–26. https://doi.org/10.1002/env.3170050203.

Pan, Gang, Wangsheng Zhang, Zhaohui Wu, and Shijian Li. 2014. “Online Community Detection for Large Complex Networks.” PLoS ONE 9 (7): e102799. https://doi.org/10.1371/journal.pone.0102799.

Rokhlin, Vladimir, Arthur Szlam, and Mark Tygert. 2009. “A Randomized Algorithm for Principal Component Analysis.” SIAM J. Matrix Anal. Appl. 31 (3): 1100–1124. https://doi.org/10.1137/080736417.

Rokhlin, Vladimir, and Mark Tygert. 2008. “A Fast Randomized Algorithm for Overdetermined Linear Least-Squares Regression.” Proceedings of the National Academy of Sciences 105 (36): 13212–7. https://doi.org/10.1073/pnas.0804869105.

Ryabko, Daniil, and Boris Ryabko. 2010. “Nonparametric Statistical Inference for Ergodic Processes.” IEEE Transactions on Information Theory 56 (3): 1430–5. https://doi.org/10.1109/TIT.2009.2039169.

Schmidt, M. N., J. Larsen, and Fu-Tien Hsiao. 2007. “Wind Noise Reduction Using Non-Negative Sparse Coding.” In 2007 IEEE Workshop on Machine Learning for Signal Processing, 431–36. https://doi.org/10.1109/MLSP.2007.4414345.

Seshadhri, C., Aneesh Sharma, Andrew Stolman, and Ashish Goel. 2020. “The Impossibility of Low-Rank Representations for Triangle-Rich Complex Networks.” Proceedings of the National Academy of Sciences 117 (11): 5631–7. https://doi.org/10.1073/pnas.1911030117.

Singh, Ajit P., and Geoffrey J. Gordon. 2008. “A Unified View of Matrix Factorization Models.” In Machine Learning and Knowledge Discovery in Databases, 358–73. Springer. https://www.select.cs.cmu.edu/publications/paperdir/ecml2008-singh-gordon.pdf.

Smaragdis, Paris. 2004. “Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs.” In Independent Component Analysis and Blind Signal Separation, edited by Carlos G. Puntonet and Alberto Prieto, 494–99. Lecture Notes in Computer Science. Granada, Spain: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-30110-3_63.

Soh, Yong Sheng, and Venkat Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers,” January. http://arxiv.org/abs/1701.01207.

Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014. “A Survey of Dimensionality Reduction Techniques,” March. http://arxiv.org/abs/1403.2877.

Spielman, Daniel A., and Shang-Hua Teng. 2004. “Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems.” In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, 81–90. STOC ’04. New York, NY, USA: ACM. https://doi.org/10.1145/1007352.1007372.

———. 2006. “Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric, Diagonally Dominant Linear Systems,” July. http://arxiv.org/abs/cs/0607105.

———. 2008a. “Spectral Sparsification of Graphs,” August. http://arxiv.org/abs/0808.4134.

———. 2008b. “A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly-Linear Time Graph Partitioning,” September. http://arxiv.org/abs/0809.3232.

Spielman, D., and N. Srivastava. 2011. “Graph Sparsification by Effective Resistances.” SIAM Journal on Computing 40 (6): 1913–26. https://doi.org/10.1137/080734029.

Sra, Suvrit, and Inderjit S. Dhillon. 2006. “Generalized Nonnegative Matrix Approximations with Bregman Divergences.” In Advances in Neural Information Processing Systems 18, edited by Y. Weiss, B. Schölkopf, and J. C. Platt, 283–90. MIT Press. http://papers.nips.cc/paper/2757-generalized-nonnegative-matrix-approximations-with-bregman-divergences.pdf.

Sun, Ying, and Michael L. Stein. 2016. “Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets.” Journal of Computational and Graphical Statistics 25 (1): 187–208. https://doi.org/10.1080/10618600.2014.975230.

Tropp, Joel A., Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2017. “Practical Sketching Algorithms for Low-Rank Matrix Approximation.” SIAM Journal on Matrix Analysis and Applications 38 (4): 1454–85. https://doi.org/10.1137/17M1111590.

———. 2016. “Randomized Single-View Algorithms for Low-Rank Matrix Approximation,” August. http://arxiv.org/abs/1609.00048.

Tufts, D. W., and R. Kumaresan. 1982. “Estimation of Frequencies of Multiple Sinusoids: Making Linear Prediction Perform Like Maximum Likelihood.” Proceedings of the IEEE 70 (9): 975–89. https://doi.org/10.1109/PROC.1982.12428.

Tung, Frederick, and James J. Little. n.d. “Factorized Binary Codes for Large-Scale Nearest Neighbor Search.” Accessed August 28, 2016. http://www.cs.ubc.ca/~ftung/papers/factorized_binary_codes_bmvc16.pdf.

Turner, Richard E., and Maneesh Sahani. 2014. “Time-Frequency Analysis as Probabilistic Inference.” IEEE Transactions on Signal Processing 62 (23): 6171–83. https://doi.org/10.1109/TSP.2014.2362100.

Türkmen, Ali Caner. 2015. “A Review of Nonnegative Matrix Factorization Methods for Clustering,” July. http://arxiv.org/abs/1507.03194.

Udell, M., and A. Townsend. 2019. “Why Are Big Data Matrices Approximately Low Rank?” SIAM Journal on Mathematics of Data Science 1 (1): 144–60. https://doi.org/10.1137/18M1183480.

Vaz, Colin, Asterios Toutios, and Shrikanth S. Narayanan. 2016. “Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.” In, 963–67. https://doi.org/10.21437/Interspeech.2016-571.

Vincent, E., N. Bertin, and R. Badeau. 2008. “Harmonic and Inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch Transcription.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 109–12. https://doi.org/10.1109/ICASSP.2008.4517558.

Virtanen, T. 2007. “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria.” IEEE Transactions on Audio, Speech, and Language Processing 15 (3): 1066–74. https://doi.org/10.1109/TASL.2006.885253.

Vishnoi, Nisheeth K. 2013. “Lx = B.” Foundations and Trends® in Theoretical Computer Science 8 (1-2): 1–141. https://doi.org/10.1561/0400000054.

Wager, S., L. Chen, M. Kim, and C. Raphael. 2017. “Towards Expressive Instrument Synthesis Through Smooth Frame-by-Frame Reconstruction: From String to Woodwind.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 391–95. https://doi.org/10.1109/ICASSP.2017.7952184.

Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017. “Locality Preserving Projections for Grassmann Manifold.” In PRoceedings of IJCAI, 2017. http://arxiv.org/abs/1704.08458.

Wang, Shusen, Alex Gittens, and Michael W. Mahoney. 2017. “Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging,” February. http://arxiv.org/abs/1702.04837.

Wang, Yuan, and Yunde Jia. 2004. “Fisher Non-Negative Matrix Factorization for Learning Local Features.” In In Proc. Asian Conf. On Comp. Vision, 27–30.

Wang, Y. X., and Y. J. Zhang. 2013. “Nonnegative Matrix Factorization: A Comprehensive Review.” IEEE Transactions on Knowledge and Data Engineering 25 (6): 1336–53. https://doi.org/10.1109/TKDE.2012.51.

Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis,” January. https://arxiv.org/abs/1901.11436v1.

Woodruff, David P. 2014. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science 1.0. Now Publishers. https://doi.org/10.1561/0400000060.

Woolfe, Franco, Edo Liberty, Vladimir Rokhlin, and Mark Tygert. 2008. “A Fast Randomized Algorithm for the Approximation of Matrices.” Applied and Computational Harmonic Analysis 25 (3): 335–66. https://doi.org/10.1016/j.acha.2007.12.002.

Yang, Jiyan, Xiangrui Meng, and Michael W. Mahoney. 2015. “Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments,” February. http://arxiv.org/abs/1502.03032.

Yang, Wenzhuo, and Huan Xu. 2015. “Streaming Sparse Principal Component Analysis.” In Journal of Machine Learning Research, 494–503. http://jmlr.org/proceedings/papers/v37/yangd15.html.

Ye, Ke, and Lek-Heng Lim. 2016. “Every Matrix Is a Product of Toeplitz Matrices.” Foundations of Computational Mathematics 16 (3): 577–98. https://doi.org/10.1007/s10208-015-9254-z.

Yin, M., J. Gao, and Z. Lin. 2016. “Laplacian Regularized Low-Rank Representation and Its Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (3): 504–17. https://doi.org/10.1109/TPAMI.2015.2462360.

Yoshii, Kazuyoshi. 2013. “Beyond NMF: Time-Domain Audio Source Separation Without Phase Reconstruction,” 6.

Yu, Chenhan D., William B. March, and George Biros. 2017. “An $N \Log N$ Parallel Fast Direct Solver for Kernel Matrices.” In. http://arxiv.org/abs/1701.02324.

Yu, Hsiang-Fu, Cho-Jui Hsieh, Si Si, and Inderjit S. Dhillon. 2012. “Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems.” In IEEE International Conference of Data Mining, 765–74. https://doi.org/10.1109/ICDM.2012.168.

———. 2014. “Parallel Matrix Factorization for Recommender Systems.” Knowledge and Information Systems 41 (3): 793–819. https://doi.org/10.1007/s10115-013-0682-2.

Zass, Ron, and Amnon Shashua. 2005. “A Unifying Approach to Hard and Probabilistic Clustering.” In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01, 294–301. ICCV ’05. Washington, DC, USA: IEEE Computer Society. https://doi.org/10.1109/ICCV.2005.27.

Zhang, Kai, Chuanren Liu, Jie Zhang, Hui Xiong, Eric Xing, and Jieping Ye. 2017. “Randomization or Condensation?: Linear-Cost Matrix Sketching via Cascaded Compression Sampling.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 615–23. KDD ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3097983.3098050.

Zhang, Xiao, Lingxiao Wang, and Quanquan Gu. 2017. “Stochastic Variance-Reduced Gradient Descent for Low-Rank Matrix Recovery from Linear Measurements,” January. http://arxiv.org/abs/1701.00481.

Zhang, Zhongyuan, Chris Ding, Tao Li, and Xiangsun Zhang. 2007. “Binary Matrix Factorization with Applications.” In Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007, 391–400. IEEE. https://doi.org/10.1109/ICDM.2007.99.

Zhou, Tianyi, and Dacheng Tao. 2011. “Godec: Randomized Low-Rank & Sparse Matrix Decomposition in Noisy Case.” http://www.icml-2011.org/papers/41_icmlpaper.pdf.

———. 2012. “Multi-Label Subspace Ensemble.” Journal of Machine Learning Research. http://www.jmlr.org/proceedings/papers/v22/zhou12a/zhou12a.pdf.

Zitnik, Marinka, and Blaz Zupan. 2018. “NIMFA: A Python Library for Nonnegative Matrix Factorization,” August. http://arxiv.org/abs/1808.01743.