Sparse coding

Wavelets, matching pursuit, overcomplete dictionaries…

November 18, 2014 — March 7, 2022

high d
Hilbert space
linear algebra
signal processing
sparser than thou
Figure 1

Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you wish simply to approximate some data compactly. My focus is on the noisy-observation case, although the same results are recycled enough throughout the field.

There are several senses in which people seem to use sparse coding; these do not necessarily mean the same thing, but they frequently connect.

I should break these two notions apart here. For now, I’m especially interested in adaptive bases.

This is merely a bunch of links to important articles at the moment; I might write a little exposition one day.

Decomposition of stuff by matching pursuit, wavelets, curvelets, chirplets, framelets, shearlets, camelhairbrushlets, content-specific basis dictionaries, designed or learned. Mammals visual cortexes seem to use something like this, if you squint right at the evidence.

To discuss:

Figure 2: Daniel LaCombe, some sparse basis functions

1 Resources

Baraniuk’s lab has a comprehensive, but not usefully annotated, selection of articles in this field, which I include less as a useful starting point, than to demonstrate the virtue of a literature review by showing the pathology of its absence.

2 Wavelet bases

Very popular practical intro is Torrence and Compo.


3 Scattering transform coefficients

See scattering transform.

4 Matching Pursuits

I do a lot of this now. I should document it. 🏗

5 Learnable codings

See Adaptive sparse coding.

6 Bayesian

Unsure. See perhaps Schniter, Potter, and Ziniel (2008),Zhou et al. (2009).

7 Incoming

Affine tight framelets (Ingrid Daubechies et al. 2003) and their presumably less-computationally-tractable, more flexible cousins, shearlets also sound interesting. For reasons I do not yet understand I am told they can naturally be used on sundry graphs and manifolds, not just lattices, is traditional in DSP. I saw Xiaosheng Zhuang present these (see, e.g. (Y. G. Wang and Zhuang 2016; B. Han, Zhao, and Zhuang 2016), where the latter demonstrates a Fast Framelet Transform which is supposedly as computationally as cheap as the FFT.)

I have some ideas I call learning gamelan which relate to this.

8 Implementations

Implementations boil down to clever optimisation and/or good use of functional transforms to make the calculations tractable.

  • Shailesh Kumar, Wavelet Transforms in Python with Google JAX introduces CR.Sparse, a JAX/XLA based library of accelerated models and algorithms for inverse problems in sparse representation and compressive sensing.

  • the standard wavelet toolkits.

    • scipy’s wavelet transform has no frills and little coherent explanation, but it goes
    • pywavelets does various fancy wavelets and seems to be a standard for python.
    • Matlab’s Wavelet toolbox seems to be the reference.
    • scikit-learn dictionary learning version here
    • also pydbm
    • Fancy easy GPU wavelet implementation, PyTorchWavelets.

    SParse Optimization Research COde (SPORCO) is an open-source Python package for solving optimization problems with sparsity-inducing regularization, consisting primarily of sparse coding and dictionary learning, for both standard and convolutional forms of sparse representation. In the current version, all optimization problems are solved within the Alternating Direction Method of Multipliers (ADMM) framework. SPORCO was developed for applications in signal and image processing, but is also expected to be useful for problems in computer vision, statistics, and machine learning.

  • Sparse-filtering: Unsupervised feature learning based on sparse-filtering

    This implements the method described Jiquan Ngiam, Pang Wei Koh, Zhenghao Chen, Sonia Bhaskar, Andrew Y. Ng: Sparse Filtering. NIPS 2011: 1125-1133 and is based on the Matlab code provided in the supplementary material

  • spams does a huge variety of off-the-shelf sparse codings, although none of them are flexible. Nonetheless it does some neat things fast.

SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems.

  • Dictionary learning and matrix factorization (NMF, sparse PCA,…)
  • Solving sparse decomposition problems with LARS, coordinate descent, OMP, SOMP, proximal methods
  • Solving structured sparse decomposition problems (l1/l2, l1/linf, sparse group lasso, tree-structured regularization, structured sparsity with overlapping groups,…).

9 References

Aharon, Elad, and Bruckstein. 2006. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation.” IEEE Transactions on Signal Processing.
Arora, Ge, Ma, et al. 2015. Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of The 28th Conference on Learning Theory.
Bach, and Jordan. 2006. Learning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research.
Baraniuk, Cevher, Duarte, et al. 2010. Model-Based Compressive Sensing.” IEEE Transactions on Information Theory.
Barron, Cohen, Dahmen, et al. 2008. Approximation and Learning by Greedy Algorithms.” The Annals of Statistics.
Barthélemy, Larue, Mayoue, et al. 2012. Shift & 2D Rotation Invariant Sparse Coding for Multivariate Signals.” IEEE Transactions on Signal Processing.
Bertin, Pennec, and Rivoirard. 2011. Adaptive Dantzig Density Estimation.” Annales de l’Institut Henri Poincaré, Probabilités Et Statistiques.
Beylkin, Coifman, and Rokhlin. 1991. Fast Wavelet Transforms and Numerical Algorithms I.” Communications on Pure and Applied Mathematics.
Blumensath, and Davies. 2004. On Shift-Invariant Sparse Coding.” In Independent Component Analysis and Blind Signal Separation.
———. 2006. Sparse and Shift-Invariant Representations of Music.” IEEE Transactions on Audio, Speech and Language Processing.
Bora, Jalal, Price, et al. 2017. Compressed Sensing Using Generative Models.” In International Conference on Machine Learning.
Boyes. 2011. Dictionary-Based Analysis/Synthesis and Structured Representations of Musical Audio.”
Bruna, and Mallat. 2013. Invariant Scattering Convolution Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
———. 2019. Multiscale Sparse Microcanonical Models.” arXiv:1801.02013 [Math-Ph, Stat].
Bruna, Mallat, Bacry, et al. 2015. Intermittent Process Analysis with Scattering Moments.” The Annals of Statistics.
Cai, Chan, and Shen. 2008. A Framelet-Based Image Inpainting Algorithm.” Applied and Computational Harmonic Analysis, Special Issue on Mathematical Imaging – Part II,.
Carabias-Orti, Virtanen, Vera-Candeas, et al. 2011. Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing.
Casazza, and Lynch. 2015. A Brief Introduction to Hilbert Space Frame Theory and Its Applications.” In Finite Frame Theory: A Complete Introduction to Overcompleteness.
Chen, and Donoho. 1994. Basis Pursuit.” In 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, 1994.
Cheng, and Ménard. 2021. How to Quantify Fields or Textures? A Guide to the Scattering Transform.”
Cox, and Rogers. 2021. Finding Distributed Needles in Neural Haystacks.” Journal of Neuroscience.
Daubechies, Ingrid. 1988. Orthonormal Bases of Compactly Supported Wavelets.” Communications on Pure and Applied Mathematics.
Daubechies, I., Defrise, and De Mol. 2004. An Iterative Thresholding Algorithm for Linear Inverse Problems with a Sparsity Constraint.” Communications on Pure and Applied Mathematics.
Daubechies, Ingrid, Han, Ron, et al. 2003. Framelets: MRA-Based Constructions of Wavelet Frames.” Applied and Computational Harmonic Analysis.
Davis, Geoffrey M. 1998. A Wavelet-Based Analysis of Fractal Image Compression.” IEEE Transactions on Image Processing.
Davis, G., Mallat, and Avellaneda. 1997. Adaptive Greedy Approximations.” Constructive Approximation.
Davis, Geoffrey M., Mallat, and Zhang. 1994a. Adaptive Time-Frequency Decompositions.” Optical Engineering.
———. 1994b. Adaptive Time-Frequency Decompositions with Matching Pursuit.” In Wavelet Applications.
DeVore. 1998. Nonlinear Approximation.” Acta Numerica.
Dong. 2015. Sparse Representation on Graphs by Tight Wavelet Frames and Applications.” Applied and Computational Harmonic Analysis.
Donoho, and Johnstone. 1995. Adapting to Unknown Smoothness via Wavelet Shrinkage.” Journal of the American Statistical Association.
Donoho, Johnstone, Kerkyacharian, et al. 1995. Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society. Series B (Methodological).
Du, Kibbe, and Lin. 2006. Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-Based Pattern Matching.” Bioinformatics.
Eggert, and Korner. 2004. Sparse Coding and NMF.” In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).
Ekanadham, Tranchina, and Simoncelli. 2011. Recovery of Sparse Translation-Invariant Signals With Continuous Basis Pursuit.” IEEE Transactions on Signal Processing.
Fan, and Li. 2001. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association.
Frazier. 1999. An Introduction to Wavelets Through Linear Algebra.
Garg, Rish, Cecchi, et al. 2017. Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World.” In arXiv:1701.06106 [Cs, Stat].
Gersho, and Gray. 2012. Vector Quantization and Signal Compression.
Giné, and Nickl. 2009. Uniform Limit Theorems for Wavelet Density Estimators.” The Annals of Probability.
Giryes, Sapiro, and Bronstein. 2016. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? IEEE Transactions on Signal Processing.
Goodwin, M M. 2001. Multiscale Overlap-Add Sinusoidal Modeling Using Matching Pursuit and Refinements.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Goodwin, M., and Vetterli. 1997. Atomic Decompositions of Audio Signals.” In 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1997.
Goodwin, M M, and Vetterli. 1999. Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks.” IEEE Transactions on Signal Processing.
Gray. 1984. Vector Quantization.” IEEE ASSP Magazine.
Gregor, and LeCun. 2010. Learning fast approximations of sparse coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10).
———. 2011. Efficient Learning of Sparse Invariant Representations.” arXiv:1105.5307 [Cs].
Grosse, Raina, Kwong, et al. 2007. Shift-Invariant Sparse Coding for Audio Classification.” In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007).
Gupta, and Pensky. 2016. Solution of Linear Ill-Posed Problems Using Random Dictionaries.” arXiv:1605.07913 [Math, Stat].
Hahn, Lewkowitz, Lacombe, et al. 2015. Deep Learning Human Actions from Video via Sparse Filtering and Locally Competitive Algorithms.” Multimedia Tools and Applications.
Han, Kyunghee, and Shin. n.d. Functional Linear Regression for Functional Response via Sparse Basis Selection.”
Han, Bin, Zhao, and Zhuang. 2016. Directional Tensor Product Complex Tight Framelets with Low Redundancy.” Applied and Computational Harmonic Analysis, Sparse Representations with Applications in Imaging Science, Data Analysis, and Beyond, Part IISI: ICCHAS Outgrowth, part 2,.
Han, Bin, and Zhuang. 2015. Smooth Affine Shear Tight Frames with MRA Structure.” Applied and Computational Harmonic Analysis.
Harte, Sandler, and Gasser. 2006. Detecting Harmonic Change in Musical Audio.” In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia. AMCMM ’06.
Henaff, Jarrett, Kavukcuoglu, et al. 2011. Unsupervised Learning of Sparse Features for Scalable Audio Classification. In ISMIR.
Hoyer, Patrik O. n.d. Non-Negative Matrix Factorization with Sparseness Constraints.” Journal of Machine Learning Research.
Hoyer, P.O. 2002. Non-Negative Sparse Coding.” In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.
Huang, Cheang, and Barron. 2008. Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”
Hyvärinen, and Hoyer. 2000. Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces.” Neural Computation.
Hyvärinen, Hurri, and Hoyer. 2009. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision.
Jafari, and Plumbley. 2011. Fast Dictionary Learning for Sparse Representations of Speech Signals.” IEEE Journal of Selected Topics in Signal Processing.
Jaillet, Gribonval, Plumbley, et al. 2010. An L1 Criterion for Dictionary Learning by Subspace Identification.” In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
Jung. 2013. An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29.
Kim, and Park. 2008. Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method.” SIAM Journal on Matrix Analysis and Applications.
Koch, and Corso. 2016. Sparse Factorization Layers for Neural Networks with Limited Supervision.” arXiv:1612.04468 [Cs, Stat].
Koppel, Warnell, Stump, et al. 2016. Parsimonious Online Learning with Kernels via Sparse Projections in Function Space.” arXiv:1612.04111 [Cs, Stat].
Lattner, Dorfler, and Arzt. 2019. Learning Complex Basis Functions for Invariant Representations of Audio.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.
Lee, Wee Sun, Bartlett, and Williamson. 1996. Efficient Agnostic Learning of Neural Networks with Bounded Fan-in.” IEEE Transactions on Information Theory.
Lee, Honglak, Battle, Raina, et al. 2007. Efficient Sparse Coding Algorithms.” Advances in Neural Information Processing Systems.
Lewicki, M S, and Sejnowski. 1999. Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations.” In NIPS.
Lewicki, Michael S., and Sejnowski. 2000. Learning Overcomplete Representations.” Neural Computation.
Liu, T., and Tao. 2015. On the Performance of Manhattan Nonnegative Matrix Factorization.” IEEE Transactions on Neural Networks and Learning Systems.
Liu, Tongliang, Tao, and Xu. 2016. Dimensionality-Dependent Generalization Bounds for \(k\)-Dimensional Coding Schemes.” arXiv:1601.00238 [Cs, Stat].
Lyu, Needell, and Balzano. 2020. Online Matrix Factorization for Markovian Data and Applications to Network Dictionary Learning.” Journal of Machine Learning Research.
Mailhé, Gribonval, Vandergheynst, et al. 2011. Fast Orthogonal Sparse Approximation Algorithms over Local Dictionaries.” Signal Processing, Advances in Multirate Filter Bank Structures and Multiscale Representations,.
Mairal, Bach, Ponce, et al. 2009. Online Dictionary Learning for Sparse Coding.” In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.
———, et al. 2010. Online Learning for Matrix Factorization and Sparse Coding.” The Journal of Machine Learning Research.
Mairal, Bach, and Ponce. 2014. Sparse Modeling for Image and Vision Processing.
Mallat, Stephane G. 1989. Multiresolution Approximations and Wavelet Orthonormal Bases of L²(R).” Transactions of the American Mathematical Society.
Mallat, Stéphane. 2008. A Wavelet Tour of Signal Processing: The Sparse Way.
———. 2012. Group Invariant Scattering.” Communications on Pure and Applied Mathematics.
Mallat, S., and Zhang. 1992. Adaptive Time-Frequency Decomposition with Matching Pursuits.” In Time-Frequency and Time-Scale Analysis, 1992., Proceedings of the IEEE-SP International Symposium.
Mallat, Stéphane G., and Zhang. 1993. Matching Pursuits with Time-Frequency Dictionaries.” IEEE Transactions on Signal Processing.
Marcus, Marblestone, and Dean. 2014. The atoms of neural computation.” Science.
Mlynarski. 2013. Sparse, Complex-Valued Representations of Natural Sounds Learned with Phase and Amplitude Continuity Priors.” arXiv Preprint arXiv:1312.4695.
Mondal, and Percival. 2010. M-Estimation of Wavelet Variance.” Annals of the Institute of Statistical Mathematics.
Mørup, Schmidt, and Hansen. 2007. Shift Invariant Sparse Coding of Image and Music Data.” Journal of Machine Learning Research.
Ngiam, Chen, Bhaskar, et al. 2011. Sparse Filtering.” In Advances in Neural Information Processing Systems 24.
Olshausen, B. A., and Field. 1996. Natural image statistics and efficient coding.” Network (Bristol, England).
Olshausen, Bruno A., and Field. 1996. Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images.” Nature.
Olshausen, Bruno A, and Field. 2004. Sparse Coding of Sensory Inputs.” Current Opinion in Neurobiology.
Opsomer, Wang, and Yang. 2001. Nonparametric Regression with Correlated Errors.” Statistical Science.
Oyallon, Belilovsky, and Zagoruyko. 2017. Scaling the Scattering Transform: Deep Hybrid Networks.” arXiv Preprint arXiv:1703.08961.
Pfister, and Bresler. 2017. Automatic Parameter Tuning for Image Denoising with Learned Sparsifying Transforms.” In.
Plumbley, Abdallah, Blumensath, et al. 2006. Sparse Representations of Polyphonic Music.” Signal Processing, Sparse Approximations in Signal and Image ProcessingSparse Approximations in Signal and Image Processing,.
Qian, and Chen. 1994. Signal Representation Using Adaptive Normalized Gaussian Functions.” Signal Processing.
Ravishankar, and Bresler. 2015. Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI.” arXiv:1501.02923 [Cs, Stat].
Rubinstein, Bruckstein, and Elad. 2010. Dictionaries for Sparse Representation Modeling.” Proceedings of the IEEE.
Rubinstein, Zibulevsky, and Elad. 2008. Efficient Implementation of the K-SVD Algorithm Using Batch Orthogonal Matching Pursuit.”
Saad. 2003. Iterative Methods for Sparse Linear Systems: Second Edition.
Schniter, Potter, and Ziniel. 2008. Fast Bayesian Matching Pursuit.” In 2008 Information Theory and Applications Workshop.
Shen. 2010. Wavelet Frames and Image Restorations.” In Scopus.
Simoncelli, and Olshausen. 2001. Natural Image Statistics and Neural Representation.” Annual Review of Neuroscience.
Smith, and Lewicki. 2006. Efficient Auditory Coding.” Nature.
Soh, and Chandrasekaran. 2017. A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.” arXiv:1701.01207 [Cs, Math, Stat].
Sprechmann, Bruna, and LeCun. 2014. Audio Source Separation with Discriminative Scattering Networks.” arXiv:1412.7022 [Cs].
Sulam, Aberdam, Beck, et al. 2020. On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Tang. 2000. Wavelet Theory and Its Application to Pattern Recognition.
Torrence, and Compo. 1998. A Practical Guide to Wavelet Analysis.” Bulletin of the American Meteorological Society.
Tošić, and Frossard. 2011. Dictionary Learning: What Is the Right Representation for My Signal? IEEE Signal Processing Magazine.
Tropp, Joel A. 2006. Algorithms for Simultaneous Sparse Approximation. Part II: Convex Relaxation.” Signal Processing, Sparse Approximations in Signal and Image ProcessingSparse Approximations in Signal and Image Processing,.
Tropp, J. A., and Wright. 2010. Computational Methods for Sparse Solution of Linear Inverse Problems.” Proceedings of the IEEE.
Tsaig, and Donoho. 2006a. Breakdown of Equivalence Between the Minimal -Norm Solution and the Sparsest Solution.” Signal Processing, Sparse Approximations in Signal and Image ProcessingSparse Approximations in Signal and Image Processing,.
———. 2006b. Extensions of Compressed Sensing.” Signal Processing, Sparse Approximations in Signal and Image ProcessingSparse Approximations in Signal and Image Processing,.
Türkmen. 2015. A Review of Nonnegative Matrix Factorization Methods for Clustering.” arXiv:1507.03194 [Cs, Stat].
Vainsencher, Mannor, and Bruckstein. 2011. The Sample Complexity of Dictionary Learning.” Journal of Machine Learning Research.
Vetterli. 1999. Wavelets: Approximation and Compression–a Review.” In AeroSense’99.
Wang, Yu-Xiang, Smola, and Tibshirani. 2014. The Falling Factorial Basis and Its Statistical Applications.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32. ICML’14.
Wang, Yu Guang, and Zhu. 2017. Localized Tight Frames and Fast Framelet Transforms on the Simplex.” arXiv:1701.01595 [Cs, Math].
Wang, Yu Guang, and Zhuang. 2016. Tight Framelets and Fast Framelet Transforms on Manifolds.” arXiv:1608.04026 [Math].
Weidmann, and Vetterli. 2012. Rate Distortion Behavior of Sparse Sources.” IEEE Transactions on Information Theory.
Wohlberg. 2017. “SPORCO: A Python Package for Standard and Convolutional Sparse Representations.” In.
Wright, and Ma. 2022. High-dimensional data analysis with low-dimensional models: Principles, computation, and applications.
Yaghoobi, Daudet, and Davies. 2009. Parametric Dictionary Design for Sparse Coding.” IEEE Transactions on Signal Processing.
Yaghoobi, Nam, Gribonval, et al. 2013. Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling.” IEEE Transactions on Signal Processing.
Yuan, Li, and Zhang. 2014. Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32.
Zhang, and Wang. 2022. Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?
Zhou, Chen, Paisley, et al. 2009. Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems. NIPS’09.
Zhuang. 2016. Digital Affine Shear Transforms: Fast Realization and Applications in Image/Video Processing.” SIAM Journal on Imaging Sciences.