Sparse coding

How to make big things out of short lists of small things.

Figure: some sparse basis functions, by Daniel LaCombe.

Linear expansion in dictionaries of basis functions, with respect to which you wish your representation to be sparse; in the statistical case, this is basis-sparse regression. Even outside statistics, you might simply wish to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled often enough throughout the field.
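To make the noisy-observation case concrete, here is a minimal sketch of sparse regression against a dictionary, using iterative soft-thresholding in the style of (Daubechies, Defrise, and De Mol 2004) on the objective 0.5 ||y - Dx||² + λ||x||₁. The random Gaussian dictionary, the planted three-atom signal, the noise level and the value of `lam` are all assumptions chosen for illustration, not anything canonical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    """Approximately minimise 0.5 * ||y - D x||^2 + lam * ||x||_1."""
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)          # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Illustrative data (assumed): a redundant random dictionary and a
# signal built from three of its atoms, observed with a little noise.
rng = np.random.default_rng(0)
n_obs, n_atoms = 64, 128
D = rng.standard_normal((n_obs, n_atoms))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
x_true = np.zeros(n_atoms)
x_true[[5, 40, 99]] = [2.0, -1.5, 1.0]
y = D @ x_true + 0.01 * rng.standard_normal(n_obs)

x_hat = ista(D, y, lam=0.05)
print("atoms with non-negligible weight:", np.flatnonzero(np.abs(x_hat) > 0.1))
```

The ℓ1 penalty shrinks the surviving coefficients towards zero, so in practice one might refit by least squares on the selected atoms.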

Note that there are two ways you can get your representation to be sparse:

  • you know that your signal happens to be compressible, in the sense that under some transform its coefficient vector is mostly zeros, even in a plain old orthogonal basis expansion.

  • you are using a redundant dictionary such that you won’t need most of it to represent even a dense signal.
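A toy illustration of the two routes, under assumptions of my own choosing: an orthonormal DCT-II basis built by hand, and a redundant dictionary formed by stacking the spike (identity) basis next to it. A pair of cosines is sparse in the DCT basis alone; a cosine plus a spike is dense in either basis separately but needs only two atoms from the union.

```python
import numpy as np

n = 64
idx = np.arange(n)

# Orthonormal DCT-II basis: column j is a cosine atom of frequency index j.
C = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
C[:, 0] *= np.sqrt(1.0 / n)
C[:, 1:] *= np.sqrt(2.0 / n)

def n_nonzero(v, tol=1e-8):
    """Count entries that are non-zero up to floating-point noise."""
    return int(np.sum(np.abs(v) > tol))

# Route 1: a compressible signal, sparse in a plain orthogonal basis.
y1 = 3.0 * C[:, 4] - 2.0 * C[:, 9]
n_active_y1 = n_nonzero(C.T @ y1)

# Route 2: a cosine plus a spike is dense in each basis on its own...
spike = np.zeros(n)
spike[20] = 1.0
y2 = C[:, 4] + spike
n_active_y2_dct = n_nonzero(C.T @ y2)     # coefficients in the DCT basis
n_active_y2_spike = n_nonzero(y2)         # coefficients in the spike basis

# ...but two atoms suffice in the redundant union [spikes | cosines].
union = np.hstack([np.eye(n), C])
coef = np.zeros(2 * n)
coef[20] = 1.0                            # the spike atom
coef[n + 4] = 1.0                         # the cosine atom

print(n_active_y1, n_active_y2_dct, n_active_y2_spike, n_nonzero(coef))
```

Here the two-atom code is written down by hand; actually finding such a code in a redundant dictionary is the job of a pursuit or ℓ1 method.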

I should break these two notions apart here. For now, I’m especially interested in adaptive bases.

This is merely a bunch of links to important articles at the moment; I should do a little exposition one day.

Decomposition of stuff by matching pursuit, wavelets, curvelets, chirplets, framelets, shearlets, camelhairbrushlets, content-specific basis dictionaries, designed or learned. Mammalian visual cortices seem to use something like this, if you squint right at the evidence.

To discuss:

  • connection to mixture models
  • sampling complexity versus approximation complexity
  • approaches where we learn the transform or the basis dictionary unsupervised, which I am especially interested in

Resources

Baraniuk’s lab has a comprehensive, but not usefully annotated, selection of articles in this field, which I include more to demonstrate the virtue of a literature review by showing the pathology of its absence, rather than as a useful starting point.

Wavelet bases

A very popular practical introduction is (Torrence and Compo 1998).

🏗

Matching Pursuits

I do a lot of this now. I should document it. 🏗
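In the meantime, here is a minimal sketch of plain matching pursuit in the spirit of (Mallat and Zhang 1993): repeatedly pick the atom most correlated with the residual and subtract its projection. It assumes unit-norm dictionary columns; the function name, the random dictionary and the planted signal are mine, for illustration only.

```python
import numpy as np

def matching_pursuit(D, y, n_iter):
    """Greedy matching pursuit of y over the unit-norm columns of D.

    Returns the coefficient vector and the final residual, which
    always satisfy y = D @ coef + residual.
    """
    residual = np.asarray(y, dtype=float).copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual             # inner product with every atom
        j = int(np.argmax(np.abs(corr)))  # best-matching atom
        coef[j] += corr[j]                # an atom may be selected again
        residual -= corr[j] * D[:, j]     # remove its contribution
    return coef, residual

# Illustrative data (assumed): random unit-norm dictionary, 3-atom signal.
rng = np.random.default_rng(1)
D = rng.standard_normal((32, 96))
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 3] - 1.0 * D[:, 50] + 0.5 * D[:, 77]

coef, residual = matching_pursuit(D, y, n_iter=10)
print("residual energy fraction:", np.sum(residual**2) / np.sum(y**2))
```

Each step removes the squared correlation from the residual energy, so the residual norm never increases. Orthogonal matching pursuit additionally refits all selected coefficients by least squares at each step, which this sketch does not do.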

Learnable codings

Adaptive dictionaries!

I want to generalise or extend this idea, ideally in some shift-invariant way (see below).

(Bruno A. Olshausen and Field 1996) kicked this area off by arguing that sparse coding tricks reveal something about what the brain does.

For a walk-through of one version of this, see the Theano example of dictionary learning by Daniel LaCombe, who bases his version on (Ngiam et al. 2011; Hyvärinen, Hurri, and Hoyer 2009; Hahn et al. 2015).

See (Mairal, Bach, and Ponce 2014) for a summary of methods to 2009 in basis learning.
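To fix ideas, a bare-bones sketch of batch dictionary learning by alternating minimisation: sparse-code the data with the dictionary held fixed, then update the dictionary by least squares with the codes held fixed. This is a deliberately naive stand-in of my own, not K-SVD (Aharon, Elad, and Bruckstein 2006) and not the online method of (Mairal et al. 2009); the function names, the penalty `lam` and the synthetic data are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_codes(D, X, lam, n_iter=100):
    """ISTA on 0.5 * ||X - D A||_F^2 + lam * ||A||_1, all columns at once."""
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - X) / L, lam / L)
    return A

def learn_dictionary(X, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Alternate sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    # Initialise the atoms with randomly chosen (normalised) data columns.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0) + 1e-12
    for _ in range(n_outer):
        A = sparse_codes(D, X, lam)
        D = X @ np.linalg.pinv(A)               # argmin_D ||X - D A||_F
        D /= np.linalg.norm(D, axis=0) + 1e-12  # keep atoms at unit norm
        # Atoms that no sample uses collapse to zero; a more careful
        # implementation would re-initialise them.
    return D, sparse_codes(D, X, lam)

# Synthetic data (assumed): each sample mixes 3 atoms of a hidden dictionary.
rng = np.random.default_rng(2)
dim, n_true, n_samples = 16, 24, 200
D_true = rng.standard_normal((dim, n_true))
D_true /= np.linalg.norm(D_true, axis=0)
A_true = np.zeros((n_true, n_samples))
for i in range(n_samples):
    A_true[rng.choice(n_true, 3, replace=False), i] = rng.standard_normal(3)
X = D_true @ A_true + 0.01 * rng.standard_normal((dim, n_samples))

D_hat, A_hat = learn_dictionary(X, n_atoms=n_true)
print("relative error:", np.linalg.norm(X - D_hat @ A_hat) / np.linalg.norm(X))
print("mean active atoms per sample:", np.mean(np.sum(A_hat != 0, axis=0)))
```

The problem is non-convex, so the result depends on the initialisation, and the learned atoms are at best identifiable up to permutation and sign.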

Question: how do you do this in a big data / online setting?

TRANSFORM LEARNING: Sparse Representations at Scale.

We have proposed several methods for batch learning of square or overcomplete sparsifying transforms from data. We have also investigated specific structures for these transforms such as double sparsity, union-of-transforms, and filter bank structures, which enable their efficient learning or usage. Apart from batch transform learning, our group has investigated methods for online learning of sparsifying transforms, which are particularly useful for big data or real-time applications.

Huh.

Codings with desired invariances

I would like to find bases robust against certain transformations, especially phase/shift-robust codings, although doing this naively can be computationally expensive outside of certain convenient bases. (Sorry, that’s not very clear; I need to return to this section to polish it up. 🏗)

One method is “Shift-Invariant Sparse Coding” (Blumensath and Davies 2004), and there are various extensions and approximations out there (Grosse et al. 2007, etc.). One way is to include multiple shifted copies of your atoms; another is to actually shift them in a separate optimisation stage. Both of these get annoying in the time domain for various reasons. (Lattner, Dorfler, and Arzt 2019) presents an adaptive sparse coding method preserving desired invariants.
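A small sketch of the “multiple shifted copies” option, assuming circular shifts for simplicity: the explicit dictionary of all shifts of one atom is an n × n circulant matrix, but its correlations with a signal can be computed by FFT without ever building it. The helper names and the windowed-sinusoid atom are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

def shifted_dictionary(atom, n):
    """Explicit dictionary: column s is the zero-padded atom circularly shifted by s."""
    padded = np.zeros(n)
    padded[: len(atom)] = atom
    return np.stack([np.roll(padded, s) for s in range(n)], axis=1)

def shift_correlations(atom, y):
    """Inner products of y with every circular shift of the atom, via FFT."""
    n = len(y)
    padded = np.zeros(n)
    padded[: len(atom)] = atom
    spectrum = np.fft.rfft(y) * np.conj(np.fft.rfft(padded))
    return np.fft.irfft(spectrum, n)

# Illustrative signal (assumed): one copy of the atom planted at shift 17.
rng = np.random.default_rng(3)
n = 128
atom = np.hanning(16) * np.sin(2 * np.pi * 3 * np.arange(16) / 16)
D_shift = shifted_dictionary(atom, n)
y = 2.0 * D_shift[:, 17] + 0.01 * rng.standard_normal(n)

corr_fft = shift_correlations(atom, y)
best_shift = int(np.argmax(np.abs(corr_fft)))
print("best-matching shift:", best_shift)
```

The explicit matrix costs O(n²) memory per atom, which is one reason the time-domain version gets annoying; the FFT route computes the same correlations in O(n log n) and slots directly into a matching-pursuit selection step.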

Misc

Affine tight framelets (Daubechies et al. 2003) and their presumably less computationally tractable, more flexible cousins, shearlets, also sound interesting here. For reasons I do not yet understand, I am told they can naturally be used on sundry graphs and manifolds, not just lattices, as is traditional in DSP. I saw Xiaosheng Zhuang present these (see, e.g. (Wang and Zhuang 2016; Han, Zhao, and Zhuang 2016), where the former demonstrates a Fast Framelet Transform which is supposedly as computationally cheap as the FFT).

I have some ideas I call learning gamelan which relate to this.

Implementations

This boils down to clever optimisation to make the calculations tractable.

  • the wavelet toolkits.

    • scipy’s wavelet transform has no frills and little coherent explanation, but it goes
    • pywavelets does various fancy wavelets and seems to be a standard for python.
    • Matlab’s Wavelet toolbox seems to be the reference.
    • scikit-learn includes a dictionary learning implementation.
    • also pydbm
    • Fancy easy GPU wavelet implementation, PyTorchWavelets.
  • SPORCO

    SParse Optimization Research COde (SPORCO) is an open-source Python package for solving optimization problems with sparsity-inducing regularization, consisting primarily of sparse coding and dictionary learning, for both standard and convolutional forms of sparse representation. In the current version, all optimization problems are solved within the Alternating Direction Method of Multipliers (ADMM) framework. SPORCO was developed for applications in signal and image processing, but is also expected to be useful for problems in computer vision, statistics, and machine learning.

  • Sparse-filtering: Unsupervised feature learning based on sparse-filtering

    This implements the method described in Jiquan Ngiam, Pang Wei Koh, Zhenghao Chen, Sonia Bhaskar, Andrew Y. Ng: Sparse Filtering. NIPS 2011: 1125-1133 and is based on the Matlab code provided in the supplementary material.

  • spams does a variety of sparse codings, although none of them accept pluggable models. Nonetheless it does some neat things fast. (see optimisation)

References

Aharon, M., M. Elad, and A. Bruckstein. 2006. “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation.” IEEE Transactions on Signal Processing 54 (11): 4311–22. https://doi.org/10.1109/TSP.2006.881199.

Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015. “Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of the 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR. http://proceedings.mlr.press/v40/Arora15.html.

Bach, Francis R., and Michael I. Jordan. 2006. “Learning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research 7 (Oct): 1963–2001. http://www.jmlr.org/papers/v7/bach06b.html.

Baraniuk, Richard G., Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. 2010. “Model-Based Compressive Sensing.” IEEE Transactions on Information Theory 56 (4): 1982–2001. https://doi.org/10.1109/TIT.2010.2040894.

Barron, Andrew R., Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics 36 (1): 64–94. https://doi.org/10.1214/009053607000000631.

Barthélemy, Quentin, Anthony Larue, Aurélien Mayoue, David Mercier, and Jérôme I. Mars. 2012. “Shift & 2D Rotation Invariant Sparse Coding for Multivariate Signals.” IEEE Transactions on Signal Processing 60 (4): 1597–1611. https://hal.archives-ouvertes.fr/hal-00678446/document.

Bertin, K., E. Le Pennec, and V. Rivoirard. 2011. “Adaptive Dantzig Density Estimation.” Annales de L’Institut Henri Poincaré, Probabilités et Statistiques 47 (1): 43–74. https://doi.org/10.1214/09-AIHP351.

Blumensath, Thomas, and Mike Davies. 2004. “On Shift-Invariant Sparse Coding.” In Independent Component Analysis and Blind Signal Separation, edited by Carlos G. Puntonet and Alberto Prieto, 3195:1205–12. Berlin, Heidelberg: Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/978-3-540-30110-3_152.

———. 2006. “Sparse and Shift-Invariant Representations of Music.” IEEE Transactions on Audio, Speech and Language Processing 14 (1): 50–57. https://doi.org/10.1109/TSA.2005.860346.

Bora, Ashish, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. 2017. “Compressed Sensing Using Generative Models.” In International Conference on Machine Learning, 537–46. http://arxiv.org/abs/1703.03208.

Boyes, Graham. 2011. “Dictionary-Based Analysis/Synthesis and Structured Representations of Musical Audio.” McGill University. http://mt.music.mcgill.ca/~boyesg/GBoyes_MAthesis-Final.pdf.

Cai, Jian-Feng, Raymond H. Chan, and Zuowei Shen. 2008. “A Framelet-Based Image Inpainting Algorithm.” Applied and Computational Harmonic Analysis, Special Issue on Mathematical Imaging – Part II, 24 (2): 131–49. https://doi.org/10.1016/j.acha.2007.10.002.

Carabias-Orti, J. J., T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1144–58. https://doi.org/10.1109/JSTSP.2011.2159700.

Casazza, Peter G., and Richard G. Lynch. 2015. “A Brief Introduction to Hilbert Space Frame Theory and Its Applications.” In Finite Frame Theory: A Complete Introduction to Overcompleteness. http://arxiv.org/abs/1509.07347.

Chen, Shaobing, and David L. Donoho. 1994. “Basis Pursuit.” In 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, 1994, 1:41–44 vol.1. https://doi.org/10.1109/ACSSC.1994.471413.

Daubechies, I., M. Defrise, and C. De Mol. 2004. “An Iterative Thresholding Algorithm for Linear Inverse Problems with a Sparsity Constraint.” Communications on Pure and Applied Mathematics 57 (11): 1413–57. https://doi.org/10.1002/cpa.20042.

Daubechies, Ingrid. 1988. “Orthonormal Bases of Compactly Supported Wavelets.” Communications on Pure and Applied Mathematics 41 (7): 909–96. https://doi.org/10.1002/cpa.3160410705.

Daubechies, Ingrid, Bin Han, Amos Ron, and Zuowei Shen. 2003. “Framelets: MRA-Based Constructions of Wavelet Frames.” Applied and Computational Harmonic Analysis 14 (1): 1–46. https://doi.org/10.1016/S1063-5203(02)00511-0.

Davis, Geoffrey M. 1998. “A Wavelet-Based Analysis of Fractal Image Compression.” IEEE Transactions on Image Processing 7 (2): 141–54. https://doi.org/10.1109/83.660992.

Davis, Geoffrey M., Stephane G. Mallat, and Zhifeng Zhang. 1994a. “Adaptive Time-Frequency Decompositions.” Optical Engineering 33 (7): 2183–91. https://doi.org/10.1117/12.173207.

———. 1994b. “Adaptive Time-Frequency Decompositions with Matching Pursuit.” In Wavelet Applications, 2242:402–14. International Society for Optics and Photonics. https://doi.org/10.1117/12.170041.

Davis, G., S. Mallat, and M. Avellaneda. 1997. “Adaptive Greedy Approximations.” Constructive Approximation 13 (1): 57–98. https://doi.org/10.1007/BF02678430.

DeVore, Ronald A. 1998. “Nonlinear Approximation.” Acta Numerica 7 (January): 51–150. https://doi.org/10.1017/S0962492900002816.

Dong, Bin. 2015. “Sparse Representation on Graphs by Tight Wavelet Frames and Applications.” Applied and Computational Harmonic Analysis. https://doi.org/10.1016/j.acha.2015.09.005.

Donoho, David L., and Iain M. Johnstone. 1995. “Adapting to Unknown Smoothness via Wavelet Shrinkage.” Journal of the American Statistical Association 90 (432): 1200–1224. https://doi.org/10.1080/01621459.1995.10476626.

Donoho, David L., Iain M. Johnstone, Gerard Kerkyacharian, and Dominique Picard. 1995. “Wavelet Shrinkage: Asymptopia?” Journal of the Royal Statistical Society. Series B (Methodological) 57 (2): 301–69. http://statweb.stanford.edu/~imj/WEBLIST/1995/asymp.pdf.

Du, Pan, Warren A. Kibbe, and Simon M. Lin. 2006. “Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-Based Pattern Matching.” Bioinformatics 22 (17): 2059–65. https://doi.org/10.1093/bioinformatics/btl355.

Eggert, J., and E. Korner. 2004. “Sparse Coding and NMF.” In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), 4:2529–33 vol.4. https://doi.org/10.1109/IJCNN.2004.1381036.

Ekanadham, C., D. Tranchina, and E. P. Simoncelli. 2011. “Recovery of Sparse Translation-Invariant Signals with Continuous Basis Pursuit.” IEEE Transactions on Signal Processing 59 (10): 4735–44. https://doi.org/10.1109/TSP.2011.2160058.

Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60. https://doi.org/10.1198/016214501753382273.

Garg, Sahil, Irina Rish, Guillermo Cecchi, and Aurelie Lozano. 2017. “Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World.” http://arxiv.org/abs/1701.06106.

Gersho, Allen, and Robert M. Gray. 2012. Vector Quantization and Signal Compression. Springer Science & Business Media.

Giné, Evarist, and Richard Nickl. 2009. “Uniform Limit Theorems for Wavelet Density Estimators.” The Annals of Probability 37 (4): 1605–46. https://doi.org/10.1214/08-AOP447.

Giryes, R., G. Sapiro, and A. M. Bronstein. 2016. “Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?” IEEE Transactions on Signal Processing 64 (13): 3444–57. https://doi.org/10.1109/TSP.2016.2546221.

Goodwin, M M. 2001. “Multiscale Overlap-Add Sinusoidal Modeling Using Matching Pursuit and Refinements.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. https://www.researchgate.net/profile/Michael_Goodwin4/publication/3927312_Multiscale_overlap-add_sinusoidal_modeling_using_matching_pursuitand_refinements/links/543416d90cf2bf1f1f27b6c4.pdf.

Goodwin, M M, and M Vetterli. 1999. “Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks.” IEEE Transactions on Signal Processing 47 (7): 1890–1902. https://doi.org/10.1109/78.771038.

Goodwin, M., and M. Vetterli. 1997. “Atomic Decompositions of Audio Signals.” In 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1997. https://doi.org/10.1109/ASPAA.1997.625601.

Gray, R. 1984. “Vector Quantization.” IEEE ASSP Magazine 1 (2): 4–29. https://doi.org/10.1109/MASSP.1984.1162229.

Gregor, Karol, and Yann LeCun. 2010. “Learning Fast Approximations of Sparse Coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 399–406. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_GregorL10.pdf.

———. 2011. “Efficient Learning of Sparse Invariant Representations,” May. http://arxiv.org/abs/1105.5307.

Grosse, Roger, Rajat Raina, Helen Kwong, and Andrew Y. Ng. 2007. “Shift-Invariant Sparse Coding for Audio Classification.” In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007), 9:8. http://arxiv.org/abs/1206.5241.

Gupta, Pawan, and Marianna Pensky. 2016. “Solution of Linear Ill-Posed Problems Using Random Dictionaries,” May. http://arxiv.org/abs/1605.07913.

Hahn, William Edward, Stephanie Lewkowitz, Daniel C. Lacombe, and Elan Barenholtz. 2015. “Deep Learning Human Actions from Video via Sparse Filtering and Locally Competitive Algorithms.” Multimedia Tools and Applications 74 (22): 10097–10110. https://doi.org/10.1007/s11042-015-2808-x.

Han, Bin, Zhenpeng Zhao, and Xiaosheng Zhuang. 2016. “Directional Tensor Product Complex Tight Framelets with Low Redundancy.” Applied and Computational Harmonic Analysis, Sparse Representations with Applications in Imaging Science, Data Analysis, and Beyond, Part II, 41 (2): 603–37. https://doi.org/10.1016/j.acha.2015.07.003.

Han, Bin, and Xiaosheng Zhuang. 2015. “Smooth Affine Shear Tight Frames with MRA Structure.” Applied and Computational Harmonic Analysis 39 (2): 300–338. https://doi.org/10.1016/j.acha.2014.09.005.

Han, Kyunghee, and Hyejin Shin. n.d. “Functional Linear Regression for Functional Response via Sparse Basis Selection.” Accessed November 17, 2014. http://www.statistics.gov.hk/wsc/IPS058-P6-S.pdf.

Harte, Christopher, Mark Sandler, and Martin Gasser. 2006. “Detecting Harmonic Change in Musical Audio.” In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, 21–26. AMCMM ’06. New York, NY, USA: ACM. https://doi.org/10.1145/1178723.1178727.

Henaff, Mikael, Kevin Jarrett, Koray Kavukcuoglu, and Yann LeCun. 2011. “Unsupervised Learning of Sparse Features for Scalable Audio Classification.” In ISMIR. http://ismir2011.ismir.net/papers/PS6-5.pdf.

Hoyer, Patrik O. n.d. “Non-Negative Matrix Factorization with Sparseness Constraints.” Journal of Machine Learning Research 5 (9): 1457–69. Accessed October 10, 2014. http://arxiv.org/abs/cs/0408058.

Hoyer, P. O. 2002. “Non-Negative Sparse Coding.” In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002, 557–65. https://doi.org/10.1109/NNSP.2002.1030067.

Huang, Cong, G. L. H. Cheang, and Andrew R. Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.” http://www.stat.yale.edu/~arb4/publications_files/RiskGreedySelectionAndL1penalization.pdf.

Hyvärinen, Aapo, and Patrik Hoyer. 2000. “Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces.” Neural Computation 12 (7): 1705–20. https://doi.org/10.1162/089976600300015312.

Hyvärinen, Aapo, Jarmo Hurri, and Patrick O. Hoyer. 2009. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Vol. 39. Springer Science & Business Media. http://www.academia.edu/download/640016/gt1uh8u6fhk4474.pdf.

Jafari, M. G., and M. D. Plumbley. 2011. “Fast Dictionary Learning for Sparse Representations of Speech Signals.” IEEE Journal of Selected Topics in Signal Processing 5 (5): 1025–31. https://doi.org/10.1109/JSTSP.2011.2157892.

Jaillet, F., R. Gribonval, M. D. Plumbley, and H. Zayyani. 2010. “An L1 Criterion for Dictionary Learning by Subspace Identification.” In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 5482–5. https://doi.org/10.1109/ICASSP.2010.5495206.

Jung, Alexander. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1311.5768.

Kim, H., and H. Park. 2008. “Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method.” SIAM Journal on Matrix Analysis and Applications 30 (2): 713–30. https://doi.org/10.1137/07069239X.

Koch, Parker, and Jason J. Corso. 2016. “Sparse Factorization Layers for Neural Networks with Limited Supervision,” December. http://arxiv.org/abs/1612.04468.

Koppel, Alec, Garrett Warnell, Ethan Stump, and Alejandro Ribeiro. 2016. “Parsimonious Online Learning with Kernels via Sparse Projections in Function Space,” December. http://arxiv.org/abs/1612.04111.

Lattner, Stefan, Monika Dorfler, and Andreas Arzt. 2019. “Learning Complex Basis Functions for Invariant Representations of Audio.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval, 8. http://archives.ismir.net/ismir2019/paper/000085.pdf.

Lee, Honglak, Alexis Battle, Rajat Raina, and Andrew Y. Ng. 2007. “Efficient Sparse Coding Algorithms.” Advances in Neural Information Processing Systems 19: 801. https://papers.nips.cc/paper/2979-efficient-sparse-coding-algorithms.pdf.

Lee, Wee Sun, Peter L. Bartlett, and Robert C. Williamson. 1996. “Efficient Agnostic Learning of Neural Networks with Bounded Fan-in.” IEEE Transactions on Information Theory 42 (6): 2118–32. https://doi.org/10.1109/18.556601.

Lewicki, Michael S., and Terrence J. Sejnowski. 2000. “Learning Overcomplete Representations.” Neural Computation 12 (2): 337–65. https://doi.org/10.1162/089976600300015826.

Lewicki, M S, and T J Sejnowski. 1999. “Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations.” In NIPS, 11:730–36. Denver, CO: MIT Press. https://papers.cnl.salk.edu/PDFs/Coding%20Time-Varying%20Signals%20Using%20Sparse,%20Shift-Invariant%20Representations%201999-3580.pdf.

Liu, Tongliang, Dacheng Tao, and Dong Xu. 2016. “Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes,” January. http://arxiv.org/abs/1601.00238.

Liu, T., and D. Tao. 2015. “On the Performance of Manhattan Nonnegative Matrix Factorization.” IEEE Transactions on Neural Networks and Learning Systems PP (99): 1–1. https://doi.org/10.1109/TNNLS.2015.2458986.

Mailhé, Boris, Rémi Gribonval, Pierre Vandergheynst, and Frédéric Bimbot. 2011. “Fast Orthogonal Sparse Approximation Algorithms over Local Dictionaries.” Signal Processing, Advances in Multirate Filter Bank Structures and Multiscale Representations, 91 (12): 2822–35. https://doi.org/10.1016/j.sigpro.2011.01.004.

Mairal, Julien, Francis Bach, and Jean Ponce. 2014. “Sparse Modeling for Image and Vision Processing.” Foundations and Trends® in Computer Graphics and Vision 8 (2-3): 85–283. https://doi.org/10.1561/0600000058.

Mairal, Julien, Francis Bach, Jean Ponce, and Guillermo Sapiro. 2009. “Online Dictionary Learning for Sparse Coding.” In Proceedings of the 26th Annual International Conference on Machine Learning, 689–96. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553463.

———. 2010. “Online Learning for Matrix Factorization and Sparse Coding.” The Journal of Machine Learning Research 11: 19–60. http://arxiv.org/abs/0908.0050.

Mallat, Stephane G. 1989. “Multiresolution Approximations and Wavelet Orthonormal Bases of L²(R).” Transactions of the American Mathematical Society 315 (1): 69–87. https://doi.org/10.1090/S0002-9947-1989-1008470-5.

Mallat, Stéphane G., and Zhifeng Zhang. 1993. “Matching Pursuits with Time-Frequency Dictionaries.” IEEE Transactions on Signal Processing 41 (12): 3397–3415. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=258082.

Mallat, S., and Z. Zhang. 1992. “Adaptive Time-Frequency Decomposition with Matching Pursuits.” In Time-Frequency and Time-Scale Analysis, 1992., Proceedings of the IEEE-SP International Symposium, 7–10. https://doi.org/10.1109/TFTSA.1992.274245.

Marcus, Gary, Adam Marblestone, and Thomas Dean. 2014. “The Atoms of Neural Computation.” Science 346 (6209): 551–52. https://doi.org/10.1126/science.1261661.

Mlynarski, Wiktor. 2013. “Sparse, Complex-Valued Representations of Natural Sounds Learned with Phase and Amplitude Continuity Priors.” arXiv Preprint arXiv:1312.4695. http://arxiv.org/abs/1312.4695.

Mondal, Debashis, and Donald B. Percival. 2010. “M-Estimation of Wavelet Variance.” Annals of the Institute of Statistical Mathematics 64 (1): 27–53. https://doi.org/10.1007/s10463-010-0282-9.

Mørup, Morten, Mikkel N. Schmidt, and Lars K. Hansen. 2007. “Shift Invariant Sparse Coding of Image and Music Data.” Journal of Machine Learning Research. http://www.imm.dtu.dk/pubdb/views/edoc_download.php/5378/pdf/imm5378.pdf.

Ngiam, Jiquan, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, and Andrew Y. Ng. 2011. “Sparse Filtering.” In Advances in Neural Information Processing Systems 24, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 1125–33. Curran Associates, Inc. http://papers.nips.cc/paper/4334-sparse-filtering.pdf.

Olshausen, B. A., and D. J. Field. 1996. “Natural Image Statistics and Efficient Coding.” Network (Bristol, England) 7 (2): 333–39. https://doi.org/10.1088/0954-898X/7/2/014.

Olshausen, Bruno A., and David J. Field. 1996. “Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images.” Nature 381 (6583): 607–9. https://doi.org/10.1038/381607a0.

Olshausen, Bruno A, and David J Field. 2004. “Sparse Coding of Sensory Inputs.” Current Opinion in Neurobiology 14 (4): 481–87. https://doi.org/10.1016/j.conb.2004.07.007.

Opsomer, Jean, Yuedong Wang, and Yuhong Yang. 2001. “Nonparametric Regression with Correlated Errors.” Statistical Science 16 (2): 134–53. http://www.jstor.org/stable/2676791.

Oyallon, Edouard, Eugene Belilovsky, and Sergey Zagoruyko. 2017. “Scaling the Scattering Transform: Deep Hybrid Networks.” arXiv Preprint arXiv:1703.08961. https://arxiv.org/abs/1703.08961.

Pfister, Luke, and Yoram Bresler. 2017. “Automatic Parameter Tuning for Image Denoising with Learned Sparsifying Transforms.” http://lukepfister.me/assets/pdf/Pfister2017.pdf.

Plumbley, Mark D., Samer A. Abdallah, Thomas Blumensath, and Michael E. Davies. 2006. “Sparse Representations of Polyphonic Music.” Signal Processing, Sparse Approximations in Signal and Image Processing, 86 (3): 417–31. https://doi.org/10.1016/j.sigpro.2005.06.007.

Qian, Shie, and Dapang Chen. 1994. “Signal Representation Using Adaptive Normalized Gaussian Functions.” Signal Processing 36 (1): 1–11. https://doi.org/10.1016/0165-1684(94)90174-0.

Ravishankar, Saiprasad, and Yoram Bresler. 2015. “Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI,” January. http://arxiv.org/abs/1501.02923.

Rubinstein, Ron, A. M. Bruckstein, and Michael Elad. 2010. “Dictionaries for Sparse Representation Modeling.” Proceedings of the IEEE 98 (6): 1045–57. https://doi.org/10.1109/JPROC.2010.2040551.

Rubinstein, Ron, Michael Zibulevsky, and Michael Elad. 2008. “Efficient Implementation of the K-SVD Algorithm Using Batch Orthogonal Matching Pursuit.” CS Technion. http://pdf.aminer.org/000/322/616/efficient_computation_for_sequential_forward_observation_selection_in_image_reconstruction.pdf.

Shen, Z. 2010. “Wavelet Frames and Image Restorations.” In Scopus, 2834–63. World Scientific. http://www.mathunion.org/ICM/ICM2010.4/Main/icm2010.4.2834.2863.pdf.

Simoncelli, Eero P, and Bruno A Olshausen. 2001. “Natural Image Statistics and Neural Representation.” Annual Review of Neuroscience 24 (1): 1193–1216. https://doi.org/10.1146/annurev.neuro.24.1.1193.

Smith, Evan C., and Michael S. Lewicki. 2006. “Efficient Auditory Coding.” Nature 439 (7079): 978–82. https://doi.org/10.1038/nature04485.

Soh, Yong Sheng, and Venkat Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers,” January. http://arxiv.org/abs/1701.01207.

Torrence, Christopher, and Gilbert P Compo. 1998. “A Practical Guide to Wavelet Analysis.” Bulletin of the American Meteorological Society 79 (1): 61–78. http://shadow.eas.gatech.edu/~kcobb/seminar/torrence%26compo98.pdf.

Tošić, Ivana, and Pascal Frossard. 2011. “Dictionary Learning: What Is the Right Representation for My Signal?” IEEE Signal Processing Magazine 28 (2): 27–38. https://doi.org/10.1109/MSP.2010.939537.

Tropp, J. A., and S. J. Wright. 2010. “Computational Methods for Sparse Solution of Linear Inverse Problems.” Proceedings of the IEEE 98 (6): 948–58. https://doi.org/10.1109/JPROC.2010.2044010.

Tsaig, Yaakov, and David L. Donoho. 2006a. “Breakdown of Equivalence Between the Minimal ℓ1-Norm Solution and the Sparsest Solution.” Signal Processing, Sparse Approximations in Signal and Image Processing, 86 (3): 533–48. https://doi.org/10.1016/j.sigpro.2005.05.028.

———. 2006b. “Extensions of Compressed Sensing.” Signal Processing, Sparse Approximations in Signal and Image Processing, 86 (3): 549–71. https://doi.org/10.1016/j.sigpro.2005.05.029.

Türkmen, Ali Caner. 2015. “A Review of Nonnegative Matrix Factorization Methods for Clustering,” July. http://arxiv.org/abs/1507.03194.

Vainsencher, Daniel, Shie Mannor, and Alfred M. Bruckstein. 2011. “The Sample Complexity of Dictionary Learning.” Journal of Machine Learning Research 12 (Nov): 3259–81. http://www.jmlr.org/papers/v12/vainsencher11a.html.

Vetterli, Martin. 1999. “Wavelets: Approximation and Compression–a Review.” In AeroSense’99, 3723:28–31. International Society for Optics and Photonics. https://doi.org/10.1117/12.342945.

Wang, Yu Guang, and Houying Zhu. 2017. “Localized Tight Frames and Fast Framelet Transforms on the Simplex,” January. http://arxiv.org/abs/1701.01595.

Wang, Yu Guang, and Xiaosheng Zhuang. 2016. “Tight Framelets and Fast Framelet Transforms on Manifolds,” August. http://arxiv.org/abs/1608.04026.

Wang, Yu-Xiang, Alex Smola, and Ryan J. Tibshirani. 2014. “The Falling Factorial Basis and Its Statistical Applications.” In Proceedings of the 31st International Conference on Machine Learning - Volume 32, 730–38. ICML’14. Beijing, China: JMLR.org. http://arxiv.org/abs/1405.0558.

Weidmann, Claudio, and Martin Vetterli. 2012. “Rate Distortion Behavior of Sparse Sources.” IEEE Transactions on Information Theory 58 (8): 4969–92. https://doi.org/10.1109/TIT.2012.2201335.

Wohlberg, Brendt. 2017. “SPORCO: A Python Package for Standard and Convolutional Sparse Representations.”

Yaghoobi, M., L. Daudet, and M. E. Davies. 2009. “Parametric Dictionary Design for Sparse Coding.” IEEE Transactions on Signal Processing 57 (12): 4800–4810. https://doi.org/10.1109/TSP.2009.2026610.

Yaghoobi, M., Sangnam Nam, R. Gribonval, and M. E. Davies. 2013. “Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling.” IEEE Transactions on Signal Processing 61 (9): 2341–55. https://doi.org/10.1109/TSP.2013.2250968.

Yuan, Xiaotong, Ping Li, and Tong Zhang. 2014. “Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization.” In Proceedings of the 31st International Conference on Machine Learning - Volume 32, 127–35. Beijing, China: JMLR.org. http://proceedings.mlr.press/v32/yuan14.html.

Zhuang, Xiaosheng. 2016. “Digital Affine Shear Transforms: Fast Realization and Applications in Image/Video Processing.” SIAM Journal on Imaging Sciences 9 (3): 1437–66. https://doi.org/10.1137/15M1048318.