Neural nets with basis decomposition layers

Neural networks incorporating basis decompositions.

Why might you want to do this? For one, it is a different lens through which to analyze neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons: perhaps the right basis gives you better priors for understanding a partial differential equation? Or something else?

Neural networks with continuous basis functions

Closer to my own interests: Can I learn neural networks which are grid-free, i.e. which can be resampled? Can I use continuous bases in the computation of a neural net? This is very useful in things like learning PDEs. The virtue of these things is that they do not depend (much?) upon the scale of some grid. Possibly this naturally leads to us being able to sample the problem very sparsely. It also might allow us to interpolate sparse solutions. In addition, analytic basis functions are easy to differentiate; we can use autodiff to find their local gradients, even through deep compositions.
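A minimal sketch of the grid-free idea, assuming a real Fourier basis on [0, 1) and plain NumPy with hand-written derivatives rather than an autodiff framework. The helper names (`fourier_design`, `fourier_design_dx`) and the target function are made up for illustration. Coefficients are fitted from irregularly placed samples, after which the fitted function can be evaluated, and differentiated in closed form, at any point.

```python
import numpy as np

def fourier_design(x, n_modes):
    """Real Fourier features on [0, 1): columns are 1, cos(2*pi*k*x), sin(2*pi*k*x)."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, n_modes + 1)
    phase = 2 * np.pi * x[:, None] * k[None, :]
    return np.concatenate(
        [np.ones((x.size, 1)), np.cos(phase), np.sin(phase)], axis=1
    )

def fourier_design_dx(x, n_modes):
    """Analytic derivative in x of each column of fourier_design."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, n_modes + 1)
    phase = 2 * np.pi * x[:, None] * k[None, :]
    w = 2 * np.pi * k[None, :]
    return np.concatenate(
        [np.zeros((x.size, 1)), -w * np.sin(phase), w * np.cos(phase)], axis=1
    )

# Illustrative target and its exact derivative (both lie in the span of the basis).
target = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
target_dx = lambda x: 2 * np.pi * np.cos(2 * np.pi * x) - 2 * np.pi * np.sin(4 * np.pi * x)

# Observe the function at irregular, non-gridded locations.
rng = np.random.default_rng(0)
x_obs = np.sort(rng.uniform(0, 1, size=40))
coef, *_ = np.linalg.lstsq(fourier_design(x_obs, 4), target(x_obs), rcond=None)

# Resample on any other set of points, and differentiate without finite differences.
x_new = np.linspace(0, 1, 200, endpoint=False)
f_hat = fourier_design(x_new, 4) @ coef
df_hat = fourier_design_dx(x_new, 4) @ coef
```

The coefficients, not the sample locations, carry the representation, which is why the observation points and the evaluation points need not coincide.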

There are various other ways to do native interpolation. One hack uses the implicit representation method, which is clever but not plausible for my purposes, where something better behaved, like a basis function interpretation, is more helpful.

Specifically, I would like to do Bayesian inference, which looks extremely hard through an implicit net, but only very hard through a basis decomposition.

In practice, how would I do this?

With a well-known basis, such as orthogonal polynomials or a Fourier basis, creating a layer which encodes your net is easy. After all, projecting onto the basis is just an inner product. That is what methods like that of Li et al. (2020) exploit.
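To make the inner-product remark concrete, here is a rough single-channel sketch of a spectral layer in NumPy. It is a simplification for illustration, not the implementation of Li et al. (2020): the function name `spectral_layer`, the random `weights` standing in for learned parameters, and the test signal are all assumptions. The layer projects the input onto the lowest Fourier modes, rescales each mode, and evaluates the result at arbitrary query points.

```python
import numpy as np

def spectral_layer(u, weights, x_query):
    """Rescale the lowest Fourier modes of u and evaluate at x_query.

    u: samples of a periodic function on a regular grid over [0, 1).
    weights: complex array, one multiplier per retained mode
        (a stand-in for learned parameters).
    x_query: arbitrary points in [0, 1) at which to evaluate the output.
    """
    n_modes = weights.size
    # Inner products with complex exponentials, normalised by the grid size
    # so that the coefficients do not depend on the resolution.
    coef = np.fft.rfft(u)[:n_modes] / u.size
    coef = coef * weights
    k = np.arange(n_modes)
    basis = np.exp(2j * np.pi * k[None, :] * np.asarray(x_query)[:, None])
    # For a real signal, mode 0 counts once and positive modes twice
    # (conjugate symmetry); the real part discards any imaginary residue.
    scale = np.where(k == 0, 1.0, 2.0)
    return np.real(basis @ (scale * coef))

rng = np.random.default_rng(1)
n_modes = 8
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)
signal = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
x_query = rng.uniform(0, 1, size=17)

# The same weights applied to the same function sampled on two different grids.
out_coarse = spectral_layer(signal(np.arange(64) / 64), weights, x_query)
out_fine = spectral_layer(signal(np.arange(128) / 128), weights, x_query)
identity = spectral_layer(signal(np.arange(64) / 64), np.ones(n_modes, dtype=complex), x_query)
```

For a band-limited input the two outputs agree, and with unit weights the layer reproduces the input at the query points, so the parameters are tied to modes rather than to grid cells.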

More general bases, such as sparse or overcomplete frames, might require solving a complicated sparse optimisation problem inside the network.
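As an illustration of the kind of inner problem meant here, the following sketch solves an ℓ1-penalised sparse coding problem for a fixed overcomplete dictionary by iterative soft thresholding (ISTA). The choice of ISTA, the random dictionary, and the penalty value are assumptions for the example; other solvers would do equally well.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(D, y, a, lam):
    """Lasso objective 0.5 * ||y - D a||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((y - D @ a) ** 2) + lam * np.sum(np.abs(a))

def ista(D, y, lam, n_iter=500):
    """Minimise the lasso objective by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term, then the l1 proximal map.
        a = soft_threshold(a + step * D.T @ (y - D @ a), step * lam)
    return a

# An overcomplete random dictionary: 50 unit-norm atoms in 20 dimensions.
rng = np.random.default_rng(2)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)

# A signal built from three atoms.
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D @ a_true

lam = 0.05
a_hat = ista(D, y, lam)
```

Each iteration is a linear map followed by an elementwise nonlinearity, so a fixed number of iterations can be unrolled and differentiated through like ordinary layers; whether that is cheap enough inside a larger network is a separate question.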

One useful tool here might be to wrap implicit layers.[1] Differentiable Convex Optimization Layers introduces cvxpylayers; perhaps that does some of the work we want?

I would probably not attempt to learn an arbitrary sparse basis dictionary in this context, because that does not interpolate naturally. I can, however, imagine learning a parametric sparse dictionary, for example one whose atoms come from a simple family such as decaying sinusoids.
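A sketch of what a parametric dictionary could look like, assuming atoms of the form exp(-γt) sin(2πft). The function name and the particular frequencies and decay rates are hypothetical; in a learned version they would be the trainable parameters.

```python
import numpy as np

def decaying_sinusoid_dictionary(t, freqs, decays):
    """Dictionary whose column j is exp(-decays[j] * t) * sin(2*pi*freqs[j] * t).

    Each atom is a function of continuous time described by two parameters,
    so the dictionary can be rebuilt on any set of sample times.
    """
    t = np.asarray(t, dtype=float)[:, None]
    freqs = np.asarray(freqs, dtype=float)[None, :]
    decays = np.asarray(decays, dtype=float)[None, :]
    return np.exp(-decays * t) * np.sin(2 * np.pi * freqs * t)

# Hypothetical atom parameters.
freqs = np.array([1.0, 2.5, 4.0])
decays = np.array([0.5, 1.0, 3.0])

# The same atoms evaluated on a coarse and on a fine set of times.
t_coarse = np.linspace(0.0, 1.0, 11)
t_fine = np.linspace(0.0, 1.0, 101)
D_coarse = decaying_sinusoid_dictionary(t_coarse, freqs, decays)
D_fine = decaying_sinusoid_dictionary(t_fine, freqs, decays)
```

Because the atoms are defined for all t, a code computed against `D_coarse` can be decoded against `D_fine`, which is the interpolation property an arbitrary learned matrix lacks.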

Somewhere in between there are wavelet decompositions. Are they useful to me? Not sure.

Convolutional neural networks as sparse coding

Elad and Papyan and others have a miniature school of Deep Learning analysis based on Multi Layer Convolutional Sparse Coding (Papyan, Romano, and Elad 2017; Papyan et al. 2018; Papyan, Sulam, and Elad 2017; Sulam et al. 2018). This combines sparse basis learning with neural nets, which is cool.

They argue:

The recently proposed multilayer convolutional sparse coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of convolutional neural networks (CNNs). Under this framework, the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors from a given input signal. …Our work represents a bridge between matrix factorization, sparse dictionary learning, and sparse autoencoders, and we analyze these connections in detail.
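The claimed equivalence can be checked numerically in a toy setting. The sketch below assumes dense random dictionaries in place of convolutional ones (a convolution is a structured special case of a matrix) and uses nonnegative soft thresholding as the pursuit step at each layer; all sizes and threshold values are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def nonneg_soft_threshold(z, t):
    """Keep the part of z that exceeds the threshold t; zero out the rest."""
    return np.where(z > t, z - t, 0.0)

def layered_thresholding(x, dictionaries, thresholds):
    """Estimate nested sparse codes one layer at a time."""
    codes = []
    for D, t in zip(dictionaries, thresholds):
        x = nonneg_soft_threshold(D.T @ x, t)
        codes.append(x)
    return codes

def relu_forward(x, dictionaries, thresholds):
    """Feed-forward pass with weights D.T and biases -t."""
    for D, t in zip(dictionaries, thresholds):
        x = relu(D.T @ x - t)
    return x

rng = np.random.default_rng(3)
dictionaries = [rng.normal(size=(16, 32)), rng.normal(size=(32, 64))]
thresholds = [0.5, 1.0]
x = rng.normal(size=16)

codes = layered_thresholding(x, dictionaries, thresholds)
activations = relu_forward(x, dictionaries, thresholds)
```

The deepest code from the pursuit and the output of the ReLU network coincide, with the thresholds playing the role of negative biases.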

However, as interesting as this sounds, I am not deeply engaged with it, since this does not solve any immediate problems for me.


Aberdam, Aviad, Jeremias Sulam, and Michael Elad. 2019. “Multi-Layer Sparse Coding: The Holistic Way.” SIAM Journal on Mathematics of Data Science 1 (1): 46–77.
Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015. “Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of The 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR.
Barron, Andrew R. 1994. “Approximation and Estimation Bounds for Artificial Neural Networks.” Machine Learning 14 (1): 115–33.
Bradley, David M., and J. Andrew Bagnell. 2008. “Differentiable Sparse Coding.” In Proceedings of the 21st International Conference on Neural Information Processing Systems, 113–20. NIPS’08. Red Hook, NY, USA: Curran Associates Inc.
Chi, Lu, Borui Jiang, and Yadong Mu. 2020. “Fast Fourier Convolution.” In Advances in Neural Information Processing Systems. Vol. 33.
Knudson, Karin C, Jacob Yates, Alexander Huk, and Jonathan W Pillow. 2014. “Inferring Sparse Representations of Continuous Signals with Continuous Orthogonal Matching Pursuit.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 27:1215–23. Curran Associates, Inc.
Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. 2020. “Fourier Neural Operator for Parametric Partial Differential Equations.” October 17, 2020.
Liu, Xiao, Kyongmin Yeo, and Siyuan Lu. 2020. “Statistical Modeling for Spatio-Temporal Data From Stochastic Convection-Diffusion Processes.” Journal of the American Statistical Association 0 (0): 1–18.
Papyan, Vardan, Yaniv Romano, and Michael Elad. 2017. “Convolutional Neural Networks Analyzed via Convolutional Sparse Coding.” The Journal of Machine Learning Research 18 (1): 2887–2938.
Papyan, Vardan, Yaniv Romano, Jeremias Sulam, and Michael Elad. 2018. “Theoretical Foundations of Deep Learning via Sparse Representations: A Multilayer Sparse Model and Its Connection to Convolutional Neural Networks.” IEEE Signal Processing Magazine 35 (4): 72–89.
Papyan, Vardan, Jeremias Sulam, and Michael Elad. 2017. “Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding.” IEEE Transactions on Signal Processing 65 (21): 5687–5701.
Rackauckas, Christopher. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).” The Winnower, August.
Sulam, Jeremias, Aviad Aberdam, Amir Beck, and Michael Elad. 2020. “On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (8): 1968–80.
Sulam, Jeremias, Vardan Papyan, Yaniv Romano, and Michael Elad. 2018. “Multilayer Convolutional Sparse Modeling: Pursuit and Dictionary Learning.” IEEE Transactions on Signal Processing 66 (15): 4090–4104.

  1. Not to be confused with implicit representation layers, which are completely different.
