Neural nets with basis decomposition layers



Neural networks incorporating basis decompositions.

Why might you want to do this? For one, it is a different lens through which to analyze neural nets' mysterious success. For another, it gives you interpolation for free. There are possibly other reasons: perhaps the right basis gives you better priors for understanding a partial differential equation? Or something else?

Unrolling: Implementing sparse coding using neural nets

Often credited to Gregor and LeCun (2010), this trick treats each step of an iterative sparse coding optimisation as a layer in a neural net, and then learns the parameters of those steps (the matrices and thresholds of each iteration) by gradient descent. In effect, this gives you a way of learning fast approximations to sparse coding. The idea has been taken a long way by, e.g., Monga, Li, and Eldar (2021).
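To make the unrolling concrete, here is a minimal NumPy sketch, assuming the lasso objective min_z ½‖x − Dz‖² + λ‖z‖₁ and a fixed number of layers. The parameter names `W_e`, `S` and `theta` follow the notation usually used for LISTA; the function names are my own. With the initialisation below, each "layer" reproduces exactly one ISTA step; in LISTA these parameters would then be trained by backpropagation, which is omitted here.

```python
import numpy as np

def soft_threshold(v, theta):
    # Proximal operator of theta * ||.||_1, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(x, D, lam, n_iter):
    # Plain ISTA for min_z 0.5*||x - D z||^2 + lam*||z||_1, starting from z = 0.
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z

def lista_init(D, lam):
    # Initialise the learnable parameters so that one layer == one ISTA step.
    L = np.linalg.norm(D, 2) ** 2
    W_e = D.T / L
    S = np.eye(D.shape[1]) - D.T @ D / L
    theta = lam / L
    return W_e, S, theta

def lista_forward(x, params, n_layers):
    # Unrolled network: every layer computes z <- soft(W_e x + S z, theta).
    # In LISTA, W_e, S and theta would be trained rather than fixed.
    W_e, S, theta = params
    b = W_e @ x
    z = soft_threshold(b, theta)  # first layer (z starts at 0)
    for _ in range(n_layers - 1):
        z = soft_threshold(b + S @ z, theta)
    return z
```

Before any training, a K-layer forward pass agrees with K iterations of ISTA; training then trades exactness of the iteration for fewer layers.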

Convolutional neural networks as sparse coding

Elad and Papyan and others have a miniature school of deep learning analysis based on Multi-Layer Convolutional Sparse Coding (Papyan, Romano, and Elad 2017; Papyan et al. 2018; Papyan, Sulam, and Elad 2017; Sulam et al. 2018). The argument here is that Convnets are essentially already solving sparse coding problems; they just don't know it. They argue:

The recently proposed multilayer convolutional sparse coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of convolutional neural networks (CNNs). Under this framework, the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors from a given input signal. …Our work represents a bridge between matrix factorization, sparse dictionary learning, and sparse autoencoders, and we analyze these connections in detail.
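The core of the equivalence can be checked numerically. Below is a minimal sketch, assuming dense matrices stand in for the convolutional dictionaries (a simplification for brevity): a ReLU layer with weights Dᵀ and bias −θ computes exactly a non-negative soft-thresholding of Dᵀx, so a stack of such layers is a "layered thresholding" pursuit of the nested sparse representations. The function names are hypothetical.

```python
import numpy as np

def relu_layer(x, W, b):
    # Standard CNN-style layer (dense here): ReLU(W x + b).
    return np.maximum(W @ x + b, 0.0)

def nonneg_soft_threshold(v, theta):
    # Proximal operator of theta*||.||_1 with a non-negativity constraint.
    return np.maximum(v - theta, 0.0)

def layered_thresholding(x, dictionaries, thresholds):
    # Pursuit for a multi-layer sparse model:
    # Gamma_0 = x, Gamma_i = S_theta_i(D_i^T Gamma_{i-1}).
    gamma = x
    for D, theta in zip(dictionaries, thresholds):
        gamma = nonneg_soft_threshold(D.T @ gamma, theta)
    return gamma

def cnn_forward(x, weights, biases):
    # Forward pass of a stack of ReLU layers.
    h = x
    for W, b in zip(weights, biases):
        h = relu_layer(h, W, b)
    return h
```

Setting `weights = [D.T for D in dictionaries]` and each bias to `-theta` makes `cnn_forward` and `layered_thresholding` return identical outputs.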

However, as interesting as this sounds, I am not deeply engaged with it, since this does not solve any immediate problems for me.

Continuous basis functions

Convnets require a complete rasterized grid, but often signals are not observed on a regular grid. This is precisely the problem of signal sampling. With basis functions of continuous support and a few assumptions, it is tempting to imagine we can get neural networks which operate in a continuous space. Can I use continuous bases in the computation of a neural net? If so, this could be useful in things like learning PDEs. The virtue of such bases is that they do not depend (much?) upon the scale of some grid. Possibly this naturally lets us sample the problem very sparsely. It might also allow us to interpolate sparse solutions. In addition, analytic basis functions are easy to differentiate; we can use autodiff to find their local spatial gradients, even deep ones.
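As a small illustration of the interpolation and differentiation points, here is a sketch assuming a periodic signal on [0, 1) and a real Fourier basis truncated at `n_freq` frequencies (both my choices). Coefficients are fitted by least squares from irregularly spaced samples; the fitted function can then be evaluated, and differentiated exactly, at any point. The derivative features are written out by hand; autodiff would produce the same thing.

```python
import numpy as np

def fourier_features(t, n_freq):
    # Real Fourier basis on [0, 1): [1, cos(2*pi*k*t), sin(2*pi*k*t)] for k = 1..n_freq.
    w = 2 * np.pi * np.arange(1, n_freq + 1)
    arg = np.outer(t, w)
    return np.hstack([np.ones((len(t), 1)), np.cos(arg), np.sin(arg)])

def fourier_features_dt(t, n_freq):
    # Exact derivative of each basis function with respect to t.
    w = 2 * np.pi * np.arange(1, n_freq + 1)
    arg = np.outer(t, w)
    return np.hstack([np.zeros((len(t), 1)), -w * np.sin(arg), w * np.cos(arg)])

def fit_coefficients(t_obs, y_obs, n_freq):
    # Least-squares basis coefficients from arbitrary (irregular) sample locations.
    return np.linalg.lstsq(fourier_features(t_obs, n_freq), y_obs, rcond=None)[0]

# Usage: fit on 40 random points, then evaluate and differentiate on a dense grid.
rng = np.random.default_rng(2)
t_obs = np.sort(rng.uniform(0, 1, size=40))
y_obs = np.sin(2 * np.pi * t_obs) + 0.5 * np.cos(6 * np.pi * t_obs)
coef = fit_coefficients(t_obs, y_obs, n_freq=5)
t_new = np.linspace(0, 1, 101)
y_new = fourier_features(t_new, 5) @ coef      # interpolated values
dy_new = fourier_features_dt(t_new, 5) @ coef  # spatial gradient
```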

There are various other ways to do native interpolation. One uses the implicit representation method, which is a clever trick: in that setting we reuse the autodiff machinery to calculate gradients with respect to the output index. That is not plausible for every problem, though, and sometimes something better behaved, like a basis function interpretation, is more helpful.
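For contrast, here is what "gradients with respect to the output index" means in the implicit representation setting, as a minimal sketch: a hypothetical one-hidden-layer tanh network maps a coordinate t to a field value, and the spatial derivative is obtained by differentiating the network with respect to its input. Autodiff would do this automatically; here the chain rule is written out by hand to stay within NumPy. Names and sizes are illustrative assumptions.

```python
import numpy as np

def init_mlp(hidden, rng):
    # Hypothetical tiny coordinate network f: R -> R with one tanh hidden layer.
    return {"w1": rng.normal(size=hidden), "b1": rng.normal(size=hidden),
            "w2": rng.normal(size=hidden) / np.sqrt(hidden), "b2": 0.0}

def mlp(t, p):
    # t: (n,) coordinates; returns (n,) field values.
    h = np.tanh(np.outer(t, p["w1"]) + p["b1"])
    return h @ p["w2"] + p["b2"]

def mlp_dt(t, p):
    # Chain rule: d/dt [w2 . tanh(w1 t + b1)] = w2 . ((1 - tanh^2) * w1).
    h = np.tanh(np.outer(t, p["w1"]) + p["b1"])
    return ((1.0 - h ** 2) * p["w1"]) @ p["w2"]
```

Unlike a basis expansion, the field here is a nonlinear function of the parameters, which is part of why inference over such a net is harder.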

Specifically, I would like to do Bayesian inference, which looks extremely hard through an implicit net but only very hard through a basis decomposition.
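One reason the basis route is easier: if the basis is fixed and the model is linear in its coefficients, a Gaussian prior on the coefficients gives a closed-form Gaussian posterior (standard Bayesian linear regression). A minimal sketch, assuming a real Fourier basis, an isotropic prior with variance `prior_var`, and known Gaussian noise variance `noise_var` (all illustrative choices):

```python
import numpy as np

def fourier_features(t, n_freq):
    # Real Fourier basis on [0, 1).
    arg = 2 * np.pi * np.outer(t, np.arange(1, n_freq + 1))
    return np.hstack([np.ones((len(t), 1)), np.cos(arg), np.sin(arg)])

def basis_posterior(Phi, y, prior_var, noise_var):
    # Prior w ~ N(0, prior_var I); likelihood y ~ N(Phi w, noise_var I).
    d = Phi.shape[1]
    precision = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

def predict(Phi_new, mean, cov, noise_var):
    # Posterior predictive mean and variance at new (arbitrary) locations.
    mu = Phi_new @ mean
    var = np.einsum("ij,jk,ik->i", Phi_new, cov, Phi_new) + noise_var
    return mu, var
```

The nonlinear or sparse-basis cases lose this conjugacy, which is where the "very hard" part comes back.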

In practice, how would I do this?

With a well-known basis, such as an orthogonal polynomial or Fourier basis, creating a layer which encodes its input in that basis is easy. After all, that is just an inner product. That is what methods like that of Li et al. (2020) exploit.
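A minimal sketch of the Fourier case, assuming a 1-D periodic signal on a uniform grid: transform, keep the lowest `n_modes` coefficients, mix channels with a learned complex matrix per mode, and transform back. This is in the spirit of the spectral convolution in Li et al. (2020) but is not their code; their layer also adds a pointwise linear path and a nonlinearity, omitted here. The name `spectral_conv1d` and the choice of `norm="forward"` (so that coefficients do not scale with grid size) are mine.

```python
import numpy as np

def spectral_conv1d(x, weights):
    """x: (c_in, n) real samples on a uniform periodic grid.
    weights: (c_in, c_out, n_modes) complex, one channel-mixing matrix per retained mode."""
    c_in, n = x.shape
    _, c_out, n_modes = weights.shape
    # Forward-normalised FFT: coefficients of a band-limited signal don't depend on n.
    x_hat = np.fft.rfft(x, axis=-1, norm="forward")
    out_hat = np.zeros((c_out, n // 2 + 1), dtype=complex)
    # Multiply only the lowest modes by the weights; higher modes are truncated to zero.
    out_hat[:, :n_modes] = np.einsum("ik,iok->ok", x_hat[:, :n_modes], weights)
    return np.fft.irfft(out_hat, n=n, axis=-1, norm="forward")
```

Because the weights live on Fourier modes rather than grid points, the same layer can be applied to a band-limited input sampled at 64 or 128 points, and the two outputs agree at the shared grid points.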

More general bases such as sparse/overcomplete frames might need to solve a complicated sparse optimisation problem inside the network.

One useful tool here might be to wrap the optimisation in implicit layers.1 The paper Differentiable Convex Optimization Layers introduces cvxpylayers; perhaps that does some of the work we want?
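To show the idea behind such layers without relying on the cvxpylayers API, here is a hand-rolled sketch for the lasso specifically: the forward pass solves the sparse coding problem (here with ISTA, run to near convergence), and the backward pass differentiates the optimality conditions rather than the solver's iterations. On the active set S, Dₛᵀ(Dₛzₛ − x) + λ·sign(zₛ) = 0, so ∂zₛ/∂x = (DₛᵀDₛ)⁻¹Dₛᵀ, with zero derivative off the support. This assumes the support is locally stable and Dₛ has full column rank; function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso_ista(x, D, lam, n_iter=20000):
    # Forward pass: solve min_z 0.5*||x - D z||^2 + lam*||z||_1 to (near) convergence.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z

def lasso_jacobian(z, D, tol=1e-10):
    # Backward pass by implicit differentiation of the optimality conditions
    # restricted to the active set S = {j : |z_j| > tol}.
    S = np.abs(z) > tol
    D_S = D[:, S]
    J = np.zeros((D.shape[1], D.shape[0]))
    J[S] = np.linalg.solve(D_S.T @ D_S, D_S.T)
    return J
```

The Jacobian never touches the 20000 solver iterations, which is the main attraction of implicit layers over backpropagating through an unrolled solver.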

I would probably not attempt to learn an arbitrary sparse basis dictionary in this context, because that does not interpolate naturally, but I can imagine learning a parametric sparse dictionary, such as one built from a simple family of functions like decaying sinusoids.
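A minimal sketch of such a parametric dictionary, assuming each atom is exp(−aⱼt)·cos(2πfⱼt + φⱼ) with learnable frequency, decay and phase (an illustrative parameterisation, not taken from any particular paper). Because atoms are functions of t, the dictionary can be evaluated at irregular or new locations, and its parameters have analytic gradients, shown here for the frequencies.

```python
import numpy as np

def damped_sinusoid_atoms(t, freq, decay, phase):
    # Hypothetical parametric dictionary: column j is exp(-decay_j t) cos(2 pi freq_j t + phase_j).
    arg = 2 * np.pi * np.outer(t, freq) + phase
    return np.exp(-np.outer(t, decay)) * np.cos(arg)

def atoms_dfreq(t, freq, decay, phase):
    # Derivative of each atom with respect to its own frequency parameter,
    # which gradient-based learning of the dictionary would need.
    arg = 2 * np.pi * np.outer(t, freq) + phase
    return -2 * np.pi * t[:, None] * np.exp(-np.outer(t, decay)) * np.sin(arg)

# Usage: the same three atoms evaluated at irregular observation times.
rng = np.random.default_rng(7)
t_obs = np.sort(rng.uniform(0, 2, size=30))
freq = np.array([1.0, 2.5, 4.0])
decay = np.array([0.5, 1.0, 0.2])
phase = np.array([0.0, 0.3, -1.0])
Phi = damped_sinusoid_atoms(t_obs, freq, decay, phase)
```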

Somewhere in between there are wavelet decompositions. Are they useful to me? Not sure.

References

Aberdam, Aviad, Jeremias Sulam, and Michael Elad. 2019. “Multi-Layer Sparse Coding: The Holistic Way.” SIAM Journal on Mathematics of Data Science 1 (1): 46–77. https://doi.org/10.1137/18M1183352.
Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015. “Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of The 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR. http://proceedings.mlr.press/v40/Arora15.html.
Barron, Andrew R. 1994. “Approximation and Estimation Bounds for Artificial Neural Networks.” Machine Learning 14 (1): 115–33. https://doi.org/10.1023/A:1022650905902.
Bradley, David M., and J. Andrew Bagnell. 2008. “Differentiable Sparse Coding.” In Proceedings of the 21st International Conference on Neural Information Processing Systems, 113–20. NIPS’08. Red Hook, NY, USA: Curran Associates Inc. https://doi.org/10.1184/R1/6552635.v1.
Chen, Xiaohan, Jialin Liu, Zhangyang Wang, and Wotao Yin. 2018. “Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 9079–89. NIPS’18. Red Hook, NY, USA: Curran Associates Inc. http://arxiv.org/abs/1808.10038.
Chi, Lu, Borui Jiang, and Yadong Mu. 2020. “Fast Fourier Convolution.” In Advances in Neural Information Processing Systems. Vol. 33. https://proceedings.neurips.cc//paper_files/paper/2020/hash/2fd5d41ec6cfab47e32164d5624269b1-Abstract.html.
Gregor, Karol, and Yann LeCun. 2010. “Learning fast approximations of sparse coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 399–406. http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf.
———. 2011. “Efficient Learning of Sparse Invariant Representations.” May 26, 2011. http://arxiv.org/abs/1105.5307.
Knudson, Karin C, Jacob Yates, Alexander Huk, and Jonathan W Pillow. 2014. “Inferring Sparse Representations of Continuous Signals with Continuous Orthogonal Matching Pursuit.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 27:1215–23. Curran Associates, Inc. http://papers.nips.cc/paper/5264-inferring-sparse-representations-of-continuous-signals-with-continuous-orthogonal-matching-pursuit.pdf.
Lemhadri, Ismael, Feng Ruan, Louis Abraham, and Robert Tibshirani. 2021. “LassoNet: A Neural Network with Feature Sparsity.” Journal of Machine Learning Research 22 (127): 1–29. http://jmlr.org/papers/v22/20-848.html.
Li, Zongyi, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. 2020. “Fourier Neural Operator for Parametric Partial Differential Equations.” October 17, 2020. http://arxiv.org/abs/2010.08895.
Liu, Xiao, Kyongmin Yeo, and Siyuan Lu. 2020. “Statistical Modeling for Spatio-Temporal Data From Stochastic Convection-Diffusion Processes.” Journal of the American Statistical Association 0 (0): 1–18. https://doi.org/10.1080/01621459.2020.1863223.
Monga, Vishal, Yuelong Li, and Yonina C. Eldar. 2021. “Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing.” IEEE Signal Processing Magazine 38 (2): 18–44. https://doi.org/10.1109/MSP.2020.3016905.
Oreshkin, Boris N., Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. 2020. “N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.” February 20, 2020. http://arxiv.org/abs/1905.10437.
Papyan, Vardan, Yaniv Romano, and Michael Elad. 2017. “Convolutional Neural Networks Analyzed via Convolutional Sparse Coding.” The Journal of Machine Learning Research 18 (1): 2887–2938. http://arxiv.org/abs/1607.08194.
Papyan, Vardan, Yaniv Romano, Jeremias Sulam, and Michael Elad. 2018. “Theoretical Foundations of Deep Learning via Sparse Representations: A Multilayer Sparse Model and Its Connection to Convolutional Neural Networks.” IEEE Signal Processing Magazine 35 (4): 72–89. https://doi.org/10.1109/MSP.2018.2820224.
Papyan, Vardan, Jeremias Sulam, and Michael Elad. 2017. “Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding.” IEEE Transactions on Signal Processing 65 (21): 5687–5701. https://doi.org/10.1109/TSP.2017.2733447.
Rackauckas, Christopher. 2019. “The Essential Tools of Scientific Machine Learning (Scientific ML).” The Winnower, August. https://doi.org/10.15200/winn.156631.13064.
Sulam, Jeremias, Aviad Aberdam, Amir Beck, and Michael Elad. 2020. “On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (8): 1968–80. https://doi.org/10.1109/TPAMI.2019.2904255.
Sulam, Jeremias, Vardan Papyan, Yaniv Romano, and Michael Elad. 2018. “Multilayer Convolutional Sparse Modeling: Pursuit and Dictionary Learning.” IEEE Transactions on Signal Processing 66 (15): 4090–4104. https://doi.org/10.1109/TSP.2018.2846226.

  1. Not to be confused with implicit representation layers which are completely different.↩︎

