On the tension between representing the functions computed by neural networks in function space versus in weight space. We “see” the outputs of neural networks as functions, but they are generated by some inscrutable parameterization in terms of weights; the weight-space representation is more abstruse, yet more tractable to learn in practice. Why might that be?
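One way to see why weight space is “abstruse”: the map from weights to functions is many-to-one. A minimal NumPy sketch (purely illustrative, my own construction): a one-hidden-layer MLP is a function determined entirely by its weights, yet merely permuting the hidden units produces a different weight vector that realizes the identical function.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: the weights *are* the function's representation."""
    h = np.tanh(x @ W1 + b1)   # hidden features
    return h @ W2 + b2         # scalar output

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), 0.0

x = np.linspace(-2, 2, 5).reshape(-1, 1)

# Permuting hidden units changes the weight vector but not the function:
perm = rng.permutation(8)
permuted = mlp(x, W1[:, perm], b1[perm], W2[perm], b2)
assert np.allclose(mlp(x, W1, b1, W2, b2), permuted)
```

The weight coordinates carry symmetries and redundancies the function itself does not, which is part of why reasoning about networks in weight space is awkward even when optimizing there is easy.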
When we can learn in function space, many things work better in various senses (see, e.g., GP regression), yet such methods rarely dominate in messy practice. Why might that be? And when can we operate in function space at all? Sometimes we really want to, e.g. in operator learning.
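For contrast, a minimal sketch of learning in function space (again illustrative; it assumes an RBF kernel and Gaussian observation noise): exact GP regression gives the posterior over functions in closed form, with no weights to optimize, but at a cost cubic in the number of observations, which hints at why it rarely dominates at scale.

```python
import numpy as np

def rbf(A, B, lengthscale=0.5):
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * lengthscale**2))

# Noisy observations of an unknown function
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(20, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=20)

# Exact posterior over functions: no weights, just linear algebra.
# The catch: np.linalg.solve costs O(n^3) in the number of observations.
noise = 0.1**2
K = rbf(X, X) + noise * np.eye(len(X))
Xs = np.linspace(-2, 2, 100).reshape(-1, 1)
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)                   # posterior mean
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)   # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0, None))       # pointwise uncertainty
```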
See also low-rank GPs, partially Bayesian NNs, neural tangent kernels, functional regression, functional inverse problems, overparameterization, wide limits of NNs…