On the tension between representing the functions computed by neural networks in function space versus in weight space. We “see” the outputs of neural networks as functions, but they are generated by some inscrutable parameterization in terms of weights; the weight-space representation is more abstruse, yet more tractable to learn in practice. Why might that be?
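One way to see why weight space is “abstruse”: the map from weights to functions is many-to-one. A minimal NumPy sketch (purely illustrative, my own construction): a one-hidden-layer MLP is a function determined entirely by its weights, yet merely permuting the hidden units produces a different weight vector that realizes the identical function.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: the weights *are* the function's representation."""
    h = np.tanh(x @ W1 + b1)   # hidden features
    return h @ W2 + b2         # scalar output

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), 0.0

x = np.linspace(-2, 2, 5).reshape(-1, 1)

# Permuting hidden units changes the weight vector but not the function:
perm = rng.permutation(8)
permuted = mlp(x, W1[:, perm], b1[perm], W2[perm], b2)
assert np.allclose(mlp(x, W1, b1, W2, b2), permuted)
```

The weight coordinates carry symmetries and redundancies the function itself does not, which is part of why reasoning about networks in weight space is awkward even when optimizing there is easy.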
When we can learn in function space, many things work better in various senses (see, e.g., GP regression), yet such methods rarely dominate in messy practice. Why might that be? And when can we operate in function space at all? Sometimes we really want to, e.g. in operator learning.
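For contrast, a minimal sketch of learning in function space (again illustrative; it assumes an RBF kernel and Gaussian observation noise): exact GP regression gives the posterior over functions in closed form, with no weights to optimize, but at a cost cubic in the number of observations, which hints at why it rarely dominates at scale.

```python
import numpy as np

def rbf(A, B, lengthscale=0.5):
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * lengthscale**2))

# Noisy observations of an unknown function
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(20, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=20)

# Exact posterior over functions: no weights, just linear algebra.
# The catch: np.linalg.solve costs O(n^3) in the number of observations.
noise = 0.1**2
K = rbf(X, X) + noise * np.eye(len(X))
Xs = np.linspace(-2, 2, 100).reshape(-1, 1)
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)                   # posterior mean
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)   # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0, None))       # pointwise uncertainty
```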
See also low-rank GPs, partially Bayesian NNs, neural tangent kernels, functional regression, functional inverse problems, overparameterization, wide limits of NNs…