Learning Gaussian processes which map functions to functions

December 7, 2020 — January 15, 2025

Tags: Gaussian, generative, geometry, Hilbert space, how do science, kernel tricks, machine learning, PDEs, physics, regression, spatial, stochastic processes, time series

In which I discover how to learn operators via GPs. I suspect a lot of things break in infinite dimensions; what is a usable Gaussian distribution over a mapping between functions?


It might be handy here to revisit the notation for Bayesian nonparametrics, since we don’t get the same kind of setup as when the distributions in question are finitely parameterized. TBC

This is especially interesting when I wish to learn kernels that satisfy physical constraints.

1 Universal Kriging

Does universal kriging (Menafoglio, Secchi, and Rosa 2013) fit in this notebook? In that setting, the observations are themselves function-valued and we wish to spatially interpolate them. TBC. Keyword: Hilbert-Kriging.

See Júlio Hoffimann’s Hilbert-Kriging lecture.
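To fix ideas, here is a minimal sketch of the coefficient-space shortcut, not Menafoglio et al.’s actual construction (they krige in the Hilbert space itself, with a drift term): expand each function-valued observation in a finite basis, then krige the coefficient vectors across space with a shared spatial kernel. All grids, kernels, and lengthscales below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: at each of n_sites spatial locations we observe a whole curve on
# a shared t-grid, and we want to predict the curve at an unobserved site.
n_sites, n_t, n_basis = 30, 50, 8
sites = rng.uniform(0, 1, size=(n_sites, 2))
t = np.linspace(0, 1, n_t)
curves = np.sin(2 * np.pi * (t[None, :] + sites[:, :1]))  # (n_sites, n_t)

# Expand each curve in a small cosine basis; each function becomes a
# coefficient vector, so function-valued kriging reduces to n_basis
# scalar krigings that share one spatial kernel.
basis = np.stack([np.cos(np.pi * k * t) for k in range(n_basis)], axis=1)
coef, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)  # (n_basis, n_sites)

def k_spatial(a, b, ell=0.2):
    """Squared-exponential kernel on site coordinates."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

# Simple (zero-mean) kriging of every coefficient field at a new site;
# universal kriging would add a functional drift term here.
s_new = np.array([[0.5, 0.5]])
K = k_spatial(sites, sites) + 1e-6 * np.eye(n_sites)
lam = np.linalg.solve(K, k_spatial(sites, s_new).ravel())  # kriging weights
curve_new = basis @ (coef @ lam)  # predicted function at s_new
```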

2 Hilbert-space-valued GPs

TBC. For now, the simplest construction I know of is the separable one, sketched below.
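A minimal sketch under a separable-covariance assumption: take \(\operatorname{Cov}[F(s)(t), F(s')(t')] = k_{\mathrm{in}}(s, s')\, k_{\mathrm{out}}(t, t'),\) under which a draw from the GP assigns to each index \(s\) a whole function of \(t\). Grids, kernels, and lengthscales are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def se(a, b, ell):
    """Squared-exponential kernel on a 1-d grid."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

s = np.linspace(0, 1, 20)   # index set of the GP
t = np.linspace(0, 1, 40)   # domain on which each output function lives

# Separable covariance Cov[F(s)(t), F(s')(t')] = k_in(s, s') k_out(t, t');
# on the discretized grids this is a Kronecker product.
K = np.kron(se(s, s, 0.3), se(t, t, 0.1)) + 1e-8 * np.eye(len(s) * len(t))
draw = rng.multivariate_normal(np.zeros(len(s) * len(t)), K)
draw = draw.reshape(len(s), len(t))
# draw[i] is the sampled output function, evaluated on the t-grid, at s[i].
```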

3 Incoming

Thanks to Rafa for the tips about the following:

  • Quang (2022):

    In this work, we present formulations for regularized Kullback-Leibler and Rényi divergences via the Alpha Log-Determinant (Log-Det) divergences between positive Hilbert-Schmidt operators on Hilbert spaces in two different settings, namely (i) covariance operators and Gaussian measures defined on reproducing kernel Hilbert spaces (RKHS); and (ii) Gaussian processes with squared integrable sample paths. For characteristic kernels, the first setting leads to divergences between arbitrary Borel probability measures on a complete, separable metric space. We show that the Alpha Log-Det divergences are continuous in the Hilbert-Schmidt norm, which enables us to apply laws of large numbers for Hilbert space-valued random variables. As a consequence of this, we show that, in both settings, the infinite-dimensional divergences can be consistently and efficiently estimated from their finite-dimensional versions, using finite-dimensional Gram matrices/Gaussian measures and finite sample data, with dimension-independent sample complexities in all cases. RKHS methodology plays a central role in the theoretical analysis in both settings. The mathematical formulation is illustrated by numerical experiments.

    (A toy finite-dimensional version of the divergence being estimated is sketched after this list.)

  • Feldman–Hájek theorem

    In probability theory, the Feldman–Hájek theorem or Feldman–Hájek dichotomy is a fundamental result in the theory of Gaussian measures. It states that two Gaussian measures \(\mu\) and \(\nu\) on a locally convex space \(X\) are either equivalent measures or else mutually singular: there is no possibility of an intermediate situation in which, for example, \(\mu\) has a density with respect to \(\nu\) but not vice versa.

    Interesting corollary for our purposes: a reminder that Gaussian measures over infinite-dimensional Hilbert spaces do things that we would not suspect from their finite-dimensional analogues (the precise equivalence conditions are spelled out after this list):

    dilating a Gaussian measure on an infinite-dimensional Hilbert space \(X\) (i.e. taking \(C_\nu = s C_\mu\) for some scale factor \(s \geq 0\)) always yields two mutually singular Gaussian measures, except for the trivial dilation with \(s = 1,\) since \((s^2 - 1) I\) is Hilbert–Schmidt only when \(s = 1.\)
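Two elaborations on the above, offered as sketches under stated assumptions rather than definitive renderings of the cited results.

First, a minimal sketch of the finite-dimensional quantity from which such infinite-dimensional divergences are estimated. This is not Quang’s estimator itself, just the classical KL divergence between two zero-mean Gaussians whose covariances are regularized Gram matrices; the kernels, lengthscales, and regularizer \(\gamma\) are arbitrary illustration choices.

```python
import numpy as np

def gauss_kl(A, B):
    """KL( N(0, A) || N(0, B) ) for positive-definite covariance matrices."""
    d = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_B = np.linalg.slogdet(B)
    return 0.5 * (np.trace(np.linalg.solve(B, A)) - d + logdet_B - logdet_A)

# Gram matrices of two GP kernels on the same sample points, regularized so
# that both are safely positive definite (cf. the regularized divergences).
x = np.linspace(0, 1, 100)[:, None]
sqdist = (x - x.T) ** 2
gamma = 1e-3
K1 = np.exp(-0.5 * sqdist / 0.2**2) + gamma * np.eye(100)
K2 = np.exp(-0.5 * sqdist / 0.5**2) + gamma * np.eye(100)
print(gauss_kl(K1, K2))
```

Second, the standard Hilbert-space statement of the dichotomy (as in, e.g., Da Prato and Zabczyk), which makes the dilation corollary transparent: \(\mu = \mathcal{N}(m_1, C_1)\) and \(\nu = \mathcal{N}(m_2, C_2)\) are equivalent iff

\[
\text{(i)}\ \operatorname{ran} C_1^{1/2} = \operatorname{ran} C_2^{1/2} =: H, \qquad
\text{(ii)}\ m_1 - m_2 \in H, \qquad
\text{(iii)}\ \bigl(C_1^{-1/2} C_2^{1/2}\bigr)\bigl(C_1^{-1/2} C_2^{1/2}\bigr)^{*} - I \ \text{is Hilbert–Schmidt,}
\]

and mutually singular otherwise. In the dilation example, (i) and (ii) hold but (iii) produces a nonzero multiple of the identity whenever \(s \neq 1\), and \(cI\) is Hilbert–Schmidt on an infinite-dimensional space only for \(c = 0\).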

4 References

Albert. 2019. “Gaussian Processes for Data Fulfilling Linear Differential Equations.” Proceedings.
Álvarez, Luengo, and Lawrence. 2013. “Linear Latent Force Models Using Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Bakka, Rue, Fuglstad, et al. 2018. “Spatial Modeling with R-INLA: A Review.” WIREs Computational Statistics.
Batlle, Darcy, Hosseini, et al. 2023. “Kernel Methods Are Competitive for Operator Learning.” SSRN Scholarly Paper.
Besginow, and Lange-Hegermann. 2024. “Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations.” In Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22.
Bolin. 2016. Models and Methods for Random Fields in Spatial Statistics with Computational Efficiency from Markov Properties.
Bolin, and Wallin. 2021. “Efficient Methods for Gaussian Markov Random Fields Under Sparse Linear Constraints.” In Advances in Neural Information Processing Systems.
Brault, d’Alché-Buc, and Heinonen. 2016. “Random Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning.
Brault, Lim, and d’Alché-Buc. n.d. “Scaling up Vector Autoregressive Models With Operator-Valued Random Fourier Features.”
Brouard, Szafranski, and d’Alché-Buc. 2016. “Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels.” The Journal of Machine Learning Research.
Cotter, Dashti, and Stuart. 2010. “Approximation of Bayesian Inverse Problems for PDEs.” SIAM Journal on Numerical Analysis.
Davison, and Ortiz. 2019. “FutureMapping 2: Gaussian Belief Propagation for Spatial AI.” arXiv:1910.14139 [Cs].
Dutordoir, Saul, Ghahramani, et al. 2022. “Neural Diffusion Processes.”
Gahungu, Lanyon, Álvarez, et al. 2022. “Adjoint-Aided Inference of Gaussian Process Driven Differential Equations.” In.
Gulian, Frankel, and Swiler. 2022. “Gaussian Process Regression Constrained by Boundary Value Problems.” Computer Methods in Applied Mechanics and Engineering.
Harkonen, Lange-Hegermann, and Raita. 2023. “Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients.” In Proceedings of the 40th International Conference on Machine Learning.
Heinonen, and d’Alché-Buc. 2014. “Learning Nonparametric Differential Equations with Operator-Valued Kernels and Gradient Matching.” arXiv:1411.5172 [Cs, Stat].
Henderson. 2023. “PDE Constrained Kernel Regression Methods.”
Henderson, Noble, and Roustant. 2023. “Characterization of the Second Order Random Fields Subject to Linear Distributional PDE Constraints.” Bernoulli.
Hennig, Osborne, and Girolami. 2015. “Probabilistic Numerics and Uncertainty in Computations.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Hildeman, Bolin, and Rychlik. 2019. “Joint Spatial Modeling of Significant Wave Height and Wave Period Using the SPDE Approach.” arXiv:1906.00286 [Stat].
Hu, and Steinsland. 2016. “Spatial Modeling with System of Stochastic Partial Differential Equations.” WIREs Computational Statistics.
Hutchinson, Terenin, Borovitskiy, et al. 2021. “Vector-Valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels.” In Advances in Neural Information Processing Systems.
Kadri, Duflos, Preux, et al. 2016. “Operator-Valued Kernels for Learning from Functional Response Data.” The Journal of Machine Learning Research.
Kadri, Rakotomamonjy, Preux, et al. 2012. “Multiple Operator-Valued Kernel Learning.” Advances in Neural Information Processing Systems.
Kim, Luettgen, Paynabar, et al. 2023. “Physics-Based Penalization for Hyperparameter Estimation in Gaussian Process Regression.” Computers & Chemical Engineering.
Krämer, Schmidt, and Hennig. 2022. “Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics.
Lange-Hegermann. 2018. “Algorithmic Linearly Constrained Gaussian Processes.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS ’18.
———. 2021. “Linearly Constrained Gaussian Processes with Boundary Conditions.” In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research.
Lian. 2007. “Nonlinear Functional Models for Functional Responses in Reproducing Kernel Hilbert Spaces.” Canadian Journal of Statistics.
Lim, d’Alché-Buc, Auliac, et al. 2015. “Operator-Valued Kernel-Based Vector Autoregressive Models for Network Inference.” Machine Learning.
Long, Wang, Krishnapriyan, et al. 2022. “AutoIP: A United Framework to Integrate Physics into Gaussian Processes.”
Magnani, Krämer, Eschenhagen, et al. 2022. “Approximate Bayesian Neural Operators: Uncertainty Quantification for Parametric PDEs.”
Magnani, Pförtner, Weber, et al. 2024. “Linearization Turns Neural Operators into Function-Valued Gaussian Processes.”
Menafoglio, Secchi, and Rosa. 2013. “A Universal Kriging Predictor for Spatially Dependent Functional Data of a Hilbert Space.” Electronic Journal of Statistics.
Micchelli, and Pontil. 2005. “On Learning Vector-Valued Functions.” Neural Computation.
Minh. 2022. “Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.” SIAM/ASA Journal on Uncertainty Quantification.
Mora, Yousefpour, Hosseinmardi, et al. 2024. “Operator Learning with Gaussian Processes.”
Moss, Opolka, Dumitrascu, et al. 2022. “Approximate Latent Force Model Inference.”
Perdikaris, Raissi, Damianou, et al. 2017. “Nonlinear Information Fusion Algorithms for Data-Efficient Multi-Fidelity Modelling.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Phillips, Seror, Hutchinson, et al. 2022. “Spectral Diffusion Processes.” In.
Quang. 2021a. “Convergence and Finite Sample Approximations of Entropic Regularized Wasserstein Distances in Gaussian and RKHS Settings.”
———. 2021b. “Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.”
———. 2022. “Kullback-Leibler and Rényi Divergences in Reproducing Kernel Hilbert Space and Gaussian Process Settings.”
———. 2023. “Entropic Regularization of Wasserstein Distance Between Infinite-Dimensional Gaussian Measures and Gaussian Processes.” Journal of Theoretical Probability.
Raissi, and Karniadakis. 2018. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” Journal of Computational Physics.
Raissi, Perdikaris, and Karniadakis. 2017a. “Inferring Solutions of Differential Equations Using Noisy Multi-Fidelity Data.” Journal of Computational Physics.
———. 2017b. “Machine Learning of Linear Differential Equations Using Gaussian Processes.” Journal of Computational Physics.
———. 2018. “Numerical Gaussian Processes for Time-Dependent and Nonlinear Partial Differential Equations.” SIAM Journal on Scientific Computing.
Ranftl. n.d. “Physics-Consistency of Infinite Neural Networks.”
Saha, and Balamurugan. 2020. “Learning with Operator-Valued Kernels in Reproducing Kernel Krein Spaces.” In Advances in Neural Information Processing Systems.
Sigrist, Künsch, and Stahel. 2015. “Stochastic Partial Differential Equation Based Modelling of Large Space-Time Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Zammit-Mangion, and Cressie. 2021. “FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets.” Journal of Statistical Software.
Zhang, Zhen, Wang, and Nehorai. 2020. “Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zhang, Haizhang, Xu, and Zhang. 2012. “Refinement of Operator-Valued Reproducing Kernels.” The Journal of Machine Learning Research.