A placeholder to collect articles on this idea (or perhaps this collection of ideas), because the terminology is not always obvious. No new insights yet.
If we are going to use a kernel frequently for function-valued Gaussian process regression, and the solutions should satisfy some partial differential equation, how might we encode that constraint in the kernel itself? When is it worthwhile to do so? Closely related: learning physical operators.
When modelling physical systems, especially those governed by partial differential equations (PDEs), we often want to incorporate the underlying physical constraints directly into the kernel functions of reproducing kernel Hilbert spaces (RKHS). This approach ensures that the solutions not only fit the observed data but also adhere to known physical laws.
There are many methods that fit this description, and what works depends very much on which physical equations we are solving, on what domain, and so on.
The categories in this taxonomy are not mutually exclusive; I have not read the literature closely enough to draw sharp boundaries, and some of the approaches look similar to one another.
I suspect there are several different concepts confounded here: kernels that generate solutions satisfying certain constraints in function space can do this in several ways. I think most of the kernels here induce Hilbert spaces such that functions drawn from them are maps to vector fields (e.g. a spatial field) such that every draw is a solution to some differential equation. But there are probably some other approaches scattered across the references.
Gaussian Processes and Their Derivatives
An important property of Gaussian processes is that linear transformations of GPs remain GPs under broad circumstances. Specifically, the derivative of a GP is also a GP, provided the covariance function is sufficiently smooth. If $f \sim \mathcal{GP}(m, k)$ with mean function $m(x)$ and covariance function $k(x, x')$, then its derivative $f'$ is a GP with mean $m'(x)$ and covariance $\partial^2 k(x, x')/\partial x\,\partial x'$, i.e.

$$
f' \sim \mathcal{GP}\!\left( \frac{\partial m(x)}{\partial x},\ \frac{\partial^{2} k(x, x')}{\partial x\, \partial x'} \right).
$$
This property allows us to incorporate differential operators into the GP framework, enabling us to encode PDE constraints directly into the kernel.
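For concreteness, here is a minimal numpy sketch of that fact for a squared-exponential kernel. The closed-form derivative covariance is standard; the hyperparameters `ell` and `sigma` and all names are just illustrative.

```python
# Minimal sketch: the derivative of a GP with a squared-exponential kernel
# is again a GP, with covariance d^2 k / dx dx'.  Hyperparameters are assumed.
import numpy as np

ell, sigma = 0.5, 1.0

def k(x, xp):
    """Squared-exponential kernel k(x, x')."""
    r = x[:, None] - xp[None, :]
    return sigma**2 * np.exp(-r**2 / (2 * ell**2))

def dk_dxdxp(x, xp):
    """Covariance of f'(x): d^2 k / dx dx' for the squared-exponential kernel."""
    r = x[:, None] - xp[None, :]
    return (sigma**2 / ell**2) * (1 - r**2 / ell**2) * np.exp(-r**2 / (2 * ell**2))

x = np.linspace(0, 5, 200)
rng = np.random.default_rng(0)
jitter = 1e-8 * np.eye(len(x))

# Independent draws from the marginal priors of f and of its derivative f'.
f_samples  = rng.multivariate_normal(np.zeros(len(x)), k(x, x) + jitter, size=3)
df_samples = rng.multivariate_normal(np.zeros(len(x)), dk_dxdxp(x, x) + jitter, size=3)
```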
Latent Force Models
Latent force models are one of the earlier methods for integrating differential equations into GP regression (M. Álvarez, Luengo, and Lawrence 2009; M. A. Álvarez, Luengo, and Lawrence 2013; Moss et al. 2022). In LFMs, the idea is to model the unknown latent forces driving a system using GPs. These latent forces are then connected to the observed data through differential equations.
e.g. consider a system governed by a linear ordinary differential equation (ODE):

$$
\frac{\mathrm{d} f(t)}{\mathrm{d} t} + \gamma f(t) = u(t).
$$

Here $f(t)$ is the observed function, $\gamma$ is a known constant, and $u(t)$ is an unknown latent function modelled as a GP. By placing a GP prior on $u$, we induce a GP prior on $f$ that inherently satisfies the ODE.
The function $f$ resides in the Sobolev space $H^{1}([0, T])$, which consists of functions whose first derivative is square-integrable over the interval $[0, T]$.
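A brute-force illustration of the induced prior (not the closed-form LFM cross-covariances derived in the cited papers): sample the latent force $u$ from a GP on a grid and push it through a discretisation of the solution operator $f(t) = \int_0^t e^{-\gamma (t - s)} u(s)\, \mathrm{d}s$, so that each draw of $f$ satisfies the ODE by construction. Grid sizes and constants below are arbitrary.

```python
# Monte Carlo sketch of a latent force model prior, assuming the first-order
# ODE df/dt + gamma * f = u(t) above, with f(0) = 0 and u ~ GP(0, k).
import numpy as np

gamma, ell = 2.0, 0.3
t = np.linspace(0, 3, 300)
dt = t[1] - t[0]

def k(a, b):
    r = a[:, None] - b[None, :]
    return np.exp(-r**2 / (2 * ell**2))

rng = np.random.default_rng(1)
u = rng.multivariate_normal(np.zeros(len(t)), k(t, t) + 1e-8 * np.eye(len(t)), size=5)

# Discretised solution operator: f(t) = int_0^t exp(-gamma (t - s)) u(s) ds
G = np.tril(np.exp(-gamma * (t[:, None] - t[None, :]))) * dt
f = u @ G.T   # each row of f satisfies the ODE, up to discretisation error
```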
Divergence-Free and Curl-Free Kernels
Some fun tricks are of special relevance to fluids, e.g. kernels which imply divergence-free or curl-free fields, especially on the surface of a sphere (Narcowich, Ward, and Wright 2007; E. J. Fuselier, Shankar, and Wright 2016; E. J. Fuselier and Wright 2009).
E. Fuselier (2008) says:
Constructing divergence-free and curl-free matrix-valued RBFs is fairly simple. If $\phi$ is a scalar-valued function, consider

$$
\Phi_{\mathrm{div}} = \left(-\Delta I + \nabla \nabla^{\mathsf{T}}\right)\phi,
\qquad
\Phi_{\mathrm{curl}} = -\nabla \nabla^{\mathsf{T}} \phi.
$$

If $\phi$ is an RBF, then these functions can be used to produce divergence-free and curl-free interpolants, respectively. We note that these are not radial functions, but because they are usually generated by an RBF $\phi$, they are still commonly called “matrix-valued RBFs”.
AFAICT there is nothing RBF-specific; I think it works for any stationary kernel. Do we even need stationarity?
The functions produced by these kernels seem to reside in specific Sobolev spaces that respect the divergence-free or curl-free conditions. For instance, divergence-free vector fields in $\mathbb{R}^d$ belong to the space

$$
\left\{\, v \in \left(H^{\tau}(\mathbb{R}^d)\right)^{d} : \nabla \cdot v = 0 \,\right\},
$$

where the smoothness order $\tau$ is determined by the generating kernel.
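Here is a small sketch of that construction, building $\Phi_{\mathrm{div}} = (-\Delta I + \nabla\nabla^{\mathsf T})\phi$ from a scalar Gaussian $\phi$ with JAX autodiff and checking numerically that the resulting vector fields are divergence-free. Function names and parameters are ad hoc, not from any particular library.

```python
# Divergence-free matrix-valued kernel from a scalar RBF, via autodiff.
import jax
import jax.numpy as jnp

def phi(r, ell=1.0):
    """Scalar Gaussian RBF as a function of the separation vector r = x - x'."""
    return jnp.exp(-jnp.sum(r**2) / (2 * ell**2))

def Phi_div(r):
    """Divergence-free matrix-valued kernel: (-Laplacian * I + Hessian) of phi."""
    H = jax.hessian(phi)(r)            # grad grad^T phi
    lap = jnp.trace(H)                 # Laplacian of phi
    return -lap * jnp.eye(r.shape[0]) + H

# Numerical check: the vector field x -> Phi_div(x - x0) @ c has zero divergence.
x0 = jnp.array([0.3, -0.2])
c = jnp.array([1.0, 2.0])
field = lambda x: Phi_div(x - x0) @ c
div = jnp.trace(jax.jacfwd(field)(jnp.array([0.7, 0.1])))
print(div)                             # ~0 up to floating-point error
```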
On the Sphere
When dealing with fields on the surface of a sphere, such as global wind patterns, special considerations are required (E. J. Fuselier and Wright 2009). The construction of divergence-free and curl-free kernels on the sphere involves accounting for the manifold’s curvature and ensuring that the vector fields are tangent to the sphere’s surface.
For a scalar function $\psi$ defined on the sphere $\mathbb{S}^2$, divergence-free kernels can be constructed using surface differential operators. These kernels help model tangential vector fields that are essential in geophysical applications, i.e. on the surface of the planet.
Linearly-constrained, Operator-Valued Kernels
Operator-valued kernels extend the concept of scalar kernels to vector- or function-valued outputs. They are particularly handy when the physical constraints can be expressed as linear operators acting on functions (Lange-Hegermann 2018, 2021).
Consider a linear operator $\mathcal{L}$ acting on a function $f$. A kernel $K$ can be designed so that applying the operator to either argument annihilates it,

$$
\mathcal{L}_{x} K(x, x') = 0 \quad \text{for all } x',
$$

which ensures that functions drawn from the associated RKHS (or GP prior) satisfy $\mathcal{L} f = 0$. In practice such kernels are often built by pushing a base kernel $k$ through an operator $\mathcal{G}$ whose image lies in the null space of $\mathcal{L}$, i.e. $K = \mathcal{G}_{x}\, k\, \mathcal{G}_{x'}^{*}$ with $\mathcal{L}\mathcal{G} = 0$.
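As a toy instance, assume the constraint $\mathcal{L} f = \partial f/\partial x + \partial f/\partial y = 0$, i.e. $f$ is constant along the $(1,1)$ direction. Taking $\mathcal{G}: g \mapsto g(x - y)$ with $g \sim \mathcal{GP}(0, k)$ gives the induced kernel $K((x, y), (x', y')) = k(x - y, x' - y')$, and every draw satisfies the constraint by construction. The sketch below (grid, names, hyperparameters all illustrative) draws such a field and checks the constraint by finite differences.

```python
# Toy linearly-constrained GP: draws satisfy df/dx + df/dy = 0 by construction.
import numpy as np

def k(s, sp, ell=0.5):
    """Base scalar kernel on the reduced coordinate s = x - y."""
    r = s[:, None] - sp[None, :]
    return np.exp(-r**2 / (2 * ell**2))

# On an n x n grid, x - y takes only 2n - 1 distinct values, so a draw from
# GP(0, K) can be generated by drawing g ~ GP(0, k) on those values and
# setting f(x, y) = g(x - y).
n = 40
xs = np.linspace(0.0, 1.0, n)
h = xs[1] - xs[0]
s = (np.arange(2 * n - 1) - (n - 1)) * h          # distinct values of x - y

rng = np.random.default_rng(2)
g = rng.multivariate_normal(np.zeros(len(s)), k(s, s) + 1e-10 * np.eye(len(s)))

jj, ii = np.meshgrid(np.arange(n), np.arange(n))  # grid point (x, y) = (xs[jj], xs[ii])
f = g[jj - ii + (n - 1)]                          # f(x, y) = g(x - y)

# Check the constraint df/dx + df/dy = 0 with finite differences.
residual = np.gradient(f, h, axis=1) + np.gradient(f, h, axis=0)
print(np.abs(residual[1:-1, 1:-1]).max())         # ~0 in the interior
```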
Implicit
Not quite sure what to call it, but Kian Ming A. Chai introduced us to Mora et al. (2024), which seems to be an interesting variant. Keyword match to Brouard, Szafranski, and D’Alché-Buc (2016).
References
Álvarez, Mauricio, Luengo, and Lawrence. 2009. “Latent Force Models.” In Artificial Intelligence and Statistics.
Álvarez, Mauricio A., Luengo, and Lawrence. 2013. “Linear Latent Force Models Using Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Besginow, and Lange-Hegermann. 2024. “Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations.” In Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22.
Brouard, Szafranski, and D’Alché-Buc. 2016. “Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels.” The Journal of Machine Learning Research.
Cotter, Dashti, and Stuart. 2010. “Approximation of Bayesian Inverse Problems for PDEs.” SIAM Journal on Numerical Analysis.
Gulian, Frankel, and Swiler. 2022. “Gaussian Process Regression Constrained by Boundary Value Problems.” Computer Methods in Applied Mechanics and Engineering.
Harkonen, Lange-Hegermann, and Raita. 2023. “Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients.” In Proceedings of the 40th International Conference on Machine Learning.
Hutchinson, Terenin, Borovitskiy, et al. 2021. “Vector-Valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels.” In Advances in Neural Information Processing Systems.
Kadri, Duflos, Preux, et al. 2016. “Operator-Valued Kernels for Learning from Functional Response Data.” The Journal of Machine Learning Research.
Krämer, Schmidt, and Hennig. 2022. “Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics.
Kübler, Muandet, and Schölkopf. 2019. “Quantum Mean Embedding of Probability Distributions.” Physical Review Research.
Lange-Hegermann. 2018. “Algorithmic Linearly Constrained Gaussian Processes.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
———. 2021. “Linearly Constrained Gaussian Processes with Boundary Conditions.” In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research.
Long, Wang, Krishnapriyan, et al. 2022. “AutoIP: A United Framework to Integrate Physics into Gaussian Processes.” In Proceedings of the 39th International Conference on Machine Learning.
Micchelli, and Pontil. 2005. “On Learning Vector-Valued Functions.” Neural Computation.
Mora, Yousefpour, Hosseinmardi, et al. 2024. “Operator Learning with Gaussian Processes.”
Moss, Opolka, Dumitrascu, et al. 2022. “Approximate Latent Force Model Inference.”
Narcowich, Ward, and Wright. 2007. “Divergence-Free RBFs on Surfaces.” Journal of Fourier Analysis and Applications.
Perdikaris, Raissi, Damianou, et al. 2017. “Nonlinear Information Fusion Algorithms for Data-Efficient Multi-Fidelity Modelling.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Raissi, Perdikaris, and Karniadakis. 2017a. “Inferring Solutions of Differential Equations Using Noisy Multi-Fidelity Data.” Journal of Computational Physics.
Ranftl. n.d. “Physics-Consistency of Infinite Neural Networks.”
Saha, and Balamurugan. 2020. “Learning with Operator-Valued Kernels in Reproducing Kernel Krein Spaces.” In Advances in Neural Information Processing Systems.
Sigrist, Künsch, and Stahel. 2015. “Stochastic Partial Differential Equation Based Modelling of Large Space-Time Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Wang, Cockayne, and Oates. 2018. “On the Bayesian Solution of Differential Equations.” In 38th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering.
Zhang, Zhen, Wang, and Nehorai. 2020. “Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zhang, Haizhang, Xu, and Zhang. 2012. “Refinement of Operator-Valued Reproducing Kernels.” The Journal of Machine Learning Research.