# Machine learning for partial differential equations

May 15, 2017 — November 17, 2023

\(\newcommand{\solop}{\mathcal{G}^{\dagger}}\)

Using statistical or machine learning approaches to solve PDEs, and maybe even to perform inference through them. There are many approaches to machine learning of PDEs and I will document them on an *ad hoc* basis as I need them. No claim is made to completeness.

TODO: Reduce the proliferation of unclear symbols by introducing a specific example; which neural nets represent operators, which represent specific functions, between which spaces etc.

TODO: Harmonise the notation used in this section with subsections below; right now they match the papers’ notation but not each other.

TODO: should the intro section actually be filed under PDEs?

TODO: introduce a consistent notation for coordinate space, output spaces, and function space?

TODO: this is mostly Eulerian fluid flow models right now. Can we mention Lagrangian models at least?

## 1 Background

this section is a mess and I hate it now

Suppose we have a PDE defined over some input domain, which we presume comprises a time dimension and some number of spatial dimensions. The PDE is specified by some differential operator \(\mathcal{D}\) and some *forcing* or *boundary condition* \(u\in \mathscr{U},\) as \[\mathcal{D}[f]=u.\] These functions map from some coordinate space \(C\) of dimension \(d_{C}\) to some output space \(O\). The first coordinate of the input space often has the special interpretation as time \(t\in \mathbb{R}\) and the subsequent coordinates are then spatial coordinates \(x\in D\subseteq \mathbb{R}^{d_{D}}\) where \(d_{D}=d_{C}-1.\) Sometimes we make this explicit by writing the time coordinate separately as \(f(t,x).\) A common case, concretely, is \(C=\mathbb{R} \times \mathbb{R}^2=\mathbb{R} \times D\) and \(O=\mathbb{R}.\) For each time \(t\in \mathbb{R}\) we assume the instantaneous solution \(f(t, \cdot)\) to be an element of some Banach space \(\mathscr{A}\) of functions \(D\to O.\) The overall solutions \(f: C\to O\) have their own Banach space \(\mathscr{F}\). More particularly, we might consider solutions on a restricted time domain \(t\in [0,T]\) and some spatial domain \(D\subseteq \mathbb{R}^2\) where a solution is a function \(f\) that maps \([0,T] \times D \to \mathbb{R}.\) This would naturally model, say, a 2D height-field evolving over time.
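As one concrete instance of this notation (my choice of illustration; nothing above commits us to it), take the forced heat equation on the 2D height-field setting, with an assumed constant diffusivity \(\kappa>0\): \[\mathcal{D}[f]=\frac{\partial f}{\partial t}-\kappa\nabla_x^2 f,\qquad \mathcal{D}[f](t,x)=u(t,x)\quad\text{for }(t,x)\in[0,T]\times D.\] Here \(f:[0,T]\times D\to\mathbb{R}\) is the evolving field and \(u\) is the forcing. Initial and boundary conditions still need to be supplied separately, or folded into \(u\) as the text above loosely does.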

We have thrown the term *Banach space* about without making it clear which one we mean. There are usually some implied smoothness properties and of course we would want to include some kind of norm to fully specify these spaces, but we gloss over that for now.

We have introduced one operator, the defining operator \(\mathcal{D}\). Another that we think about a lot is the *PDE propagator* or *forward operator* \(\mathcal{P}_s,\) which produces a representation of the entire solution surface at some future moment, given current and boundary conditions. \[\mathcal{P}_s[f(t, \cdot)]=f( t+s, \cdot).\] We might also discuss a *solution operator* \[\solop:\begin{array}{l}\mathscr{U}\to\mathscr{F}\\ u\mapsto f\end{array}\] such that \[\mathcal{D}\left[\solop[u]\right]=u.\]

Handling all these weird, and presumably infinite-dimensional, function spaces \(\mathscr{A},\mathscr{U},\mathscr{F},\dots\) on a finite computer requires us to introduce a notion of *discretisation*. We need to find some finite-dimensional representations of these functions so that they can be computed in a finite machine. PDE solvers use various tricks to do that, and each one is its own research field. Finite difference approximations treat all the solutions as values on a grid, effectively approximating \(\mathscr{F}\) with some new space of functions on grid indices, e.g. \(\mathbb{Z} \times \mathbb{Z}^2 \to \mathbb{R},\) or, if you’d like, in terms of “bar chart” basis functions. Finite element methods define the PDE over a more complicated indexing system of compactly-supported basis functions which form a mesh. Particle systems approximate PDEs with moving particles which define their own adaptive basis. If there is some other natural (preferably orthogonal) basis of functions on the solution surface we might use those; for example, with the right structure, the eigenfunctions of the defining operator might give us such a basis. Fourier bases are famous in this case.
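To make the finite-difference idea concrete, here is a minimal dependency-free sketch of a discretised forward operator \(\mathcal{P}_s\). It assumes the 1D heat equation on a periodic unit interval with an explicit Euler scheme; the function names `heat_step` and `propagate` and all the parameter values are mine, chosen for illustration, and a real solver would be far more careful.

```python
import math


def heat_step(f, kappa, dx, dt):
    """One explicit Euler step of f_t = kappa * f_xx on a periodic 1D grid."""
    r = kappa * dt / dx**2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for kappa*dt/dx^2 > 1/2")
    n = len(f)
    # f[i - 1] wraps around at i = 0, which gives the periodic boundary for free.
    return [f[i] + r * (f[(i + 1) % n] - 2.0 * f[i] + f[i - 1]) for i in range(n)]


def propagate(f, s, kappa=0.01, dx=1.0 / 64, dt=1e-3):
    """Discrete stand-in for P_s: advance the grid values f by a time span s."""
    for _ in range(round(s / dt)):
        f = heat_step(f, kappa, dx, dt)
    return f


n = 64
grid = [i / n for i in range(n)]
f0 = [math.sin(2 * math.pi * x) for x in grid]

# Propagating by 0.1 directly, or by 0.05 twice, should agree.
f_direct = propagate(f0, 0.1)
f_composed = propagate(propagate(f0, 0.05), 0.05)

# For this initial condition the continuous solution is known in closed form:
# the sine mode simply decays by exp(-kappa * (2 pi)^2 * t).
decay = math.exp(-0.01 * (2 * math.pi) ** 2 * 0.1)
max_err = max(abs(a - decay * b) for a, b in zip(f_direct, f0))
```

The list of grid values plays the role of the “bar chart” approximation to \(f(t,\cdot)\in\mathscr{A}\), and `propagate` is the finite-dimensional object that a learned forward operator would try to emulate.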

A classic for neural nets is to learn a finite-difference approximation of the PDE on a grid of values and treat it as a convnet regression, and indeed the dynamical treatment of neural nets is based on that. For various practical reasons I would like to avoid requiring a grid on my input values as much as possible. For one thing, grid systems are memory intensive and need expensive GPUs. For another, it is hard to integrate observations at multiple resolutions into a gridded data system. For a third, the research field of image prediction is too crowded for easy publications. Thus, that will not be treated further.

A grid-free approach is graph networks that learn a topology and interaction system. This seems to naturally map on to PDEs of the kind that we usually solve by particle systems, e.g. fluid dynamics with immiscible substances. Nothing wrong with this idea *per se*, but it does not seem to be the most compelling approach to me for my domain of spatiotemporal prediction where we already know the topology and can avoid all the complicated bits of graph networks. So this I will also ignore for now.

There are a few options. For an overview of many other techniques see *Physics-based Deep Learning* by Philipp Holl, Maximilian Mueller, Patrick Schnell, Felix Trost, Nils Thuerey, and Kiwon Um (Thuerey et al. 2021). Brunton and Kutz, *Data-Driven Science and Engineering* (Brunton and Kutz 2019), covers related material; both go farther than mere PDEs and consider general scientific settings. Also, the seminar series by the authors of that latter book is a movable feast of the latest results in this area.

Here we look in depth mainly at two important ones.

One approach learns a network \(\hat{f}\in \mathscr{F}, \hat{f}: C \to O\) such that \(\hat{f}\approx f\) (Raissi, Perdikaris, and Karniadakis 2019). This is the annoyingly-named implicit representation trick. Another approach is used in networks like Li, Kovachki, Azizzadenesheli, Liu, Bhattacharya, et al. (2020b) which learn the forward operator \(\mathcal{P}_1: \mathscr{A}\to\mathscr{A}.\) When the papers mentioned talk about *operator learning*, this is the operator that they seem to mean by default.
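Schematically, and with symbols that are mine rather than any particular paper's, the two approaches minimise different objectives. The first fits the weights \(\theta\) of a coordinate network \(\hat{f}_\theta\) by penalising the PDE residual at collocation points \(c_i\in C\), plus the mismatch with prescribed values \(g\) at initial/boundary points \(b_j\), weighted by some \(\lambda>0\): \[\min_\theta\ \frac{1}{N}\sum_{i=1}^{N}\left|\mathcal{D}[\hat{f}_\theta](c_i)-u(c_i)\right|^2+\frac{\lambda}{M}\sum_{j=1}^{M}\left|\hat{f}_\theta(b_j)-g(b_j)\right|^2.\] The second fits the weights \(\phi\) of an operator network \(\hat{\mathcal{P}}_\phi\) by regression on pairs of consecutive states \(a\in\mathscr{A}\) drawn from reference solutions: \[\min_\phi\ \mathbb{E}_{a}\left\|\hat{\mathcal{P}}_\phi[a]-\mathcal{P}_1[a]\right\|_{\mathscr{A}}^2.\] This is a sketch of the general shape only; the actual losses in the cited papers differ in weighting, sampling and choice of norm.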

This entire idea might seem weird if you are used to typical ML research. In the usual neural network setting we solve a statistical inference problem, learning an unknown prediction function from data. Here, by contrast, we start with a partially or completely known function (a PDE solver) which we are trying to approximate with a more convenient substitute (a neural approximation to that PDE solver).

That approximant is not necessarily exciting as a PDE solver, in itself. Probably we could have implemented the reference PDE solver on the GPU, or tweaked it a little, and got a faster PDE solver. Identifying when we have a non-trivial speed benefit from training a neural net to do a thing is a whole project in itself.

However, I would like it if the reference solvers were easier to differentiate through, and to construct posteriors with, what you might call tomography, or inverse problems. But note that we *still* do not need to use ML methods to do that. In fact, if I already know the PDE operator and am implementing it in any case, I could avoid the learning step and simply implement the PDE using an off-the-shelf differentiable solver, which would allow us to perform this inference.

Nonetheless, we might wish to learn to approximate a PDE, for whatever reason. Perhaps we do not know the governing equations precisely, or something like that. In my case it is that I am required to match an industry-standard black-box solver that is not flexible, which is a common reason. YMMV.

There are several approaches to learning the dynamics of a PDE solver for given parameters.

## 2 Neural operator

Learning to predict the *next step given this step*. Think *image-to-image regression*. A whole topic in itself. See Neural operators.

## 3 The PINN lineage

This body of literature encompasses both *DeepONet* (‘operator learning’) and *PINN* (‘physics informed neural nets’) approaches. Distinctions TBD.

See PINNs.

## 5 Message passing methods

TBD

## 6 DeepONet

See operator learning.

## 7 Adversarial approaches

One approach I am less familiar with advocates (conditional) GAN models to simulate (conditional) latent distributions. I’m curious about these but they look more computationally expensive and specific than I need at the moment, so I’m filing for later (G. Bao et al. 2020; Yang, Zhang, and Karniadakis 2020; Zang et al. 2020).

A recent example from fluid-flow dynamics (Chu et al. 2021) has particularly beautiful animations.

## 8 Advection-diffusion PDEs in particular

F. Sigrist, Künsch, and Stahel (2015b) finds a nice spectral representation of certain classes of stochastic PDE. These are extended in Liu, Yeo, and Lu (2020) to non-stationary operators. By being less generic, these come out with computationally convenient spectral representations.

## 9 Inverse problems

Tomography through PDEs.

## 10 As implicit representations

Many of these PDE methods effectively use the “implicit representation” trick, i.e. they produce networks that map from input coordinates to values of solutions at those coordinates. This means we share some interesting tools with those networks, such as position encodings. TBD.
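For illustration, one common form of position encoding maps each input coordinate to sines and cosines at geometrically increasing frequencies before it is fed to the network. The sketch below is a generic version of that idea, not the encoding of any specific paper cited here; the name `positional_encoding` and the default `n_freqs` are my own choices.

```python
import math


def positional_encoding(coords, n_freqs=4):
    """Map each coordinate c to [sin(2^k pi c), cos(2^k pi c)] for k < n_freqs.

    `coords` is a point in the coordinate space C, e.g. (t, x1, x2).
    The output has 2 * n_freqs features per input coordinate.
    """
    features = []
    for c in coords:
        for k in range(n_freqs):
            w = (2.0 ** k) * math.pi
            features.append(math.sin(w * c))
            features.append(math.cos(w * c))
    return features


# A point (t, x) = (0.1, 0.5) becomes a 16-dimensional feature vector.
encoded = positional_encoding([0.1, 0.5], n_freqs=4)
```

The network then sees `encoded` rather than the raw coordinates, which tends to make high-frequency detail in the solution easier to fit.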

## 11 Differentiable solvers

Suppose we are keen to devise yet another method that will do clever things to augment PDE solvers with ML somehow. To that end it would be nice to have a PDE solver that was not a completely black box but which we could interrogate for useful gradients. Obviously all PDE solvers *use* gradient information, but only some of them expose that to us as users; e.g. MODFLOW will give me a solution field but not the gradients of the field that were used to calculate that solution, neither spatial gradients nor the sensitivity of the solution to the parameters. In ML toolkits, accessing this information is easy.
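To show what “sensitivity to the parameters” means in practice, here is a hand-rolled sketch, assuming an explicit 1D periodic heat solver with diffusivity `kappa` as the only parameter. It propagates the derivative of the solution with respect to `kappa` alongside the solution itself (forward-mode, or tangent-linear, differentiation). An ML toolkit would produce this automatically, and an adjoint method would compute the same information in reverse; the function name and parameter values are mine.

```python
import math


def solve_with_sensitivity(f0, kappa, dx, dt, n_steps):
    """Explicit heat solver returning the solution f and g = df/dkappa."""
    n = len(f0)
    f = list(f0)
    g = [0.0] * n  # the initial condition does not depend on kappa
    c = dt / dx**2
    for _ in range(n_steps):
        lap_f = [f[(i + 1) % n] - 2.0 * f[i] + f[i - 1] for i in range(n)]
        lap_g = [g[(i + 1) % n] - 2.0 * g[i] + g[i - 1] for i in range(n)]
        # Update rule: f_new = f + kappa * c * lap(f).
        # Differentiating in kappa: g_new = g + c * lap(f) + kappa * c * lap(g).
        g = [g[i] + c * lap_f[i] + kappa * c * lap_g[i] for i in range(n)]
        f = [f[i] + kappa * c * lap_f[i] for i in range(n)]
    return f, g


n = 64
dx, dt, kappa = 1.0 / n, 1e-3, 0.01
f0 = [math.sin(2 * math.pi * i * dx) for i in range(n)]
f, g = solve_with_sensitivity(f0, kappa, dx, dt, 100)

# Sanity check against a central finite difference in kappa.
eps = 1e-6
f_hi, _ = solve_with_sensitivity(f0, kappa + eps, dx, dt, 100)
f_lo, _ = solve_with_sensitivity(f0, kappa - eps, dx, dt, 100)
fd = [(a - b) / (2 * eps) for a, b in zip(f_hi, f_lo)]
max_gap = max(abs(a - b) for a, b in zip(g, fd))
```

A black-box solver only gives us `f`; the finite-difference route to `g` costs extra solver runs per parameter, which is exactly the expense that exposed gradients avoid.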

TODO: define adjoint method etc.

OTOH, there is a lot of sophisticated work done by PDE solvers that is hard for ML toolkits to recreate. That is why PDE solvers are a thing.

Tools which combine both worlds, PDE solutions and ML optimisations, do exist; there are adjoint method systems for mainstream PDE solvers just as there are PDE solvers for ML frameworks. Let us list some of the options under differentiable PDE solvers.

## 12 Deep Ritz method

Fits here? (E, Han, and Jentzen 2017; E and Yu 2018; Müller and Zeinhofer 2020)
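As I understand the basic idea in E and Yu (2018), stated here in this page's notation and for the simplest case only: for the Poisson problem \(-\nabla^2 f=u\) on \(D\) with \(f=0\) on \(\partial D\), the solution minimises the energy functional \[I[f]=\int_D\left(\tfrac{1}{2}\left|\nabla f(x)\right|^2-u(x)f(x)\right)\mathrm{d}x.\] The Deep Ritz method substitutes a network \(\hat{f}_\theta\) for \(f\), estimates the integral by sampling points in \(D\), and handles the boundary condition with a penalty term such as \(\beta\int_{\partial D}\hat{f}_\theta(x)^2\,\mathrm{d}s\) for some assumed weight \(\beta>0\). So it is a coordinate-network method like PINNs, but with a variational loss in place of the squared PDE residual.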

## 13 Datasets and training harnesses

As with more typical neural net applications, PDE emulators can be trained from datasets. Here are some:

- Johns Hopkins Turbulence Databases (JHTDB)
- pdebench/PDEBench: PDEBench: An Extensive Benchmark for Scientific Machine Learning (Takamoto et al. 2022) (Disclaimer: I contributed significantly to this project)
- karlotness/nn-benchmark: An extensible benchmark suite to evaluate data-driven physical simulation (Otness et al. 2021)
- PDEArena

But if we have a simulator, we can run it *live* and generate data on the fly. Here is a tool to facilitate that.

Melissa is a file avoiding, fault tolerant and elastic framework, to run large scale sensitivity analysis (Melissa-SA) and large scale deep surrogate training (Melissa-DL) on supercomputers. With Melissa-SA, largest runs so far involved up to 30k core, executed 80 000 parallel simulations, and generated 288 TB of intermediate data that did not need to be stored on the file system …

Classical sensitivity analysis and deep surrogate training consist in running different instances of a simulation with different set of input parameters, store the results to disk to later read them back to train a Neural Network or to compute the required statistics. The amount of storage needed can quickly become overwhelming, with the associated long read time that makes data processing time consuming. To avoid this pitfall, scientists reduce their study size by running low resolution simulations or down-sampling output data in space and time.

Melissa (Fig. 1) bypasses this limitation by avoiding intermediate file storage. Melissa processes the data online (in transit) enabling very large scale data processing.
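Stripped of the supercomputer plumbing, the run-it-live idea can be sketched as a generator that calls the simulator whenever the training loop asks for a batch, so nothing is written to disk. This is emphatically not Melissa's API; `simulate`, `live_batches` and every parameter value below are hypothetical stand-ins, with a toy 1D heat solver playing the simulator.

```python
import math
import random


def simulate(f0, kappa, dx, dt, n_steps):
    """Toy stand-in for a simulator: explicit periodic 1D heat equation."""
    n = len(f0)
    r = kappa * dt / dx**2
    f = list(f0)
    for _ in range(n_steps):
        f = [f[i] + r * (f[(i + 1) % n] - 2.0 * f[i] + f[i - 1]) for i in range(n)]
    return f


def live_batches(n_batches, batch_size, n_grid=32, seed=0):
    """Yield batches of (initial state, later state) pairs, simulated on demand."""
    rng = random.Random(seed)
    dx = 1.0 / n_grid
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            # Randomise the input parameters of each simulation run.
            amp = rng.uniform(0.5, 1.5)
            phase = rng.uniform(0.0, 1.0)
            f0 = [amp * math.sin(2 * math.pi * (i * dx + phase)) for i in range(n_grid)]
            f1 = simulate(f0, kappa=0.01, dx=dx, dt=1e-3, n_steps=50)
            batch.append((f0, f1))
        yield batch


# A training loop would consume batches one at a time; here we just collect a few.
batches = list(live_batches(n_batches=3, batch_size=4))
```

Each batch exists only in memory for as long as the consumer holds it, which is the property that the in-transit processing described above provides at scale.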

## 14 Tooling

### 14.1 Torchphysics

boschresearch/torchphysics/Tutorial: Understanding the structure of TorchPhysics

TorchPhysics is a Python library of (mesh-free) deep learning methods to solve differential equations. You can use TorchPhysics e.g. to

- solve ordinary and partial differential equations
- train a neural network to approximate solutions for different parameters
- solve inverse problems and interpolate external data

The following approaches are implemented using high-level concepts to make their usage as easy as possible:

- physics-informed neural networks (PINN)
- QRes
- the Deep Ritz method
- DeepONets and Physics-Informed DeepONets

### 14.2 DeepXDE

DeepXDE is a reference solver implementation for PINN and DeepONet (Lu et al. 2021).

Use DeepXDE if you need a deep learning library that

- solves forward and inverse partial differential equations (PDEs) via physics-informed neural network (PINN),
- solves forward and inverse integro-differential equations (IDEs) via PINN,
- solves forward and inverse fractional partial differential equations (fPDEs) via fractional PINN (fPINN),
- approximates functions from multi-fidelity data via multi-fidelity NN (MFNN),
- approximates nonlinear operators via deep operator network (DeepONet),
- approximates functions from a dataset with/without constraints.

You might need to moderate your expectations a little. I did, after that bold description. This is an impressive library, but as covered above, some of the types of problems that it can solve are more limited than one might hope upon reading the description. Think of it as a neural network library that handles *certain* PDE calculations and you will not go too far astray.

### 14.3 NeuralOperator

`neuraloperator` is a comprehensive library for learning neural operators in PyTorch. It is the official implementation for Fourier Neural Operators and Tensorized Neural Operators.

### 14.4 Modulus

NVIDIA’s MODULUS (formerly SimNet) (Hennigh et al. 2020) has the full marketing muscle of NVIDIA behind it.

Not currently recommended, due to a comically clunky distribution system and onerous licensing.

Notable clauses from the license:

- LIMITATIONS. Your license to use the Modulus Deliverables is restricted as follows:
  - The Modulus Deliverables are licensed for you to develop services and applications only for their use in systems with NVIDIA GPUs.
  - You may not reverse engineer, decompile or disassemble, or remove copyright or other proprietary notices from any portion of the Modulus Deliverables or copies of the Modulus Deliverables.
  - Except as expressly provided in this license, you may not copy, sell, rent, sublicense, transfer, distribute, modify, or create derivative works of any portion of the Modulus Deliverables. For clarity, you may not distribute or sublicense the Modulus Deliverables as a stand-alone product.

They run just fine on Google Colab, but I am not sure if that is legal.

### 14.5 CliffordLayers

Surprising twist: Clifford algebras are useful for ML+PDEs.

microsoft/cliffordlayers/ CliffordLayers

We propose Geometric Clifford Algebra Networks (GCANs) that are based on symmetry group transformations using geometric (Clifford) algebras. GCANs are particularly well-suited for representing and manipulating geometric transformations, often found in dynamical systems. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the Pin(p,q,r) group. We then propose the concept of group action layers, which linearly combine object transformations using pre-specified group actions. Together with a new activation and normalization scheme, these layers serve as adjustable geometric templates that can be refined via gradient descent. Theoretical advantages are strongly reflected in the modeling of three-dimensional rigid body transformations as well as large-scale fluid dynamics simulations, showing significantly improved performance over traditional methods.

## 15 References

*arXiv:2005.12998 [Math]*.

*SIAM Journal on Scientific Computing*.

*Acta Numerica*.

*Proceedings of The 28th Conference on Learning Theory*.

*Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence*.

*Inverse Problems*.

*Proceedings of the National Academy of Sciences*.

*AIAA SCITECH 2022 Forum*.

*Journal of Nonlinear Science*.

*arXiv:2203.13760 [Physics]*.

*arXiv:2005.03180 [Cs, Math, Stat]*.

*Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes*.

*GAMM-Mitteilungen*.

*International Conference on Learning Representations*.

*Physical Review Fluids*.

*Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control*.

*Annual Review of Fluid Mechanics*.

*ACM Transactions on Graphics*.

*Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS)*.

*arXiv:2012.07244 [Cs]*.

*Proceedings of the National Academy of Sciences*.

*Annual Review of Fluid Mechanics*.

*Notices of the American Mathematical Society*.

*Communications in Mathematics and Statistics*.

*arXiv:2008.13333 [Cs, Math]*.

*Advances in Computational Mathematics*.

*Communications in Mathematics and Statistics*.

*Journal of Computational Physics*.

*arXiv:2010.10876 [Cs]*.

*arXiv Preprint arXiv:2106.13281*.

*arXiv Preprint arXiv:2007.04954*.

*Acta Numerica*.

*Computer Methods in Applied Mechanics and Engineering*.

*Fixed Point Theory*. Springer Monographs in Mathematics.

*IMA Journal of Numerical Analysis*.

*arXiv:2012.11857 [Cs, Math, Stat]*.

*Computer Methods in Applied Mechanics and Engineering*.

*Proceedings of the National Academy of Sciences*.

*arXiv:2012.07938 [Physics]*.

*Frontiers in Applied Mathematics and Statistics*.

*NeurIPS Workshop*.

*ICLR*.

*arXiv:2112.05309 [Cs]*.

*ACM Transactions on Graphics*.

*Networks & Heterogeneous Media*.

*The Journal of Machine Learning Research*.

*Nature Reviews Physics*.

*arXiv:2001.08055 [Physics, Stat]*.

*arXiv:1912.00873 [Physics, Stat]*.

*arXiv:1912.07443 [Physics, Stat]*.

*Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS)*.

*Proceedings of the National Academy of Sciences*.

*arXiv:1801.07337 [Physics]*.

*arXiv:2107.07562 [Cs, Math]*.

*arXiv:2108.08481 [Cs, Math]*.

*Journal of Machine Learning Research*.

*Advances in Neural Information Processing Systems*.

*IEEE Transactions on Neural Networks*.

*Canadian Journal of Statistics*.

*International Conference on Learning Representations*.

*Advances in Neural Information Processing Systems*.

*arXiv:2010.08895 [Cs, Math]*.

*Journal of the American Statistical Association*.

*Proceedings of the 35th International Conference on Machine Learning*.

*arXiv:1910.03193 [Cs, Stat]*.

*SIAM Review*.

*Journal of Computational Physics*.

*Journal of Open Source Software*.

*arXiv:2111.09880 [Physics]*.

*Probabilistic Engineering Mechanics*.

*The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation*.

*Reliability Engineering & System Safety*.

*Analysis and Applications*.

*SIAM Journal on Scientific Computing*.

*Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences*.

*Advances in Neural Information Processing Systems*.

*Physica D: Nonlinear Phenomena*.

*MIT Web Domain*.

*Journal of Computational Physics*.

*arXiv:2109.07573 [Physics]*.

*Environmental Modelling & Software*.

*Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS)*.

*Scientific Reports*.

*arXiv Preprint arXiv:2302.06594*.

*Advances in Neural Information Processing Systems*.

*NeurIPS*.

*arXiv:2203.10131 [Physics]*.

*Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020)*.

*Journal of Statistical Software*.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*.

*Journal of Computational Physics*.

*Statistics and Computing*.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*.

*arXiv:2006.15641 [Cs, Stat]*.

*Physics-Based Deep Learning*.

*arXiv:2007.00016 [Physics]*.

*Array*.

*arXiv:1701.07989 [Math]*.

*Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*. KDD ’20.

*Advances in Water Resources*.

*arXiv:2011.11955 [Cs, Math]*.

*Journal of Computational Physics*.

*SIAM Journal on Scientific Computing*.

*Geoscientific Model Development Discussions*.

*Journal of Computational Physics*.

*SIAM Journal on Scientific Computing*.

*Journal of Computational Physics*.

*International Conference on Machine Learning*.