Physics-informed neural networks
October 15, 2019 — October 19, 2023
Physics-informed neural networks (PINNs) are a particular way of using statistical or machine learning approaches to solve PDEs, and maybe even to perform inference through them, characterised by using the implicit representation to do the work.
AFAICT this is distinct from baking conservation laws or symmetries into the network structure itself, because here we merely penalise deviations from the physics.
This body of literature possibly also encompasses DeepONet approaches 🤷. Distinctions TBD.
TODO: avoid a proliferation of unclear symbols by introducing a specific example: which neural nets represent operators, which represent specific functions, between which spaces, etc.
TODO: Harmonise the notation used in this section with subsections below; right now they match the papers’ notation but not each other.
TODO: should the intro section actually be filed under PDEs?
TODO: introduce a consistent notation for coordinate space, output spaces, and function space?
1 Deterministic PINN
Archetypally, the PINN. Recently these have been hip (Raissi, Perdikaris, and Karniadakis 2017a, 2017b; Raissi, Perdikaris, and Karniadakis 2019; L. Yang, Zhang, and Karniadakis 2020; Zhang, Guo, and Karniadakis 2020; Zhang et al. 2019). Zhang et al. (2019) credits Lagaris, Likas, and Fotiadis (1998) with originating the idea in 1998, so I suppose this is not super fresh. Thanks to Shams Basir, who points to an even earlier date: in Basir and Senocak (2022) credit goes to Dissanayake and Phan-Thien (1994) and van Milligen, Tribaldos, and Jiménez (1995). The key insight is that if we are elbows-deep in a neural network framework anyway, we already have access to automatic differentiation, so differential operations over the input field are basically free.
Let us introduce the basic “forward” PINN setup as given in Raissi, Perdikaris, and Karniadakis (2019). In the basic model we have the following problem:
$$\partial_t f(t,x) + \mathcal{N}[f](t,x) = 0,\qquad x\in\Omega\subset\mathbb{R}^d,\ t\in[0,T],$$
where $f$ is the unknown solution and $\mathcal{N}$ is a (possibly nonlinear) differential operator, together with initial and boundary conditions on $f$.
We define the residual network
$$r(t,x) := \partial_t f(t,x) + \mathcal{N}[f](t,x),$$
where $f$ is approximated by a neural network; since $r$ is obtained from $f$ by differentiation, it shares the same weights.
The approximation is data-driven, with sample set
$$\{(t_f^i, x_f^i, f^i)\}_{i=1}^{N_f} \cup \{(t_r^i, x_r^i)\}_{i=1}^{N_r},$$
comprising $N_f$ initial/boundary observations of the solution and $N_r$ collocation points at which we penalise the residual.
We train by minimising a combined loss,
$$\operatorname{MSE} = \underbrace{\frac{1}{N_f}\sum_{i=1}^{N_f}\bigl|f(t_f^i,x_f^i)-f^i\bigr|^2}_{\text{data misfit}} + \underbrace{\frac{1}{N_r}\sum_{i=1}^{N_r}\bigl|r(t_r^i,x_r^i)\bigr|^2}_{\text{PDE residual}}.$$
An example is illustrative. Here is the reference TensorFlow interpretation from Raissi, Perdikaris, and Karniadakis (2019) for the Burgers equation. In one space dimension, the Burgers equation with Dirichlet boundary conditions reads
$$\partial_t f + f\,\partial_x f - \frac{0.01}{\pi}\,\partial_x^2 f = 0,\qquad x\in[-1,1],\ t\in[0,1],$$
$$f(0,x) = -\sin(\pi x),\qquad f(t,-1)=f(t,1)=0.$$
The Python implementation of these two parts, the solution network and the residual network, is essentially a naïve transcription of those equations.
import numpy as np
import tensorflow as tf  # TF1-style API, matching the reference implementation

# neural_net, weights and biases are defined elsewhere in the reference code.
def f(t, x):
    # solution network: maps coordinates (t, x) to the approximate solution value
    f = neural_net(tf.concat([t, x], 1), weights, biases)
    return f

def r(t, x):
    # residual network: the Burgers residual, assembled by automatic differentiation
    u = f(t, x)
    u_t = tf.gradients(u, t)[0]
    u_x = tf.gradients(u, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    r = u_t + u * u_x - (0.01 / np.pi) * u_xx
    return r
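Continuing the TF1 snippet above (so `f`, `r`, and the imports are in scope), here is a hedged sketch of how the combined loss over the two sample sets might be assembled and minimised; the placeholder names and the optimiser choice are my assumptions, not the reference code:

```python
# Placeholders for the two sample sets (illustrative names):
t_f = tf.placeholder(tf.float32, shape=[None, 1])    # coordinates of observed data
x_f = tf.placeholder(tf.float32, shape=[None, 1])
f_obs = tf.placeholder(tf.float32, shape=[None, 1])  # observed solution values
t_r = tf.placeholder(tf.float32, shape=[None, 1])    # collocation points for the residual
x_r = tf.placeholder(tf.float32, shape=[None, 1])

mse_f = tf.reduce_mean(tf.square(f(t_f, x_f) - f_obs))  # data / boundary misfit
mse_r = tf.reduce_mean(tf.square(r(t_r, x_r)))          # PDE residual penalty
loss = mse_f + mse_r

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
# then run train_op in a tf.Session, feeding sampled coordinates and observations
```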
Because the outputs are parameterised by coordinates, the built-in autodiff does all the work. The authors summarise the resulting network topology in a schematic in the paper.
What has this gained us? So far, we have acquired a model which can, the authors assert, solve deterministic PDEs, which is nothing we could not do before. We have sacrificed any guarantee that our method will in fact do well on data from outside our observations. Also, I do not understand how I can plug alternative initial or boundary conditions into this. There is no data input, as such, at inference time, merely coordinates. On the other hand, the authors assert that this is faster and more stable than traditional solvers. It has the nice feature that the solution is continuous in its arguments; there is no grid. As far as NN things go, the behaviour of this model is weird and refreshing: it is simple, requires small data, and has few tuning parameters.
But! What if we do not know the parameters of the PDE? Assume the differential operator has a parameter $\lambda$, so the residual becomes $r(t,x) := \partial_t f + \mathcal{N}[f;\lambda]$. In Raissi, Perdikaris, and Karniadakis (2019) this inverse problem is handled by simply treating $\lambda$ as an extra trainable variable, learned jointly with the network weights by minimising the same combined loss; something like the sketch below.
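Continuing the TF1 snippet above, a hedged sketch of that trick for the Burgers example; the names `lambda_1`, `lambda_2`, the initial values, and the log-parameterisation of the viscosity are my assumptions rather than a faithful copy of the reference code:

```python
# Unknown PDE parameters become trainable variables, optimised by the same
# combined loss as the network weights (illustrative names and initial values).
lambda_1 = tf.Variable(0.0, dtype=tf.float32)   # advection coefficient
lambda_2 = tf.Variable(-6.0, dtype=tf.float32)  # log-viscosity, exponentiated below

def r_inverse(t, x):
    # residual with unknown coefficients in place of the fixed Burgers constants
    u = f(t, x)
    u_t = tf.gradients(u, t)[0]
    u_x = tf.gradients(u, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    return u_t + lambda_1 * u * u_x - tf.exp(lambda_2) * u_xx
```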
Fine; now what? Two obvious challenges from where I am sitting.
- No way of changing inputs in the sense of initial or boundary conditions, without re-training the model
- Point predictions. No accounting for randomness or uncertainty.
2 Stochastic PINN
Zhang et al. (2019) address the second of those challenges, using chaos expansions to handle the PDE emulation as a stochastic-process regression, which apparently gives us estimates of parametric and process uncertainty. All diagrams in this section come from that paper.
🏗️ Terminology warning: I have not yet harmonised the terminology of this section with the rest of the page.
The extended model adds a random noise parameter $\omega$.
The randomness in this could indicate a random coupling term, or uncertainty in some parameter of the model; think of a Gaussian process prior over the forcing term of the PDE. We sample this noise parameter also, and augment the data set with it over a number of realisations.
Note that I have kept the time variable explicit, unlike the paper, to match the previous section, but it gets cluttered if we continue to do this, so let’s suppress $t$ hereafter.
So now we approximate
We let
Now we have approximated away the correlated
Next is where we use the chaos-expansion trick to construct an interpolant. Suppose the random variable driving the noise has measure $\mu$. We construct a polynomial basis which is orthogonal with respect to the inner product associated to this measure, specifically
$$\langle p, q\rangle := \int p(\xi)\,q(\xi)\,\mathrm{d}\mu(\xi).$$
OK, so we construct an orthonormal polynomial basis $\{\phi_i\}_{i\geq 0}$ satisfying $\langle \phi_i, \phi_j\rangle = \delta_{ij}$.
So we are going to pick
Then, we can approximate the random fields in the problem by truncated expansions in this basis.
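As a sketch in my own notation: write $\xi(\omega)$ for the random variable driving the noise and $\{\phi_i\}$ for the orthonormal basis above; then a generic random field $k$ gets the truncated chaos representation
$$
k(x;\omega) \;\approx\; \sum_{i=0}^{P} \hat{k}_i(x)\,\phi_i\bigl(\xi(\omega)\bigr),
$$
where $P$ is the truncation order and the deterministic mode functions $\hat{k}_i$ are exactly the things we will ask neural networks to output below.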
We construct two networks,
- the network that takes the coordinate as input and outputs a vector of the aPC modes of the solution evaluated at that coordinate, and
- the network that takes the coordinate as input and outputs a vector of the modes of the random coefficient.
The resulting network topology, and a concrete instance of it for an example problem, are illustrated in schematics in Zhang et al. (2019).
At inference time we take observations of
After all that, I would describe this as a method for constructing a stochastic PDE with a desired covariance structure, which is a hard thing to do. OK, all that was very complicated; then again, it was a complicated thing to attempt. Consider the mess this gets us into in the Karhunen–Loève expansion and spectral expansion settings. Anyway, after all this, presuming the neural networks are perfect, we have a good estimate of the distribution of the random parameters and the random output of a stochastic PDE, evaluated over the whole domain, from partial, discrete measurements.
How do we estimate the uncertainty introduced by the neural net? Dropout.
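As I understand it, this is Monte Carlo dropout: keep dropout switched on at prediction time and treat repeated stochastic forward passes as approximate posterior samples. A minimal sketch, assuming some callable `predict_fn` that evaluates the trained network with dropout still active (the name and interface are mine, not the paper’s):

```python
import numpy as np

def mc_dropout_moments(predict_fn, t, x, n_samples=100):
    """Predictive mean and std from repeated stochastic forward passes.

    predict_fn(t, x) is assumed to evaluate the PINN with dropout active,
    so each call returns a slightly different prediction.
    """
    samples = np.stack([predict_fn(t, x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

This captures only the epistemic uncertainty of the network itself, not the stochasticity of the PDE, which the chaos expansion above is meant to handle.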
Further questions:
- Loss scale; gradient errors may not be comparable to value errors in the loss function.
- Network capacity: what size networks are necessary? (Note that the ones we learn here are tiny, with only hundreds of parameters.)
- How do we generalize this to different initial conditions? Can we learn an observation-conditional PDE?
- After all this work it looks like I still can’t do inference on this thing. How do I update a distribution over the unknowns by this method from observations of a new PDE?
- Notice how the parameter inference problem vanished for the stochastic PDE? Can we learn an estimate of the parameters and the solution simultaneously in this setting? I imagine we repeat the trick where the parameter is learned along with the network weights.
3 Weak formulation
TODO: should this be filed with PINNs?
A different network topology using the implicit-representation trick is explored in Zang et al. (2020) and extended to inverse problems in Bao et al. (2020). They discuss this in terms of a weak formulation of the PDE.
🏗️ Terminology warning: I have not yet harmonised the terminology of this section with the rest of the page.
We start with the example of a second-order elliptic PDE, written abstractly as $\mathcal{A}[u] = 0$ on a domain $\Omega$ with a boundary condition on $\partial\Omega$.
By multiplying both sides by a test function and integrating by parts, we obtain the weak form of the problem, which demands less smoothness of the solution.
The clever insight is that this inspires an adversarial problem for finding weak solutions, by considering the operator norm of the weak-form residual over all admissible test functions.
Specifically, the weak solutions are exactly those $u$ for which this operator norm vanishes, which suggests seeking them as the minimisers of a minimax objective.
To train the deep neural network representing the solution, we alternate between descending this objective in the solution network’s parameters and ascending it in the parameters of a second network representing the test function; see the sketch below.
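In my own notation, the resulting minimax problem looks something like this (Zang et al. (2020) add boundary terms and weights that I omit): with a solution network $u_\theta$ and a test-function network $\varphi_\eta$,
$$
\min_{\theta}\;\max_{\eta}\;
\frac{\bigl\langle \mathcal{A}[u_\theta],\,\varphi_\eta\bigr\rangle^{2}}{\lVert\varphi_\eta\rVert_{2}^{2}},
$$
where $\langle\mathcal{A}[u],\varphi\rangle$ is the weak-form (integrated-by-parts) residual pairing; the inner maximisation approximates the operator norm of the residual, and the outer minimisation drives it towards zero.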
This is a very elegant idea, although the implicit representation thing is still a problem for my use cases.
4 PINO
Combines operator learning with physics-informed neural networks (Li et al. 2021).
Code at neuraloperator/PINO.
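As a hedged sketch of the idea in my own notation (see Li et al. (2021) for the actual losses): a neural operator $\mathcal{G}_\theta$ maps a problem instance $a$ (coefficients, forcings, or initial conditions) to a candidate solution, and training combines a data-fitting term with a PINN-style residual term applied to the operator’s output,
$$
\mathcal{L}(\theta)\;=\;\bigl\lVert\mathcal{G}_\theta(a)-u\bigr\rVert^{2}\;+\;\lambda\,\bigl\lVert\mathcal{N}\bigl[\mathcal{G}_\theta(a)\bigr]\bigr\rVert^{2},
$$
so the physics loss can regularise the operator even where solution data are unavailable.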
5 Implementation
- PyTorch, via TorchPhysics
- TensorFlow
6 Incoming
- idrl-lab/PINNpapers: Must-read Papers on Physics-Informed Neural Networks.
- X. Chen et al. (2021) uses an attention mechanism to improve the performance of PINNs. I’m curious.
- Introduction to Scientific Machine Learning 2: Physics-Informed Neural Networks
- Introduction to Scientific Machine Learning through Physics-Informed Neural Networks