# Random fields as stochastic differential equations

Precision vs covariance, fight!

October 12, 2020 — March 1, 2021

Bayes
dynamical systems
edge computing
linear algebra
probability
signal processing
state space models
statistics
stochastic processes
time series

$\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\sinc}{\operatorname{sinc}}$

The representation of certain random fields, especially Gaussian random fields as stochastic differential equations. This is the engine that makes filtering Gaussian processes go, and is also a natural framing for probabilistic spectral analysis.

I do not have much to say right now about this, but I am using it so watch this space.

## 1 Creating a stationary Markov SDE with desired covariance

The Gauss-Markov Random Field approach.

Warning: I’m taking crib notes for myself here, so I lazily switch between signal processing filter terminology and probabilist termonology. I assume Bochner’s and Yaglom’s Theorems as comprehensible methods for analysing covariance kernels.

Let’s start with stationary kernels. We consider an SDE $$f: \mathbb{R}\to\mathbb{R}$$ at stationarity. We will let its driving noise to be some Wiener process. We care concerned with deriving the parameters of the SDE such that it has a given stationary covariance function $$k$$.

If there are no zeros in the spectral density, then there are no poles in the inverse transfer function, and we can model it with an all-pole SDE. This includes all the classic Matérn functions. This is covered in J. Hartikainen and Särkkä (2010), and Lindgren, Rue, and Lindström (2011). Worked examples starting from a discrete time formulation are given in a tutorial introduction Grigorievskiy and Karhunen (2016).

More generally, (quasi-)periodic covariances have zeros and we need to find a full rational function approximation. Särkkä, Solin, and Hartikainen (2013) introduces one such method. Bolin and Lindgren (2011) explores a sligtly different class

Solin and Särkkä (2014) has a fancier method employing resonators a.k.a. filter banks, to address a concern of Steven Reece et al. (2014) that atomic spectral peaks in the Fourier transform are not well approximated by rational functions.

Bolin and Lindgren (2011) consider a general class of realisable systems, given by $\mathcal{L}_{1} X(\mathbf{s})=\mathcal{L}_{2} \mathcal{W}(\mathbf{s})$ for some linear operators $$\mathcal{L}_{1}$$ and $$\mathcal{L}_{2} .$$

In the case that $$\mathcal{L}_{1}$$ and $$\mathcal{L}_{2}$$ commute, this may be put in hierarchical form: \begin{aligned} \mathcal{L}_{1} X_{0}(\mathbf{s})&=\mathcal{W}(\mathbf{s})\\ X(\mathbf{s})&=\mathcal{L}_{2} X_{0}(\mathbf{s}). \end{aligned}

They explain

$$X(\mathbf{s})$$ is simply $$\mathcal{L}_{2}$$ applied to the solution one would get to if $$\mathcal{L}_{2}$$ was the identity operator.

They call this a nested PDE, although AFAICT you could also say ARMA. They are particularly interested in equations of this form: $\left(\kappa^{2}-\Delta\right)^{\alpha / 2} X(\mathbf{s})=\left(b+\mathbf{B}^{\top} \nabla\right) \mathcal{W}(\mathbf{s})$

The SPDE generating this class of models is $\left(\prod_{i=1}^{n_{1}}\left(\kappa^{2}-\Delta\right)^{\alpha_{i} / 2}\right) X(\mathbf{s})=\left(\prod_{i=1}^{n_{2}}\left(b_{i}+\mathbf{B}_{i}^{\top} \nabla\right)\right) \mathcal{W}(\mathbf{s})$

They show that spectral density for such an $$X(\mathbf{s})$$ is given by $S(\mathbf{k})=\frac{\phi^{2}}{(2 \pi)^{d}} \frac{\prod_{j=1}^{n_{2}}\left(b_{j}^{2}+\mathbf{k}^{\top} \mathbf{B}_{j} \mathbf{B}_{j}^{\top} \mathbf{k}\right)}{\prod_{j=1}^{n_{1}}\left(\kappa_{j}^{2}+\|\mathbf{k}\|^{2}\right)^{\alpha_{j}}}.$

## 2 Convolution representations

See stochastic convolution or pragmatically, assume Gaussianity and see Gaussian convolution processes.

## 3 Covariance representation

Suppose there is a linear SDE on domain $$\mathbb{R}^d$$ whose measure has the desired covariance structure, and ignore all questions of existence and convergence for now. We define terms of the driving noise $$\varepsilon$$ and a linear differential operator $$\mathcal{L}$$ such that $\mathcal{L}f(\mathbf{x})=\varepsilon(\mathbf{x}).$

Assume there is a Green’s function for the PDE, i.e. that for any $$\mathbf{s} \in\mathbb{R}^d$$ we may find a function $$G_\mathbf{s}(\mathbf{x})$$ such that $\mathcal{L}G_\mathbf{s}(\mathbf{x})=\delta_\mathbf{s}(\mathbf{x}).$

The solutions of the SDE, ignoring a whole bunch of existence stuff, are then given by the convolution of these Green’s functions with the driving noise, i.e. $$f(\mathbf{x}_p) \overset{\text{sorta}}{=}\int G_\mathbf{s}(\mathbf{x}_p)\varepsilon(\mathbf{s}) d \mathbf{s}.$$ We use this to find the covariance of the solutions in terms of inner products of these fundamental solutions. \begin{align*} k(\mathbf{x}_p, \mathbf{x}_q) &=\mathbb{E}[f(\mathbf{x}_p)f(\mathbf{x}_q)] \\ &=\mathbb{E}\left[\int G_\mathbf{s}(\mathbf{x}_p)\varepsilon(\mathbf{s}) d \mathbf{s} \int G_\mathbf{t}(\mathbf{x}_q)\varepsilon(\mathbf{t}) d \mathbf{t} \right] \\ &=\mathbb{E}\left[\iint G_\mathbf{s}(\mathbf{x}_p) G_\mathbf{t}(\mathbf{x}_q) \varepsilon(\mathbf{s}) \varepsilon(\mathbf{t}) d \mathbf{t} d \mathbf{s} \right] \\ &=\iint G_\mathbf{s}(\mathbf{x}_p) G_\mathbf{t}(\mathbf{x}_q) \mathbb{E}[\varepsilon(\mathbf{s}) \varepsilon(\mathbf{t})] d \mathbf{t} d \mathbf{s} \\ &=\iint G_\mathbf{s}(\mathbf{x}_p) G_\mathbf{t}(\mathbf{x}_q) \sigma^2_\varepsilon \delta_\mathbf{s} (\mathbf{t}) d \mathbf{t} d \mathbf{s} &\text{ whiteness}\\ &=\sigma^2_\varepsilon \int G_\mathbf{s}(\mathbf{x}_p) G_\mathbf{s}(\mathbf{x}_q) d \mathbf{s}\\ &=\sigma^2_\varepsilon \langle G_\cdot(\mathbf{x}_p), G_\cdot(\mathbf{x}_q)\rangle \end{align*}

After that, the question is, given a Greens function can you produce a linear operator that realises it?

For example, the arc-cosine kernel of order $$1$$ corresponding to the ReLU is \begin{align*} k(\mathbf{x}_p, \mathbf{x}_q) &= \frac{\sigma_\varepsilon^2 \Vert \mathbf{x}_p \Vert \Vert \mathbf{x}_q \Vert }{2\pi} \Big( \sin |\theta| + \big(\pi - |\theta| \big) \cos\theta \Big) \end{align*} so for Green’s functions inducing this to exist we would want \begin{align} \int G_\mathbf{s}(\mathbf{x}_p) G_\mathbf{s}(\mathbf{x}_q) d \mathbf{s} &=\frac{\Vert \mathbf{x}_p \Vert \Vert \mathbf{x}_q \Vert }{2\pi} \Big( \sin |\theta| + \big(\pi - |\theta| \big) \cos\theta \Big) \end{align} For this to work we would need $$G_\mathbf{s}(\mathbf{x})\propto\Vert \mathbf{x} \Vert.$$

## 4 Input measures

Warning: this is just a dump of some notes from a paper I was writing; It does not make much sense RN. The essential idea I want to get at is considering different enveloping strategies for the SDE; Enveloping the input noise, for example

Suppose $$\mathbf{x}_p, \mathbf{x}_q \in \mathbb{R}^d$$. The kernel satisfies \begin{aligned} k(\mathbf{x}_p, \mathbf{x}_q) = \sum_{j=1}^d \frac{\partial k}{\partial x_{pj}} x_{pj}. \end{aligned} Let $$f$$ denote the Gaussian process with covariance function $$k$$ and let $$\mathcal{F}_{\mu}[f]$$ denote the Fourier transform of $$f$$ with respect to the finite measure $$\mu$$. Let the Fourier transform of $$\mu$$ be denoted $$\mathcal{F}[\mu](\mathbf{\omega})=\int e^{-i \mathbf{\omega}^\top \mathbf{x}} \, \mu(\dd\mathbf{x})$$, so that $$\mathcal{F}_{\mu}[f]=\mathcal{F}[f(x)\partial_x \mu(x)]=\mathcal{F}[f(x)]\ast\mathcal{F} [ \mu].$$

We have \begin{aligned} \mathbb{E} | \mathcal{F}_{\mu}[f](\mathbf{\omega})|^2 &= \iint \mathbb{E}\big[ f(\mathbf{x}_p) f(\mathbf{x}_q) \big]\, e^{-i\mathbf{\omega}^\top(\mathbf{x}_p - \mathbf{x}_q)} \mu(\dd\mathbf{x}_p) \mu(\dd\mathbf{x}_q) \\ &= \iint k(\mathbf{x}_p, \mathbf{x}_q) \, e^{-i\mathbf{\omega}^\top(\mathbf{x}_p - \mathbf{x}_q)}\,\mu(\dd\mathbf{x}_p) \mu(\dd\mathbf{x}_q) \\ &= \iint \sum_{j=1}^d \frac{\partial k}{\partial x_{pj}} x_{pj} e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \mu(\dd\mathbf{x}_p) \, e^{i \mathbf{\omega}^\top \mathbf{x}_q} \,\mu(\dd\mathbf{x}_q) \\ &= \int \sum_{j=1}^d i \frac{\partial}{\partial \omega_j} \Bigg( \int \frac{\partial k}{\partial x_{pj}} e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p)\Bigg) e^{i \mathbf{\omega}^\top \mathbf{x}_q} \mu(\dd\mathbf{x}_q)\\ &= \mathcal{F}_{\mu}^{\mathbf{x}_q} \left[ \sum_{j=1}^d i \frac{\partial}{\partial \omega_j} \Bigg( \int \frac{\partial k}{\partial x_{pj}} e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p)\Bigg) \right]\\ &= \mathcal{F}_{\mu}^{\mathbf{x}_q} \left[ \sum_{j=1}^d i \frac{\partial}{\partial \omega_j} \Bigg( \mathcal{F}_{\mu}^{\mathbf{x}_p}\left[ \frac{\partial k}{\partial x_{pj}} \right]\Bigg) \right].\end{aligned} Then \begin{aligned} \mathbb{E} | \mathcal{F}_{\mu}[f](\mathbf{\omega})|^2 &= \int \sum_{j=1}^d i \frac{\partial}{\partial \omega_j} \Bigg( \int \frac{\partial k}{\partial x_{pj}} e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p) \, \Bigg) e^{i \mathbf{\omega}^\top \mathbf{x}_q}\mu(\dd\mathbf{x}_q)\\ &= -\int \sum_{j=1}^d \frac{\partial}{\partial \omega_j} \Bigg( \omega_j \int k(\mathbf{x}_p, \mathbf{x}_q) e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p) \, \Bigg)e^{i \mathbf{\omega}^\top \mathbf{x}_q}\mu(\dd\mathbf{x}_q) \\ &= -\sum_{j=1}^d \int \Bigg( \int k(\mathbf{x}_p, \mathbf{x}_q) e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p) \, \Bigg) e^{i \mathbf{\omega}^\top \mathbf{x}_q}\mu(\dd\mathbf{x}_q)\\ &\phantom{{}={}}-\int \Bigg( \omega_j \frac{\partial}{\partial \omega_j} \int k(\mathbf{x}_p, \mathbf{x}_q) e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p) \, \Bigg) e^{i \mathbf{\omega}^\top \mathbf{x}_q}\mu(\dd\mathbf{x}_q) \\ (d+1)\mathbb{E} | \mathcal{F}_{\mu}[f](\mathbf{\omega})|^2 &= -\int \Bigg( \omega_j \frac{\partial}{\partial \omega_j} \int k(\mathbf{x}_p, \mathbf{x}_q) e^{-i \mathbf{\omega}^\top \mathbf{x}_p} \, \mu(\dd\mathbf{x}_p) \, \Bigg) e^{i \mathbf{\omega}^\top \mathbf{x}_q}\mu(\dd\mathbf{x}_q) \end{aligned}

### 4.1$$\mu$$ is a hypercube

We assume that $$\mu$$ is invariant with respect to permutation of coordinates. If we aren’t being silly, that means a cartesian product of intervals $$I$$, $$\mu(A):=\operatorname{Leb}(A\cap I^d).$$ Let us go with $$I=[-1,1].$$ Then \begin{aligned} \mathcal{F}[\mu](\mathbf{\omega}) &=\prod_{j=1}^d \sinc \left( \frac{\omega_j}{4\pi}\right)\\ &=\prod_{j=1}^d \frac{\sin (\omega_j/2)}{\omega}\end{aligned} Also \begin{aligned} \sinc'x &=\frac{\cos \pi x - \sinc x}{x}.\end{aligned}

TBD.

### 4.3$$\mu$$ is an isotropic Gaussian

Suppose $$\mu$$ is an isotropic Gaussian of variance $$I\sigma^2$$ so that $$\dd \mu(\mathbf{x})=(2\pi)^{-d/2}\sigma^{-d}e^{-\sigma^2\mathbf{x}^\top\mathbf{x}/2}$$ and $$\mathcal{F}[\mu]=e^{-\sigma^2\mathbf{\omega}^\top\mathbf{\omega}/2}=(2\pi)^{-d/2}\sigma^{-d}\dd \mu(\mathbf{\omega}).$$

## 5 Without stationarity via Green’s functions

Suppose our SDE may be specified in terms of a Gaussian white driving noise with variance $$\sigma_w^2$$ and an impulse response function/Green’s function, $$g$$ such that \begin{aligned} f(x):=\int g(\mathbf{u},\mathbf{x})\dd w(\mathbf{u}).\end{aligned} We know that the kernel is an inner product kernel and therefore invariant to rotation about $$\mathbf{0},$$ i.e. for orthogonal $$Q$$, $$k(Q\mathbf{x}_p, Q\mathbf{x}_q)=k(\mathbf{x}_p, \mathbf{x}_q).$$ It follows that $$g(Q\mathbf{u}, Q\mathbf{x})=g(\mathbf{u}, \mathbf{x}).$$ In fact, we may write each in dot-product form, i.e. $$k(\mathbf{x}_p, \mathbf{x}_q)=k(\mathbf{x}_p\cdot \mathbf{x}_q)$$ and $$g(\mathbf{u}, \mathbf{x})=g(\mathbf{u}\cdot \mathbf{x}).$$ The kernel satisfies \begin{aligned} k(\mathbf{x}_p, \mathbf{x}_q) &= \mathbb{E}\left[\int g(\mathbf{u},\mathbf{x}_p)\dd w(\mathbf{u})\int g(\mathbf{v},\mathbf{x}_q)\dd w(\mathbf{v})\right]\\ &= \mathbb{E}\left[\iint g(\mathbf{u},\mathbf{x}_p) g(\mathbf{v},\mathbf{x}_q)\dd w(\mathbf{u})\dd w(\mathbf{v})\right]\\ &= \iint g(\mathbf{u},\mathbf{x}_p) g(\mathbf{v},\mathbf{x}_q) \sigma_w^2\delta(\mathbf{u},\mathbf{v})\dd \mathbf{v}\dd \mathbf{u}\\ &= \sigma_w^2\int g(\mathbf{u},\mathbf{x}_p) g(\mathbf{u},\mathbf{x}_q) \dd\mathbf{u}\end{aligned} Up to a scaling factor, the green’s function is simply the covariance kernel under the assumption that the driving noise is white.

Recalling $$k(\mathbf{x}_p, \mathbf{x}_q) = \mathbb{E}\big[ \psi(\mathbf{W}^\top \mathbf{x}_q) \psi(\mathbf{W}^\top \mathbf{x}_p) \big]= \mathbb{E}\big[ \psi(Z_p) \psi(Z_q) \big]$$ the Green’s function thus must satisfy \begin{aligned} \sigma_w^2\int g(\mathbf{u}\cdot\mathbf{x}_p) g(\mathbf{u}\cdot \mathbf{x}_q) \dd\mathbf{u} &= \mathbb{E}\big[ \psi(\mathbf{W}^\top \mathbf{x}_q) \psi(\mathbf{W}^\top \mathbf{x}_p) \big]\end{aligned} Now we need to see how this works for individual kernels.

## 6 References

Aasnaes, and Kailath. 1973. IEEE Transactions on Automatic Control.
Álvarez, Luengo, and Lawrence. 2013. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Antoulas, ed. 1991. Mathematical System Theory: The Influence of R. E. Kalman.
Bakka, Rue, Fuglstad, et al. 2018. WIREs Computational Statistics.
Bart, Gohberg, and Kaashoek. 1979. Minimal Factorization of Matrix and Operator Functions. Operator Theory, Advances and Applications, v. 1.
Berry, Giannakis, and Harlim. 2020. arXiv:2002.07928 [Physics, Stat].
Bolin. 2014. Scandinavian Journal of Statistics.
Bolin, and Kirchner. 2020. Journal of Computational and Graphical Statistics.
Bolin, and Lindgren. 2011. The Annals of Applied Statistics.
Borovitskiy, Terenin, Mostowsky, et al. 2020. arXiv:2006.10160 [Cs, Stat].
Bruinsma, and Turner. 2018. arXiv:1802.08167 [Stat].
Chang, Wilkinson, Khan, et al. 2020. “Fast Variational Learning in State-Space Gaussian Process Models.” In MLSP.
Curtain. 1975. SIAM Journal on Control.
Dowling, Sokół, and Park. 2021.
Dutordoir, Hensman, van der Wilk, et al. 2021. In arXiv:2105.04504 [Cs, Stat].
Duttweiler, and Kailath. 1973a. IEEE Transactions on Information Theory.
———. 1973b. IEEE Transactions on Information Theory.
E. 2017. Communications in Mathematics and Statistics.
Friedlander, Kailath, and Ljung. 1975. In 1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes.
Gevers, and Kailath. 1973. IEEE Transactions on Automatic Control.
Grigorievskiy, and Karhunen. 2016. In 2016 International Joint Conference on Neural Networks (IJCNN).
Grigorievskiy, Lawrence, and Särkkä. 2017. In arXiv:1610.08035 [Stat].
Hartikainen, J., and Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing.
Hartikainen, Jouni, and Särkkä. 2011. “Sequential Inference for Latent Force Models.” In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. UAI’11.
Hartikainen, Jouni, Seppänen, and Särkkä. 2012. In Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12.
Higdon, David. 1998. Environmental and Ecological Statistics.
Higdon, Dave. 2002. In Quantitative Methods for Current Environmental Issues.
Hildeman, Bolin, and Rychlik. 2019. arXiv:1906.00286 [Stat].
Huber. 2014. Pattern Recognition Letters.
Hu, and Steinsland. 2016. WIREs Computational Statistics.
Kailath, Thomas. 1971. “The Structure of Radon-Nikodym Derivatives with Respect to Wiener and Related Measures.” The Annals of Mathematical Statistics.
Kailath, T. 1971a. IEEE Transactions on Information Theory.
———. 1971b. In 1971 IEEE Conference on Decision and Control.
———. 1974. IEEE Transactions on Information Theory.
Kailath, T., and Duttweiler. 1972. IEEE Transactions on Information Theory.
Kailath, T., and Geesey. 1971. IEEE Transactions on Automatic Control.
———. 1973. IEEE Transactions on Automatic Control.
Kailath, T., Geesey, and Weinert. 1972. IEEE Transactions on Information Theory.
Kailath, T., and Weinert. 1975. IEEE Transactions on Information Theory.
Karvonen, and Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
Lee, Higdon, Calder, et al. 2005. Statistical Modelling.
Lindgren, Bolin, and Rue. 2021. arXiv:2111.01084 [Stat].
Lindgren, and Rue. 2015. Journal of Statistical Software.
Lindgren, Rue, and Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Ljung, and Kailath. 1976. IEEE Transactions on Information Theory.
Ljung, Kailath, and Friedlander. 1975. In 1975 IEEE Conference on Decision and Control Including the 14th Symposium on Adaptive Processes.
Meyer, Edwards, Maturana-Russel, et al. 2020. WIREs Computational Statistics.
Park. 1981. Pacific Journal of Mathematics.
Pluch. 2007. arXiv:math/0701323.
Rackauckas, Ma, Martensen, et al. 2020. arXiv.org.
Reece, Steven, Ghosh, Rogers, et al. 2014. The Journal of Machine Learning Research.
Reece, S., and Roberts. 2010. In 2010 13th International Conference on Information Fusion.
Rue, and Tjelmeland. 2002. Scandinavian Journal of Statistics.
Särkkä. 2011. In Artificial Neural Networks and Machine Learning – ICANN 2011. Lecture Notes in Computer Science.
Särkkä, Álvarez, and Lawrence. 2019. IEEE Transactions on Automatic Control.
Särkkä, and Piché. 2014. In 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
Särkkä, and Solin. 2019. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks 10.
Särkkä, Solin, and Hartikainen. 2013. IEEE Signal Processing Magazine.
Scharf, Hooten, Johnson, et al. 2017. arXiv:1703.02112 [Stat].
Segall, Davis, and Kailath. 1975. IEEE Transactions on Information Theory.
Segall, and Kailath. 1976. IEEE Transactions on Information Theory.
Sigrist, Künsch, and Stahel. 2015a. Application/pdf. Journal of Statistical Software.
———. 2015b. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Solin, and Särkkä. 2013. Physical Review E.
———. 2014. In Artificial Intelligence and Statistics.
———. 2020. Statistics and Computing.
Tompkins, and Ramos. 2018. Proceedings of the AAAI Conference on Artificial Intelligence.
Weinert, Howard L., and Kailath. 1974. The Annals of Statistics.
Weinert, H. L., and Kailath. 1974. In 1974 IEEE Conference on Decision and Control Including the 13th Symposium on Adaptive Processes.
Whittle. 1963. “Stochastic-Processes in Several Dimensions.” Bulletin of the International Statistical Institute.
Wilkinson, Andersen, Reiss, et al. 2019. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Wilkinson, Särkkä, and Solin. 2021.
Wilson, Borovitskiy, Terenin, et al. 2020. In Proceedings of the 37th International Conference on Machine Learning.
Wilson, Borovitskiy, Terenin, et al. 2021. Journal of Machine Learning Research.