Path smoothness properties of stochastic processes

Continuity, differentiability and other smoothness properties

February 26, 2020 — September 2, 2021

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Figure 1

“When are the paths of a stochastic process continuous?” is a question one might like to ask. But we need to ask more precise questions than that, because things are complicated in probability land. If we are concerned about whether the paths sampled from the process are almost-surely continuous functions then we probably mean something like:

“Does the process \(\{\rv{f}(t)\}_t\) admit a modification such that \(t\mapsto \rv{f}(t)\) is a.e. Hölder-continuous with probability 1?” or some other such mouthful. There are many notions of continuity for stochastic processes: continuous with respect to what, with what probability? Feller continuity, and so on. This notebook is not an exhaustive taxonomy; it is just a list of notions I need to remember. Commonly useful notions for a stochastic process \(\{\rv{f}(t)\}_{t\in T}\) include the following.

Continuity in probability:
\[ \lim _{s \rightarrow t} \mathbb{P}\{|\rv{f}(t)-\rv{f}(s)| \geq \varepsilon\}=0, \quad \text { for each } t \in T \text { and each } \varepsilon>0. \]
Continuity in mean square, or \(L^{2}\) continuity:
\[ \lim _{s \rightarrow t} \mathbb{E}\left\{|\rv{f}(t)-\rv{f}(s)|^{2}\right\}=0, \quad \text { for each } t \in T. \]
Sample continuity:
\[ \mathbb{P}\left\{\lim _{s \rightarrow t}|\rv{f}(t)-\rv{f}(s)|=0, \text { for all } t \in T\right\}=1. \]

I have given these as continuity properties holding for all \(t\in T,\) but they can also be considered pointwise at a fixed \(t\). Since the index set \(T\) is typically uncountable, the “for all \(t\)” versions can lead to subtle problems with uncountable unions of events etc.

Jump processes show the difference between these notions. A Poisson process has sample paths which are discontinuous with probability 1 (every path jumps), yet it is continuous both in mean square and in probability at every fixed \(t\).
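We can even check this numerically. A minimal Monte Carlo sketch (rate, gap sizes and sample counts are my arbitrary choices): over a gap \(\tau=t-s\), the increment of a rate-\(\lambda\) Poisson process is \(\operatorname{Poisson}(\lambda\tau)\), so \(\Ex|\rv{N}(t)-\rv{N}(s)|^2=\lambda\tau+(\lambda\tau)^2\to 0\) as \(\tau\to 0\), despite every path having unit jumps.

```python
import numpy as np

# For a rate-lam Poisson process, the increment N(t) - N(s) over a gap
# tau = t - s is Poisson(lam * tau), so
#   E|N(t) - N(s)|^2 = lam * tau + (lam * tau)^2  -> 0 as tau -> 0,
# giving L^2 (and hence in-probability) continuity, even though every
# sample path jumps by 1 at each arrival and so is discontinuous a.s.

rng = np.random.default_rng(0)
lam = 3.0

for tau in [1.0, 0.1, 0.01]:
    increments = rng.poisson(lam * tau, size=200_000)
    ms = np.mean(increments.astype(float) ** 2)  # Monte Carlo E|N(t)-N(s)|^2
    exact = lam * tau + (lam * tau) ** 2
    print(f"tau={tau:5.2f}  MC={ms:.4f}  exact={exact:.4f}")
```

The mean-square increment shrinks with the gap even though no individual path ever becomes continuous.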

1 Kolmogorov continuity theorem

The Kolmogorov continuity theorem gives us sufficient conditions for the process to admit a modification with Hölder-continuous paths, based on how rapidly the moments of the process increments grow: if \(\Ex|\rv{f}(t)-\rv{f}(s)|^{\alpha}\leq C|t-s|^{1+\beta}\) for some constants \(\alpha,\beta,C>0\), then there is a modification whose paths are almost surely Hölder continuous of every order \(\gamma<\beta/\alpha\). Lowther is good on this.
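The classic example is Brownian motion: increments \(\rv{W}(t)-\rv{W}(s)\) are \(\mathcal{N}(0,|t-s|)\), so \(\Ex|\rv{W}(t)-\rv{W}(s)|^4=3|t-s|^2\), which satisfies Kolmogorov's condition with \(\alpha=4,\) \(\beta=1\) and yields Hölder paths of every order below \(1/4\) (higher moments push the exponent towards the true \(1/2\)). A quick Monte Carlo sketch of the moment bound (gap and sample size are arbitrary):

```python
import numpy as np

# Brownian increments W(t) - W(s) are N(0, tau) with tau = |t - s|, so
#   E|W(t) - W(s)|^4 = 3 * tau^2 = 3 * |t - s|^(1 + 1).
# Kolmogorov's criterion with alpha = 4, beta = 1 then yields a
# modification with Hoelder paths of any order < beta/alpha = 1/4.

rng = np.random.default_rng(1)
tau = 0.05
incs = rng.normal(0.0, np.sqrt(tau), size=500_000)
fourth = np.mean(incs ** 4)      # Monte Carlo estimate of E|increment|^4
print(fourth, 3 * tau ** 2)      # these should agree closely
```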

2 SDEs with rough paths

Despite the name, this is useful for smooth paths. See signatures and rough paths.

3 Connection to strong solutions of SDEs

TBD.

4 Continuity of Gaussian processes

Todo: Read Kanagawa et al. (2018) section 4, for the startling revelations:

… it is easy to show that a GP sample path \(\rv{f} \sim \mathcal{G P}(0, K)\) does not belong to the corresponding RKHS \(\mathcal{H}_{K}\) with probability 1 if \(\mathcal{H}_{K}\) is infinite dimensional… This implies that GP samples are “rougher”, or less regular, than RKHS functions … Note that this fact has been well known in the literature; see e.g., (Wahba 1990, 5) and (Lukić and Beder 2001 Corollary 7.1).

Let \(K\) be a positive definite kernel on a set \(\mathcal{X}\) and \(\mathcal{H}_{K}\) its RKHS, and consider \(\rv{f} \sim \mathcal{G} \mathcal{P}(m, K)\) with \(m: \mathcal{X} \rightarrow \mathbb{R}\) satisfying \(m \in \mathcal{H}_{K}.\) If \(\mathcal{H}_{K}\) is infinite dimensional, then \(\rv{f} \in \mathcal{H}_{K}\) with probability \(0.\) If \(\mathcal{H}_{K}\) is finite dimensional, then there is a version \(\tilde{\rv{f}}\) of \(\rv{f}\) such that \(\tilde{\rv{f}} \in \mathcal{H}_{K}\) with probability 1.
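One way to see this numerically: if \(\rv{f}\sim\mathcal{GP}(0,K)\) is observed at \(n\) points, then \(\rv{f}_n\sim\mathcal{N}(0,K_n)\) and the quadratic form \(\rv{f}_n^\top K_n^{-1}\rv{f}_n\sim\chi^2_n\), with mean \(n\). That quadratic form is the squared RKHS norm of the minimum-norm interpolant of the observations, so it grows without bound as \(n\to\infty\): the sample admits no bounded-RKHS-norm interpolant. A sketch, assuming an RBF kernel and a small jitter for numerical stability (both my choices):

```python
import numpy as np

# For f ~ GP(0, K) observed at n points, f_n is N(0, K_n) and
#   f_n^T K_n^{-1} f_n ~ chi^2_n,  with mean n.
# This is the squared RKHS norm of the minimum-norm interpolant of the
# observations, so it diverges as n grows, consistent with
# P(f in H_K) = 0 for infinite-dimensional H_K.

rng = np.random.default_rng(2)

def rbf(x, y, ell=0.3):
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

norms = {}
for n in [50, 200, 800]:
    x = np.linspace(0.0, 1.0, n)
    K = rbf(x, x) + 1e-6 * np.eye(n)   # jitter keeps the Cholesky stable
    L = np.linalg.cholesky(K)
    f = L @ rng.standard_normal(n)     # one GP sample at the n points
    q = f @ np.linalg.solve(K, f)      # squared norm of the interpolant
    norms[n] = q
    print(n, q)
```

The printed quadratic forms concentrate near \(n\), i.e. the implied RKHS norm of the sample diverges with the resolution at which we look at it.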

5 \(L^2\) derivatives of random fields

Figure 2

Robert J. Adler, Taylor, and Worsley (2016) define \(L^2\) derivatives thus: Choose a point \(t \in \mathbb{R}^{d}\) and a sequence of \(k\) ‘directions’ \(t_{1}', \ldots, t_{k}'\) in \(\mathbb{R}^{d}\), and write these as \(t'=\left(t_{1}', \ldots, t_{k}'\right).\) From context I assume these directions are supposed to have unit norm, \(\|t_j'\|=1.\) We say that \(\rv{f}\) has a \(k\)-th order \(L^{2}\) partial derivative at \(t\), in the direction \(t'\), if the limit \[ D_{L^{2}}^{k} \rv{f}\left(t, t'\right) \triangleq \lim _{h_{1}, \ldots, h_{k} \rightarrow 0} \frac{1}{\prod_{j=1}^{k} h_{j}} \Delta^{k} \rv{f}\left(t, t', h\right) \] exists in mean square, where \(h=\left(h_{1}, \ldots, h_{k}\right)\). The direction \(t_{j}'\) is usually axis aligned, e.g. \(t_{j}'=[\dots\, 0\, 1\,0\, \dots]^\top\). Here \(\Delta^{k} \rv{f}\left(t, t', h\right)\) is the symmetrized difference \[ \Delta^{k} \rv{f}\left(t, t', h\right)=\sum_{s \in\{0,1\}^{k}}(-1)^{k-\sum_{j=1}^{k} s_{j}} \rv{f}\left(t+\sum_{j=1}^{k} s_{j} h_{j} t_{j}'\right) \] and the limit is taken sequentially, i.e. first send \(h_{1}\to 0,\) then \(h_{2}\), etc.

That is a lot, so let us examine the special case of \(k=1\) with direction \(t_{1}'=[1\,0\dots]^\top=:e_1.\) We choose a point \(t \in \mathbb{R}^{d}\) and, w.l.o.g., the direction \(e_1.\) The symmetrised difference in this first order case becomes \[\begin{aligned} \Delta \rv{f}\left(t, e_1, h\right) &=\sum_{s \in\{0,1\}}(-1)^{1- s} \rv{f}\left(t+ s h e_1\right)\\ &=\rv{f}\left(t+ h e_1\right) - \rv{f}\left(t\right). \end{aligned}\] We say that \(\rv{f}\) has a first order \(L^{2}\) partial derivative at \(t\), in the direction \(e_1\), if the limit \[\begin{aligned} D_{L^{2}} \rv{f}\left(t, e_1\right) &= \lim _{h \rightarrow 0} \frac{1}{h} \Delta \rv{f}\left(t, e_1, h\right)\\ &= \lim _{h \rightarrow 0} \frac{\rv{f}\left(t+ h e_1\right) - \rv{f}\left(t\right)}{h} \end{aligned}\] exists in mean square. This should look like the usual first order (partial) derivative, just with convergence in mean square rather than pointwise.

By choosing \(t^{\prime}=\left(e_{j_{1}}, \ldots, e_{j_{k}}\right)\), where \(e_{j}\) is the vector with \(j\)-th element 1 and all others zero, we can talk of the mean square partial derivatives of various orders \[ \frac{\partial^{k}}{\partial t_{j_{1}} \ldots \partial t_{j_{k}}} \rv{f}(t) \triangleq D_{L^{2}}^{k} \rv{f}\left(t,\left(e_{j_{1}}, \ldots, e_{j_{k}}\right)\right) \] of \(\rv{f}.\) Then we see that the covariance function of partial derivatives of a random field must, if it exists and is finite, be given by \[ \mathbb{E}\left\{\frac{\partial^{k} \rv{f}(s)}{\partial s_{j_{1}} \ldots \partial s_{j_{k}}} \frac{\partial^{k} \rv{f}(t)}{\partial t_{j_{1}} \ldots \partial t_{j_{k}}}\right\}=\frac{\partial^{2 k} K(s, t)}{\partial s_{j_{1}} \partial t_{j_{1}} \ldots \partial s_{j_{k}} \partial t_{j_{k}}}. \] Note that we have not assumed stationarity or Gaussianity here, and still the process covariance function encodes a lot of information.
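The covariance identity for mean-square derivatives can be sanity-checked with plain difference quotients: for a zero-mean field, \(\Ex\bigl[\tfrac{\rv{f}(s+h)-\rv{f}(s)}{h}\,\tfrac{\rv{f}(t+h)-\rv{f}(t)}{h}\bigr]\) is a second difference of \(K\), which should converge to \(\partial^2 K/\partial s\,\partial t\). A sketch with the squared-exponential kernel \(K(s,t)=e^{-(s-t)^2/2}\) (my arbitrary choice), whose mixed partial is \((1-(s-t)^2)e^{-(s-t)^2/2}\):

```python
import numpy as np

# For a zero-mean field,
#   E[(f(s+h)-f(s))/h * (f(t+h)-f(t))/h]
#     = [K(s+h,t+h) - K(s+h,t) - K(s,t+h) + K(s,t)] / h^2,
# which converges to d^2 K / ds dt as h -> 0 whenever the mean-square
# derivative exists.

def K(s, t):
    return np.exp(-0.5 * (s - t) ** 2)

def d2K(s, t):  # analytic mixed partial for the RBF kernel
    return (1.0 - (s - t) ** 2) * np.exp(-0.5 * (s - t) ** 2)

s, t = 0.3, 1.0
for h in [1e-1, 1e-2, 1e-3]:
    quot = (K(s + h, t + h) - K(s + h, t) - K(s, t + h) + K(s, t)) / h ** 2
    print(h, quot, d2K(s, t))   # quot approaches the analytic value
```

Note that no sampling is needed: the covariance of the difference quotients is determined entirely by \(K\).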

In the case that \(\rv{f}\) is stationary, we can use the spectral representation to analyze these derivatives; the corresponding variances then have an interpretation in terms of spectral moments. We define the spectral moments \[ \omega_{j_{1} \ldots j_{N}} \triangleq \int_{\mathbb{R}^{N}} \omega_{1}^{j_{1}} \cdots \omega_{N}^{j_{N}} \nu(d \omega) \] for all multi-indices \(\left(j_{1}, \ldots, j_{N}\right)\) with \(j_{i} \geq 0\). Assume that the underlying random field, and so the covariance function, are real valued; then, as described above, stationarity implies that \(K(t)=K(-t)\) and \(\nu(A)=\nu(-A)\), so the odd ordered spectral moments, when they exist, are zero; specifically, \[ \omega_{j_{1} \ldots j_{N}}=0 \quad \text { if } \sum_{i=1}^{N} j_{i} \text { is odd. } \]
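In one dimension these moments are easy to compute concretely. For the stationary kernel \(K(t)=e^{-t^2/2}\), \(K\) is the characteristic function of a standard normal, so \(\nu\) is \(\mathcal{N}(0,1)\): the zeroth moment is \(K(0)=1\), the first (odd) moment vanishes, and the second moment equals \(-K''(0)=1\), which is also the variance of the mean-square derivative \(\rv{f}'\). A numerical sketch (grid and truncation are my arbitrary choices):

```python
import numpy as np

# Spectral moments of K(t) = exp(-t^2/2) in 1D. Its spectral density is
# the standard normal density, so:
#   moment 0 = K(0) = 1,  moment 1 = 0 (odd),  moment 2 = -K''(0) = 1.

w = np.linspace(-10.0, 10.0, 20_001)
dw = w[1] - w[0]
nu = np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)  # spectral density

moment0 = np.sum(nu) * dw            # total mass: K(0) = 1
moment1 = np.sum(w * nu) * dw        # odd moment: 0 by symmetry of nu
moment2 = np.sum(w ** 2 * nu) * dw   # = -K''(0) = Var of f'

print(moment0, moment1, moment2)
```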

For example, if \(\rv{f}\) has mean square partial derivatives of orders \(\alpha+\beta\) and \(\gamma+\delta\) for \(\alpha, \beta, \gamma, \delta \in\{0,1,2, \ldots\}\), then \[ \begin{aligned} \mathbb{E}\left\{\frac{\partial^{\alpha+\beta} \rv{f}(t)}{\partial^{\alpha} t_{j} \partial^{\beta} t_{k}} \frac{\partial^{\gamma+\delta} \rv{f}(t)}{\partial^{\gamma} t_{\ell} \partial^{\delta} t_{m}}\right\} &=\left.(-1)^{\alpha+\beta} \frac{\partial^{\alpha+\beta+\gamma+\delta}}{\partial^{\alpha} t_{j} \partial^{\beta} t_{k} \partial^{\gamma} t_{\ell} \partial^{\delta} t_{m}} K(t)\right|_{t=0} \\ &=(-1)^{\alpha+\beta} i^{\alpha+\beta+\gamma+\delta} \int_{\mathbb{R}^{N}} \omega_{j}^{\alpha} \omega_{k}^{\beta} \omega_{\ell}^{\gamma} \omega_{m}^{\delta} \nu(d \omega), \end{aligned} \] where \(i\) is the imaginary unit arising from differentiating the spectral representation \(K(t)=\int e^{i\langle t,\omega\rangle}\nu(d\omega)\). Note that although this equation seems to have some asymmetries in the powers, these disappear due to the fact that all odd ordered spectral moments, like all odd ordered derivatives of \(K\), are identically zero.

6 References

Adler, Robert J. 2010. The Geometry of Random Fields.
Adler, Robert J., and Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115.
Adler, Robert J., Taylor, and Worsley. 2016. Applications of Random Fields and Geometry (draft).
Bongers, and Mooij. 2018. “From Random Differential Equations to Structural Causal Models: The Stochastic Case.” arXiv:1803.08784 [Cs, Stat].
Chevyrev, and Kormilitzin. 2016. “A Primer on the Signature Method in Machine Learning.” arXiv:1603.03788 [Cs, Stat].
Kanagawa, Hennig, Sejdinovic, et al. 2018. “Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences.” arXiv:1807.02582 [Cs, Stat].
Lukić, and Beder. 2001. “Stochastic Processes with Sample Paths in Reproducing Kernel Hilbert Spaces.” Transactions of the American Mathematical Society.
Lyons, Terry J. 1998. “Differential Equations Driven by Rough Signals.” Revista Matemática Iberoamericana.
Lyons, Terry. 2014. “Rough Paths, Signatures and the Modelling of Functions on Streams.” arXiv:1405.4537 [Math, q-Fin, Stat].
Lyons, Terry J., Caruana, and Lévy. 2007. Differential Equations Driven by Rough Paths. Lecture Notes in Mathematics.
Lyons, Terry J., and Sidorova. 2005. “Sound Compression: A Rough Path Approach.” In Proceedings of the 4th International Symposium on Information and Communication Technologies. WISICT ’05.
Pugachev, and Sinitsyn. 2001. Stochastic Systems: Theory and Applications.
Teye. 2010. “Stochastic Invariance via Wong-Zakai Theorem.”
Wahba. 1990. Spline Models for Observational Data.