# Rough path theory and signature methods

April 2, 2021 — April 30, 2024

control
dynamical systems
SDEs
signal processing
sparser than thou
statistics
stochastic processes
time series

I am not sure yet what this is. Do they mean rough in the sense of approximate, or in the sense of not smooth? Or maybe both?

Seems to originate in a fairly impenetrable body of work by Lyons, e.g. T. Lyons (1994), but modern recommendations are to read more approachable material. Friz and Hairer (2020), available free online, serves as an introduction and covers the simplest (?) case of Gaussian noise.

## 1 Rough differential equations

Try Morrill et al. (2021) ?

## 2 Discrete approximation

Wong-Zakai approximations: see Twardowska (1996). (A Martin Hairer recommendation.)
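As I understand it, the Wong-Zakai idea is to replace the Brownian driver by a smooth (here piecewise-linear) interpolation, solve the resulting ordinary differential equation, and observe that the solutions converge to the Stratonovich, not the Itô, solution of the SDE. Below is a minimal stdlib-only sketch for the scalar linear equation $$\mathrm{d}X = X \,\mathrm{d}W$$, chosen because both closed-form solutions are known. The function name `wong_zakai_endpoint`, the grid sizes, and the seed are my own illustrative choices, not taken from Twardowska (1996).

```python
import math
import random


def wong_zakai_endpoint(increments, substeps=200, x0=1.0):
    """Solve dx/dt = x * dW_n/dt, where W_n linearly interpolates Brownian samples.

    `increments` are the Brownian increments on a coarse grid. On each coarse
    interval the driver is a straight line, so we integrate an ordinary ODE
    with `substeps` explicit Euler steps.
    """
    x = x0
    for dw in increments:
        for _ in range(substeps):
            x += x * dw / substeps
    return x


rng = random.Random(0)  # fixed seed, purely for reproducibility
T, n = 1.0, 1000
h = T / n
increments = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]
w_T = sum(increments)

x_wz = wong_zakai_endpoint(increments)
x_strat = math.exp(w_T)          # closed-form Stratonovich solution at time T
x_ito = math.exp(w_T - 0.5 * T)  # closed-form Ito solution at time T
print(x_wz, x_strat, x_ito)
```

The ODE solution should land close to the Stratonovich value and visibly away from the Itô one. With `substeps=1` the inner loop collapses to Euler-Maruyama on the coarse grid, which should instead track the Itô solution; the difference between the two is exactly the correction term that rough path theory keeps track of.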

Possibly compact refs: .

## 3 In learning

Hodgkinson, Roosta, and Mahoney (2021) make use of rough path integrals to justify learning by the adjoint method in stochastic differential equations. Cass and Salvi (2024) is a friendly introduction to this area.

## 4 Signatures

Chevyrev and Kormilitzin (2016) discuss path signatures in particular, an object arising in the theory about which I know little. Bonnier et al. (2019) summarises:

When data is ordered sequentially then it comes with a natural path-like structure: the data may be thought of as a discretisation of a path $$X:[0,1] \rightarrow V$$, where $$V$$ is some Banach space. In practice we shall always take $$V=\mathbb{R}^d$$ for some $$d \in \mathbb{N}$$. For example the changing air pressure at a particular location may be thought of as a path in $$\mathbb{R}$$; the motion of a pen on paper may be thought of as a path in $$\mathbb{R}^2$$; the changes within financial markets may be thought of as a path in $$\mathbb{R}^d$$, with $$d$$ potentially very large.

Given a path, we may define its signature, which is a collection of statistics of the path. The map from a path to its signature is called the signature transform.

Definition 1.1. Let $$\mathbf{x}=\left(x_1, \ldots, x_n\right)$$, where $$x_i \in \mathbb{R}^d$$. Let $$f=\left(f_1, \ldots, f_d\right):[0,1] \rightarrow \mathbb{R}^d$$ be continuous, such that $$f\left(\frac{i-1}{n-1}\right)=x_i$$, and linear on the intervals in between. Then the signature of $$\mathbf{x}$$ is defined as the collection of iterated integrals

$$\operatorname{Sig}(\mathbf{x})=\left(\left(\underset{0<t_1<\cdots<t_k<1}{\int \cdots \int} \prod_{j=1}^k \frac{\mathrm{d} f_{i_j}}{\mathrm{d} t}\left(t_j\right) \mathrm{d} t_1 \cdots \mathrm{d} t_k\right)_{1 \leq i_1, \ldots, i_k \leq d}\right)_{k \geq 0}$$

…In short, the signature of a path determines the path essentially uniquely, and does so in an efficient, computable way. Furthermore, the signature is rich enough that every continuous function of the path may be approximated arbitrarily well by a linear function of its signature; it may be thought of as a ‘universal nonlinearity’. Taken together these properties make the signature an attractive tool for machine learning. The most simple way to use the signature is as feature transformation, as it may often be simpler to learn a function of the signature than of the original path.

This makes it sound like we have a connection to Koopman operators?
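To make the quoted definition concrete, here is a minimal, dependency-free sketch that computes the first two signature levels ($$k=1$$ and $$k=2$$) of the piecewise-linear path through a list of points. It is not taken from any of the cited papers or libraries; the name `signature_depth2` is made up for illustration. It relies on two standard facts: a single linear segment with increment $$\Delta$$ has level-2 term $$\Delta \otimes \Delta / 2$$, and segments are concatenated via Chen's identity.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `points`.

    `points` is a sequence of d-dimensional points (lists or tuples of floats).
    Returns (level1, level2): a length-d list and a d x d nested list, where
    level2[i][j] is the iterated integral of coordinate i, then coordinate j.
    """
    d = len(points[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + (old level 1) x delta + (delta x delta) / 2
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# An L-shaped path in the plane: right one unit, then up one unit.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
# Half the antisymmetric part of level 2 is the signed (Levy) area.
levy_area = 0.5 * (level2[0][1] - level2[1][0])
print(level1, level2, levy_area)
```

Level 1 is just the total increment of the path. Level 2 records the order in which the coordinates moved: `level2[0][1]` and `level2[1][0]` differ here because the path went right before it went up. A path that goes out and straight back along the same segment yields all zeros at both levels, so the signature cannot tell it apart from a path that never moved.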

## 5 Code

The signature of a stream of data is essentially a collection of statistics about that stream of data. This collection of statistics does such a good job of capturing the information about the stream of data that it actually determines the stream of data uniquely. (Up to something called ‘tree-like equivalence’ anyway, which is really just a technicality. It’s an equivalence relation that matters about as much as two functions being equal almost everywhere. That is to say, not much at all.) The signature transform is a particularly attractive tool in machine learning because it is what we call a ‘universal nonlinearity’: it is sufficiently rich that it captures every possible nonlinear function of the original stream of data. Any function of a stream is linear on its signature. Now for various reasons this is a mathematical idealisation not borne out in practice (which is why we put them in a neural network and don’t just use a simple linear model), but they still work very well!
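The claim that functions of a stream are linear in its signature has a small worked instance that I find helpful. It is the standard shuffle identity, which the quoted text does not spell out, so treat it as my gloss. The shorthand is mine: for a piecewise-smooth path $$f:[0,1]\to\mathbb{R}^d$$, write $$S^{i}$$ for the level-1 term and $$S^{ij}$$ for the level-2 term of its signature. Splitting the unit square into the two triangles $$t_1<t_2$$ and $$t_2<t_1$$ gives

$$
S^{i} S^{j}
= \int_0^1\!\!\int_0^1 \frac{\mathrm{d} f_i}{\mathrm{d} t}(t_1)\, \frac{\mathrm{d} f_j}{\mathrm{d} t}(t_2)\, \mathrm{d} t_1\, \mathrm{d} t_2
= \underbrace{\int_{0<t_1<t_2<1} \frac{\mathrm{d} f_i}{\mathrm{d} t}(t_1)\, \frac{\mathrm{d} f_j}{\mathrm{d} t}(t_2)\, \mathrm{d} t_1\, \mathrm{d} t_2}_{S^{ij}}
+ \underbrace{\int_{0<t_2<t_1<1} \frac{\mathrm{d} f_j}{\mathrm{d} t}(t_2)\, \frac{\mathrm{d} f_i}{\mathrm{d} t}(t_1)\, \mathrm{d} t_2\, \mathrm{d} t_1}_{S^{ji}} .
$$

So the product of two signature terms, a nonlinear function of the path, is a linear combination of higher signature terms. The same holds at every level, which means polynomials in signature terms are themselves linear in the signature. As far as I can tell, this is the algebraic fact behind the universal-approximation statement, via a Stone-Weierstrass-type argument on compact sets of paths.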

## 6 References

Bonnier, Kidger, Arribas, et al. 2019. In Advances in Neural Information Processing Systems.
Cass, and Salvi. 2024.
Chevyrev, and Kormilitzin. 2016. arXiv:1603.03788 [Cs, Stat].
Friz, and Hairer. 2020. A Course on Rough Paths. Edited by Peter K. Friz and Martin Hairer. Universitext.
Hodgkinson, Roosta, and Mahoney. 2021. “Stochastic Continuous Normalizing Flows: Training SDEs as ODEs.” Uncertainty in Artificial Intelligence.
Kalsi, Lyons, and Arribas. 2020. SIAM Journal on Financial Mathematics.
Kelly. 2016. The Annals of Applied Probability.
Kelly, and Melbourne. 2014.
Kidger. 2022.
Levin, Lyons, and Ni. 2016.
Lyons, Terry. 1994. Mathematical Research Letters.
———. 2014. arXiv:1405.4537 [Math, q-Fin, Stat].
Lyons, Terry J., and Sidorova. 2005. In Proceedings of the 4th International Symposium on Information and Communication Technologies. WISICT ’05.
Morrill, Salvi, Kidger, et al. 2021. In Proceedings of the 38th International Conference on Machine Learning.
Salvi, Cass, Foster, et al. 2021. SIAM Journal on Mathematics of Data Science.
Salvi, Lemercier, Liu, et al. 2024. In Advances in Neural Information Processing Systems. NIPS ’21.
Twardowska. 1996. “Wong-Zakai Approximations for Stochastic Differential Equations.” Acta Applicandae Mathematica.