Infinitesimal generators
Generators of the transition semi-group, connection to Kolmogorov forward equations
May 29, 2017 — February 5, 2020
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bf}[1]{\mathbf{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}{\mathbb{I}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
This note exists because no one explained to me satisfactorily why I should care about infinitesimal generators. These mysterious creatures pop up in the study of certain continuous-time Markov processes, such as stochastic differential equations driven by Lévy noise.
To learn: connection to Koopman operators.
Infinitesimal generators have a simple natural interpretation in terms of the evolution of the laws of Markov chains. TBD: going from infinitesimal generator to stochastic Taylor expansion. A popular treatment of these objects is Revuz and Yor (2004); but there are treatments in every stochastic calculus book AFAICS. Reading about them is complicated by the fact that many sources assume these apply only to Wiener/Itô processes or finite-state continuous-time Markov chains. Those two models are dominant tools in queueing theory and finance. But there are many more general Markov processes out there, and the ambiguous specificity of this concept is not helpful for those of us who operate in more general spaces.
I would like some specific but diverse examples to prime my intuition. I would, further, like a treatment that does not presume the Markov process in question is some kind of integral of a Brownian motion, which is not the most interesting case. Aït-Sahalia, Hansen, and Scheinkman (2010) and Reiß (2007) suit that description.
1 Getting started
First, we need to define Feller processes. I found George Lowther to be clarifying:
[Feller Processes] are Markov processes whose transition function \(\{P_t\}_{t\ge 0}\) satisfies certain continuity conditions. […] it is often not possible to explicitly write out the transition function describing a Feller process. Instead, the infinitesimal generator is used. This approximately describes the transition kernel \(P_t\) for small times \(t\), and can be viewed as the derivative of \(P_t\) at time 0, \(A=dP_t/dt\vert_{t=0}\).
Let us skip some boilerplate about convergence for now.
[The operator \(A\)] is called the infinitesimal generator of the semigroup \({\{P_t\}_{t\ge 0}}\).
[This] can alternatively be written as
\[P_tf = f + tAf + \mathcal{o}(t)\]
[…] So, the generator \(A\) gives the first-order approximation to \({P_t}\) for small \(t\).
Restricted to \({\mathcal{D}_A}\), the operator \({P_t}\) is differentiable with derivative given by \({AP_t=P_tA}\).
That is, Feller processes are more general than Lévy processes and less general than the class of all continuous-time Markov processes. We can gloss over that for the current exposition and just think “well-behaved Markov” or SDEs.
An infinitesimal generator is a kind of linearization of the local Markov transition kernel for a Feller process, i.e., for a non-pathological Markov process.
OK, so now what can we do with this? Well, we might observe that because Feller processes have a kind of stochastic smoothness, we can hope that linearizations of these processes behave nicely, and in particular (conditional) something like a “local” Taylor expansion might be possible and even useful. Saz says:
For a Markov process \((X_t)_{t \geq 0}\) we define the generator \(A\) by
\[Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}\] whenever the limit exists in \((C_{\infty},\|\cdot\|_{\infty})\). Here \(P_tf(x) := \mathbb{E}^xf(X_t)\) denotes the semigroup of \((X_t)_{t \geq 0}\).
(Here \(\mathbb{E}^xf(X_t)\) is the expectation given \(X_0=x.\))
By Taylor’s formula this means that
\[\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)\]
for small \(t \geq 0\). So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that
\[\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}\]
i.e., the generator is the time derivative of the mapping \(t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))\). Reading [this] as a (partial) differential equation we see that \(u(t,x) := P_t f(x)\) is a solution to the PDE
\[\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).\]
This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process
\[M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}\]
is a martingale. This means that we can associate with \((X_t)_{t \geq 0}\) a whole bunch of martingales, and this martingale property comes in handy very often, for example whenever we deal with expectations of the form \(\mathbb{E}^x(f(X_t))\). This leads to Dynkin’s formula.
Generators are also connected with the martingale problem which in turn can be used to characterize (weak) solutions of stochastic differential equations.
The connection to the stochastic Taylor expansions is pretty glaring at this point. 🚧TODO🚧 clarify.
Then Saz specialises to Brownian motion to recover the lazy version:
Example: Brownian motion In the case of (one-dimensional) Brownian motion \((B_t)_{t \geq 0}\), we see that
\[\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)\]
for small \(t\). This formula can be motivated by Taylor’s formula: Indeed,
\[\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)\]
using that \(\mathbb{E}^x(B_t-x)=0\) and \(\mathbb{E}^x((B_t-x)^2)=t\).
From [this] we see that \(u(t,x) := \mathbb{E}^x(f(B_t))\) is the (unique) solution of the heat equation
\[\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).\]
Moreover, one can show that the solution of the Dirichlet problem is also related to the Brownian motion. Furthermore, […]
\[M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds.\]
is a martingale. Having Itô’s formula in mind, this is not surprising since
\[\begin{aligned}f(B_t)-f(B_0) &= \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds\\&= M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.\end{aligned}\]
2 Example: Gamma process
How does this work for other processes? What if our driving noise is, say, a Gamma process, \(G_t\sim\operatorname{Gamma}(t;\alpha, \lambda)\)? In that case, we would find, using the Gamma process moments formula, and suppressing questions of convergence for a moment, that
\[\begin{aligned} &\mathbb{E}^x[f(G_t)]\\ &= \mathbb{E}^x [f(x)+f'(x)(G_t-x)+{\textstyle\frac{1}{2}} f''(x)(G_t-x)^2+\\ &\qquad{\textstyle\frac{1}{6}} f'''(x)(G_t-x)^3 +\dots]\\ &= f(x)+\mathbb{E}^x [f'(x)(G_t-x)]+{\textstyle\frac{1}{2}}\mathbb{E}^x [ f''(x)(G_t-x)^2] + \\ &\qquad{\textstyle \frac{1}{6}}\mathbb{E}^x [ f'''(x)(G_t-x)^3]+\dots\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\langle \alpha t \rangle_{n}}{n!\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\Gamma(\alpha t +n)}{n\Gamma(\alpha t)\Gamma(n)\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{1}{\mathrm{B}(\alpha t,n)} \frac{1}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\alpha t + \alpha^2(\gamma + \psi^{0}(n)) t^2 + \dots}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x) +t\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) +t^2\sum_{n=1}^{\infty} \frac{\alpha^2(\gamma + \psi^{0}(n))}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)+ \dots\\ \end{aligned}\]
Here \(\langle \alpha \rangle_{n}:=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\) is the rising factorial. I have translated that into the beta function \(\mathrm{B}\) which does not actually help us but it is soothingly suggestive of helpfulness. Then I take a further Taylor series for \(1/\mathrm{B}\), (lazily, using Mathematica) which introduces the digamma function \(\psi^{(0)}\). This I re-arrange to collect the coefficients of \(t^k\). The outcome is… not nearly so nice as in the Gaussian case, even if it were to converge properly, which is far from clear. Have I misstepped?
3 Other stuff about generators
Cranking the handle further we see that
\[\begin{aligned} Af(x) &:= \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(G_t))-f(x)}{t} \\ &=\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) \end{aligned}\]
So our expression involves all derivatives of \(f\) and if \(\lambda<1\) the higher derivatives are highly weighted. Hmmm.
Note that we have used functional moment calculations and independence of increments, so we can assume that similar methods will work with more general Lévy processes. 🏗
4 Connection to Stein’s method
Barbour (n.d.) uses the infinitesimal generator to derive a Stein equation for the Poisson distribution, and in fact for rather general families.