At first I found it hard to visualise infinitesimal generators but perhaps this simple diagram will help
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bf}[1]{\mathbf{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}{\mathbb{I}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
This note exists because no one explained to me satisfactorily to me why I should care about infinitesimal generators. These mysterious creatures pop up in the study of certain continuous time Markov processes, such as stochastic differential equations driven by Lévy noise. To learn: connection to Koopman operators.
Infinitesimal generators have a simple natural interpretation in terms of evolution of the laws of Markov chains. TBD: going from infinitesimal generator to stochastic Taylor expansion. A popular treatment of these objects is Revuz and Yor (2004); but there are treatments in every stochastic calculus book AFAICS. Reading about them is complicated by the fact that many sources assume these apply only to Wiener/Itô processes or finite-state continuous time Markov chains. Also those two models are dominant tools in queueing theory and finance. But these are much more general Markov processes out there.
I would like something less abstract to prime my intuition. Aït-Sahalia, Hansen, and Scheinkman (2010) and Reiß (2007) suit that descriptions. I would, further, like a treatment that does not presume the Markov process in question is some kind of integral of a Brownian motion, which is not the most interesting case.
I found George Lowther to be clarifying:
[Feller Processes] are Markov processes whose transition function \(\{P_t\}_{t\ge 0}\) satisfies certain continuity conditions. […] it is often not possible to explicitly write out the transition function describing a Feller process. Instead, the infinitesimal generator is used. This approximately describes the transition kernel \(P_t\) for small times \(t\), and can be viewed as the derivative of \(P_t\) at time 0, \(A=dP_t/dt\vert_{t=0}\).
Let us skip some boilerplate about convergence for now.
[The operator \(A\)] is called the infinitesimal generator of the semigroup \({\{P_t\}_{t\ge 0}}\).
[This] can alternatively be written as
\[P_tf = f + tAf + \mathcal{o}(t)\]
[…] So, the generator \(A\) gives the first-order approximation to \({P_t}\) for small \(t\).
Restricted to \({\mathcal{D}_A}\), the operator \({P_t}\) is differentiable with derivative given by \({AP_t=P_tA}\).
That is, Feller processes are more general than Lévy processes and less general than the class of all continuous time Markov processes. We can gloss over that for the current exposition and just think “well-behaved Markov”.
An infinitesimal generator is a kind of linearization of the local Markov transition kernel for a Feller process, i.e. for a non pathological Markov process.
OK, so now what can I do with this?
Saz says:
For a Markov process \((X_t)_{t \geq 0}\) we define the generator \(A\) by
\[Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}\] whenever the limit exists in \((C_{\infty},\|\cdot\|_{\infty})\). Here \(P_tf(x) := \mathbb{E}^xf(X_t)\) denotes the semigroup of \((X_t)_{t \geq 0}\).
(Here \(\mathbb{E}^xf(X_t)\) is the expectation given \(X_0=x.\))
By Taylor’s formula this means that
\[\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)\]
for small \(t \geq 0\). So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that
\[\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}\]
i.e. the generator is the time derivative of the mapping \(t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))\). Reading [this] as a (partial) differential equation we see that \(u(t,x) := P_t f(x)\) is a solution to the PDE
\[\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).\]
This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process
\[M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}\]
is a martingale. This means that we can associate with \((X_t)_{t \geq 0}\) a whole bunch of martingales, and this martingale property comes in handy very often, for example whenever we deal with expectations of the form \(\mathbb{E}^x(f(X_t))\). This leads to Dynkin’s formula.
Generators are also connected with the martingale problem which in turn can be used to characterize (weak) solutions of stochastic differential equations. Furthermore, generators of stochastic processes are strongly related to Dirichlet forms and Carré du champ operators; it turns out that they are extremely helpful to carry over results from probability theory to analysis (and vice versa). One important application are heat-kernel estimates.
Then Saz specialises to Brownian motion to recover the lazy version:
Example: Brownian motion In the case of (one-dimensional) Brownian motion \((B_t)_{t \geq 0}\), we see that
\[\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)\]
for small \(t\). This formula can be motivated by Taylor’s formula: Indeed,
\[\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)\]
using that \(\mathbb{E}^x(B_t-x)=0\) and \(\mathbb{E}^x((B_t-x)^2)=t\).
From [this] we see that \(u(t,x) := \mathbb{E}^x(f(B_t))\) is the (unique) solution of the heat equation
\[\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).\]
Moreover, one can show that the solution of the Dirichlet problem is also related to the Brownian motion. Furthermore, […]
\[M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds.\]
is a martingale. Having Itô’s formula in mind, this is not surprising since
\[\begin{aligned}f(B_t)-f(B_0) &= \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds\\&= M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.\end{aligned}\]
OK, but how does this work for other processes? What if our driving noise is, say, a Gamma process, \(G_t\sim\operatorname{Gamma}(t;\alpha, \lambda)\)? In that case we would find, using the Gamma process moments formula, and suppressing questions of convergence for a moment,
\[\begin{aligned} &\mathbb{E}^x[f(G_t)]\\ &= \mathbb{E}^x [f(x)+f'(x)(G_t-x)+{\textstyle\frac{1}{2}} f''(x)(G_t-x)^2+\\ &\qquad{\textstyle \frac{1}{6}} f'''(x)(G_t-x)^3 +\dots]\\ &= f(x)+\mathbb{E}^x [f'(x)(G_t-x)]+{\textstyle \frac{1}{2}}\mathbb{E}^x [ f''(x)(G_t-x)^2] + \\ &\qquad{\textstyle \frac{1}{6}}\mathbb{E}^x [ f'''(x)(G_t-x)^3]+\dots\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\langle \alpha t \rangle_{n}}{n!\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\Gamma(\alpha t +n)}{n\Gamma(\alpha t)\Gamma(n)\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{1}{\mathrm{B}(\alpha t,n)} \frac{1}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\alpha t + \alpha^2(\gamma + \psi^{0}(n)) t^2 + \dots}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x) +t\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) +t^2\sum_{n=1}^{\infty} \frac{\alpha^2(\gamma + \psi^{0}(n))}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)+ \dots\\ \end{aligned}\]
Here \(\langle \alpha \rangle_{n}:=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\) is the rising factorial. I have translated that into the beta function \(\mathrm{B}\) which does not actually help us but it is soothingly suggestive of helpfulness. Then I take a further Taylor series for \(1/\mathrm{B}\), (lazily, using Mathematica) which introduces the digamma function \(\psi^{(0)}\). This I re-arrange to collect the coefficients of \(t^k\). The outcome is… not nearly so nice as in the Gaussian case, even if it were to converge properly, which is far from clear. Have I mis-stepped?
In any case, cranking the handle further we see that
\[\begin{aligned} Af(x) &:= \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(G_t))-f(x)}{t} \\ &=\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) \end{aligned}\]
So our expression involves all derivatives of \(f\) and if \(\lambda<1\) the higher derivatives are highly weighted. Hmmm.
Note that we have used functional moment calculations and independence of increments, so we can assume that similar methods will work with more general Lévy processes. 🏗
No comments yet. Why not leave one?