Infinitesimal generators

Generators of the transition semi-group, connection to Kolmogorov forward equations


At first I found it hard to visualise infinitesimal generators but perhaps this simple diagram will help

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\bf}[1]{\mathbf{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}{\mathbb{I}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

This note exists because no one explained to me satisfactorily to me why I should care about infinitesimal generators. These mysterious creatures pop up in the study of certain continuous time Markov processes, such as stochastic differential equations driven by Lévy noise.

The justification I was given was something about PDEs which did not seem notably relevant. However, they have a simple natural interpretation in terms of evolution of the laws of Markov chains. A popular treatment of these objects is Revuz and Yor (2004); but there are treatments in every stochastic calculus book AFAICS. Reading about them is complicated by the fact that many sources assume these apply only to Wiener/Itō processes or finite-state continuous time Markov chains. I think this is because those are what Kolmogorov studied? Also those two models are dominant tools in queueing theory and finance. But these are much more general Markov processes out there.

I would like something less abstract to prime my intuition. Aït-Sahalia, Hansen, and Scheinkman (2010) and Reiß (2007) suit that descriptions. I would, further, like a treatment that does not presume the Markov process in question is some kind of integral of a Brownian motion, which is not the most interesting case.

I found George Lowther to be clarifying:

[Feller Processes] are Markov processes whose transition function \(\{P_t\}_{t\ge 0}\) satisfies certain continuity conditions. […] it is often not possible to explicitly write out the transition function describing a Feller process. Instead, the infinitesimal generator is used. This approximately describes the transition kernel \(P_t\) for small times \(t\), and can be viewed as the derivative of \(P_t\) at time 0, \(A=dP_t/dt\vert_{t=0}\).

Let us skip some boilerplate about convergence for now.

[The operator \(A\)] is called the infinitesimal generator of the semigroup \({\{P_t\}_{t\ge 0}}\).

[This] can alternatively be written as

\[P_tf = f + tAf + \mathcal{o}(t)\]

[…] So, the generator \(A\) gives the first-order approximation to \({P_t}\) for small \(t\).

Restricted to \({\mathcal{D}_A}\), the operator \({P_t}\) is differentiable with derivative given by \({AP_t=P_tA}\).

That is, Feller processes are more general than Lévy processes and less general than the class of all continuous time Markov processes. We can gloss over that for the current exposition and just think “well-behaved Markov”.

An infinitesimal generator is a kind of linearization of the local Markov transition kernel for a Feller process, i.e. for a non pathological Markov process.

OK, so now what can I do with this?

Saz says:

For a Markov process \((X_t)_{t \geq 0}\) we define the generator \(A\) by

\[Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}\] whenever the limit exists in \((C_{\infty},\|\cdot\|_{\infty})\). Here \(P_tf(x) := \mathbb{E}^xf(X_t)\) denotes the semigroup of \((X_t)_{t \geq 0}\).

(Here \(\mathbb{E}^xf(X_t)\) is the expectation given \(X_0=x.\))

By Taylor’s formula this means that

\[\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)\]

for small \(t \geq 0\). So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that

\[\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}\]

i.e. the generator is the time derivative of the mapping \(t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))\). Reading [this] as a (partial) differential equation we see that \(u(t,x) := P_t f(x)\) is a solution to the PDE

\[\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).\]

This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process

\[M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}\]

is a martingale. This means that we can associate with \((X_t)_{t \geq 0}\) a whole bunch of martingales, and this martingale property comes in handy very often, for example whenever we deal with expectations of the form \(\mathbb{E}^x(f(X_t))\). This leads to Dynkin’s formula.

Generators are also connected with the martingale problem which in turn can be used to characterize (weak) solutions of stochastic differential equations. Furthermore, generators of stochastic processes are strongly related to Dirichlet forms and Carré du champ operators; it turns out that they are extremely helpful to carry over results from probability theory to analysis (and vice versa). One important application are heat-kernel estimates.

Then Saz specialises to Brownian motion to recover the lazy version:

Example: Brownian motion In the case of (one-dimensional) Brownian motion \((B_t)_{t \geq 0}\), we see that

\[\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)\]

for small \(t\). This formula can be motivated by Taylor’s formula: Indeed,

\[\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)\]

using that \(\mathbb{E}^x(B_t-x)=0\) and \(\mathbb{E}^x((B_t-x)^2)=t\).

From [this] we see that \(u(t,x) := \mathbb{E}^x(f(B_t))\) is the (unique) solution of the heat equation

\[\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).\]

Moreover, one can show that the solution of the Dirichlet problem is also related to the Brownian motion. Furthermore, […]

\[M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds.\]

is a martingale. Having Itō’s formula in mind, this is not surprising since

\[\begin{aligned}f(B_t)-f(B_0) &= \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds\\&= M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.\end{aligned}\]

OK, but how does this work for other processes? What if our driving noise is, say, a gamma process, \(G_t\sim\operatorname{Gamma}(t;\alpha, \lambda)\)? In that case we would find, using the gamma process moments formula, and suppressing questions of convergence for a moment,

\[\begin{aligned} &\mathbb{E}^x[f(G_t)]\\ &= \mathbb{E}^x [f(x)+f'(x)(G_t-x)+{\textstyle\frac{1}{2}} f''(x)(G_t-x)^2+\\ &\qquad{\textstyle \frac{1}{6}} f'''(x)(G_t-x)^3 +\dots]\\ &= f(x)+\mathbb{E}^x [f'(x)(G_t-x)]+{\textstyle \frac{1}{2}}\mathbb{E}^x [ f''(x)(G_t-x)^2] + \\ &\qquad{\textstyle \frac{1}{6}}\mathbb{E}^x [ f'''(x)(G_t-x)^3]+\dots\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\langle \alpha t \rangle_{n}}{n!\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\Gamma(\alpha t +n)}{n\Gamma(\alpha t)\Gamma(n)\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{1}{\mathrm{B}(\alpha t,n)} \frac{1}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x)+\sum_{n=1}^{\infty} \frac{\alpha t + \alpha^2(\gamma + \psi^{0}(n)) t^2 + \dots}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)\\ &= f(x) +t\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) +t^2\sum_{n=1}^{\infty} \frac{\alpha^2(\gamma + \psi^{0}(n))}{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x)+ \dots\\ \end{aligned}\]

Here \(\langle \alpha \rangle_{n}:=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\) is the rising factorial. I have translated that into the beta function \(\mathrm{B}\) which does not actually help us but it is soothingly suggestive of helpfulness. Then I take a further Taylor series for \(1/\mathrm{B}\), (lazily, using Mathematica) which introduces the digamma function \(\psi^{(0)}\). This I re-arrange to collect the coefficients of \(t^k\). The outcome is… not nearly so nice as in the Gaussian case, even if it were to converge properly, which is far from clear. Have I misstepped here?

In any case, cranking the handle further we see that

\[\begin{aligned} Af(x) &:= \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(G_t))-f(x)}{t} \\ &=\sum_{n=1}^{\infty} \frac{\alpha }{n\lambda^n}\frac{\dd^n}{\dd x^n}f(x) \end{aligned}\]

So our expression involves all derivatives of \(f\) and if \(\lambda<1\) the higher derivatives are highly weighted. Hmmm.

Note that we have used functional moment calculations and independence of increments, so we can assume that similar methods will work with more general Lévy processes. 🏗

Aalen, Odd O., Ørnulf Borgan, and S. Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Statistics for Biology and Health. New York, NY: Springer.

Aït-Sahalia, Yacine, Lars Peter Hansen, and José A. Scheinkman. 2010. “Operator Methods for Continuous-Time Markov Processes.” In Handbook of Financial Econometrics: Tools and Techniques, 1–66. Elsevier. https://doi.org/10.1016/B978-0-444-50897-3.50004-3.

Burridge, James, Mateusz Kwaśnicki, Alexey Kuznetsov, and Andreas Kyprianou. 2014. “New Families of Subordinators with Explicit Transition Probability Semigroup,” July. http://arxiv.org/abs/1402.1062.

Jacob, Niels, and René L. Schilling. 2001. “Lévy-Type Processes and Pseudodifferential Operators.” In Lévy Processes: Theory and Applications, edited by Ole E. Barndorff-Nielsen, Sidney I. Resnick, and Thomas Mikosch, 139–68. Boston, MA: Birkhäuser. https://doi.org/10.1007/978-1-4612-0197-7_7.

Keller-Ressel, Martin. 2006. “An Intuitive Introduction to Operator Semi-Groups,” 9. https://www.math.tu-dresden.de/~mkeller/docs/op_semigroup.pdf.

Kolesnikov, Leonid. 2019. “Lecture 6: Feller-Dynkin Processes.” In. http://www.mathematik.uni-muenchen.de/~kolesni/stochproc-feb07.pdf.

Papapantoleon, Antonis, and Maria Siopacha. 2010. “Strong Taylor Approximation of Stochastic Differential Equations and Application to the L\’evy LIBOR Model,” October. http://arxiv.org/abs/0906.5581.

Pellegrini, Clément. 2010. “Markov Chains Approximation of Jump–Diffusion Stochastic Master Equations.” Annales de L’Institut Henri Poincaré, Probabilités et Statistiques 46 (4): 924–48. https://doi.org/10.1214/09-AIHP330.

Reiß, Markus. 2007. “Stochastic Differential Equations,” 56. https://www.math.uni-heidelberg.de/studinfo/reiss/sode-lecture.pdf.

Revuz, Daniel, and Marc Yor. 2004. Continuous Martingales and Brownian Motion. Springer Science & Business Media. http://books.google.com?id=1ml95FLM5koC.

Royer, Manuela. 2004. “BSDEs with a Random Terminal Time Driven by a Monotone Generator and Their Links with PDEs.” Stochastics and Stochastic Reports 76 (4): 281–307. https://doi.org/10.1080/10451120410001696270.

Scheutzow, Michael. n.d. “Stochastic Processes II/ Wahrscheinlichkeitstheorie III,” 79.

Schilling, René L. 2016. “An Introduction to Lévy and Feller Processes.” Advanced Courses in Mathematics - CRM Barcelona 2014, October. http://arxiv.org/abs/1603.00251.

Shahshahani, Mehrdad Mirshams. n.d. “Chapter3: Continuous Time Processes.” In Statistics 217/218: Introduction to Stochastic Processes. Accessed January 30, 2020. https://web.stanford.edu/class/stat217/Chapter3.pdf.