Wiener-Khintchine representation

Now with bonus Bochner!

\[ \renewcommand{\lt}{<} \renewcommand{\gt}{>} \renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}[1]{\mathbb{I}\{#1\}} \renewcommand{\inner}[2]{\langle #1,#2\rangle} \renewcommand{\Inner}[2]{\left\langle #1,#2\right\rangle} \renewcommand{\finner}[3]{\langle #1,#2;#3\rangle} \renewcommand{\FInner}[3]{\left\langle #1,#2;#3\right\rangle} \renewcommand{\dinner}[2]{[ #1,#2]} \renewcommand{\DInner}[2]{\left[ #1,#2\right]} \renewcommand{\norm}[1]{\| #1\|} \renewcommand{\Norm}[1]{\left\| #1\right\|} \renewcommand{\fnorm}[2]{\| #1;#2\|} \renewcommand{\FNorm}[2]{\left\| #1;#2\right\|} \renewcommand{\trn}[1]{\mathcal{#1}} \renewcommand{\ftrn}[2]{\mathcal{#1}_{#2}} \renewcommand{\Ftrn}[3]{\mathcal{#1}_{#2}\left\{\right\}} \renewcommand{\argmax}{\mathop{\mathrm{argmax}}} \renewcommand{\argmin}{\mathop{\mathrm{argmin}}} \renewcommand{\omp}{\mathop{\mathrm{OMP}}} \]

TODO: stop faffing about with this “if the density exists” setup; Use Schwartz distributions, or apply transforms to measures directly, rather than densities, and say so at the top.

Consider a real-valued stochastic process \(\{X_t\}_{t\in\mathcal{T}}\) over an index metric space \(\mathcal{T}\) such as \(\mathcal{T}=\mathbb{R}^n\). i.e. any given realisation of such process is a function \(\mathcal{T}\to\mathbb{R}\). We will analyse it through the finite dimensional (cumulative) distributions

\[ F_{t_{1}, \ldots, t_{k}}\left(x_{1}, \ldots, x_{k}\right) =\mathbb{P}\left\{X_{t_{1}} \leq x_{1}, \ldots, X_{t_{k}} \leq x_{k}\right\} \] for any finite set \(\mathbf{t}:=\{t_1,\dots,t_k\}\subset \mathcal{T}.\)

\(F_{\mathbf{t}}\) gives us a measure \(:\mathcal{B}(\mathbb{R}^k)\to [0,1]\).

For \(t,s\in\mathcal{T}\), the process has expectation function

\[ m(t)=\int_{\mathbb{R}^{1}} x \mathrm{d} F_{t}(x) \]

and covariance

\[ \begin{aligned} C(t, s) &=\operatorname{Cov}\left\{X_{t}, X_{s}\right\}\\ &=\mathrm{E}\left\{X_{t} X_{s}\right\}-m(t) m(s) \\ &=\iint_{\mathbb{R}^{2}} x y \mathrm{d}^{2} F_{t, s}(x, y)-m(t) m(s) \end{aligned} \]

This is the creature that the Wiener-Khintchine theorem tells us about.

Wikipedia tells me:

In applied mathematics, the Wiener—Khinchin theorem, also known as the Wiener—Khintchine theorem and sometimes as the Wiener—Khinchin—Einstein theorem or the Khinchin—Kolmogorov theorem, states that the autocorrelation function of a wide-sense-stationary random process has a spectral decomposition given by the power spectrum of that process.

The Wikipedia introduction is unusually terrible though. I recommend the one in, e.g., (Abrahamsen 1997), which does much exploration of different types of continuity and relation to stochastic integrals and stuff.

Here wide sense stationary, a.k.a. weakly stationary or sometimes homogeneous, requires that

  1. the process mean function is constant

    \[ m(t)=m \] and

  2. correlation depends only on \(t-s\),

    \[ C(t, s)=C(t-s). \]

That is, the first two moments of the process are stationary, but other moments might be doing something weird. For the wildly popular case of Gaussian processes, these end up being the same.

What does this mean? Why do I care? Turns out this is very useful for many reasons. It relates the power spectral density to the correlation function. It gives me a means for analysing signals which are not, notionally, integrable.

Deterministic case

As seen in correlograms.

Consider an \(L_2\) signal \(f: \bb{R}\to\bb{R}.\) We will overload notation and refer to a signal with implied free argument, say, \(t\), so that \(f(rt-\xi),\) would, for example, refers to denote the signal \(t\mapsto f(rt-\xi).\) We write the inner product between signals \(t\mapsto f(t)\) and \(t\mapsto f'(t)\) as \(\inner{f(t)}{f'(t)}\). Where it is not clear that the free argument is, e.g. \(t\), we will annotate it \(\finner{f(t)}{f'(t)}{t}\). Say that \(\ftrn{F}{t}\{f(t)\}\) is the Fourier transform of some \(f(t)\in L_2\), i.e.

\[\begin{aligned} \ftrn{F}{t}\{f(t)\}(\tau)&=\int e^{2\pi i t \tau}f(t)\dd t\\ &=\finner{e^{2\pi i t \tau}}{f(t)}{t} \end{aligned}\]

and that \(\ftrn{A}{\xi}\{f(t)\}\) is the autocorrelogram, i.e.


What is the Fourier transform of \(\trn{A}f\)? That is what the Wiener-Khintchine-relation (Wiener 1930) tells us. Assuming all terms are well-defined (which is non-trivial in general!),


To see this, assuming various terms are indeed well-defined, we use the list of properties of the Fourier transform from Wikipedia and grind out the identity…

\[\begin{aligned} \ftrn{F}{\xi}\{\trn{A}f(\xi)\}(\tau) &=\ftrn{F}{\xi}\{\finner{f(t)}{f(t-\xi)}{t}\}(\tau) & \\ &=\int e^{2\pi i \xi \tau} \finner{f(t)}{f(t-\xi)}{t}\dd \xi & \\ &=\int e^{2\pi i \xi \tau} \int f(t)f(t-\xi)\dd t \dd \xi & \\ &=\int \int f(t) e^{2\pi i \xi \tau}f(t-\xi)\dd t \dd \xi & \\ &=\int \int f(t) e^{2\pi i \xi \tau}f(t-\xi)\dd \xi\dd t & \\ &=\int f(t) \int e^{2\pi i \xi \tau}f(t-\xi)\dd \xi\dd t & \\ &=\int f(t) \ftrn{F}{\xi}\{f(t-\xi)\}(\tau)\dd t & \\ &=\int f(t) e^{2\pi i t \tau} \ftrn{F}{\xi}\{f(-\xi)\}(\tau)\dd t & \\ &=\int f(t) e^{2\pi i t \tau} \ftrn{F}{\xi}\{f(\xi)\}(\tau)\dd t & \\ &=\int f(t) e^{2\pi i t \tau} \overline{\ftrn{F}{\xi}\{f(\xi)\}(\tau)}\dd t & f \text{ is real} \\ &=\int f(t) e^{2\pi i t \tau} \dd t \overline{\ftrn{F}{\xi}\{f(\xi)\}(\tau)} & \\ &=\ftrn{F}{t}\{f(t)\}(\tau) \overline{\ftrn{F}{\xi}\{f(\xi)\}(\tau)} & \\ &=\ftrn{F}{t}\{f(t)\}(\tau) \overline{\ftrn{F}{t}\{f(t)\}(\tau)} & \\ &=|\ftrn{F}{t}\{f(t)\}(\tau)|^2 & \end{aligned}\]

Corollary: If we are interested in the power spectrum of the autocorrelogram, we note that it relates non-linearly to that of the source signal.


Bochner’s Theorem

Compare and contrast Bochner’s theorem (1959), which is more general. (Everyone seems to like the exposition in (Yaglom 1987), which I have not yet read). It tells us that a complex-valued function \(C\) on \(\mathbb{R}^{d}\) is the covariance function of a weakly stationary mean square continuous complex-valued random process on \(\mathbb{R}^{d}\) if and only if it can be represented as

\[ C(\boldsymbol{\xi}) =\int_{\mathbb{R}^{P}} e^{2 \pi i \boldsymbol{t}^{\top} \boldsymbol{\xi}} \psi(\mathrm{d} \boldsymbol{t}) \] where \(\psi\) is a positive and finite measure. If \(\psi\) has a density \(s(\boldsymbol{t})\), then \(s\) is called the spectral density or power spectrum of \(C\). \(s\) and \(C\) are Fourier duals.

In the terminology of transforms from earlier, this means: \(C\) is a covariance kernel if it can be written

\[ C(\boldsymbol{\xi})=\ftrn{F}{\boldsymbol{t}}{s(\boldsymbol{t})} \]

for some (real, positive, finite) spectral density \(s\).

Or, to turn that around, we have a factory that constructs positive definite functions over \(\mathbb{R}^n\) from their bounded, non-negative spectral measures.

So, what is the spectral density of a random process with a covariance kernel \(C\) of given spectral density \(s\)?

To be continued. 🏗

Abrahamsen, Petter. 1997. “A Review of Gaussian Random Fields and Correlation Functions.”

Bochner, Salomon. 1959. Lectures on Fourier Integrals. Princeton University Press.

Broersen, Petrus MT. 2006. Automatic Autocorrelation and Spectral Analysis. Secaucus, NJ, USA: Springer.

Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.

Khintchine, A. 1934. “Korrelationstheorie der stationären stochastischen Prozesse.” Mathematische Annalen 109 (1, 1): 604–15.

Marple, S. Lawrence, Jr. 1987. Digital Spectral Analysis with Applications.

Priestley, M. B. 2004. Spectral Analysis and Time Series. Repr. Probability and Mathematical Statistics. London: Elsevier.

Rust, Henning. 2007. “Spectral Analysis of Stochastic Processes.” Lecture Notes for the E2C2/CIACS Summer School, Comorova, Romania, University of Potsdam, 1–76.

Särkkä, S., and J. Hartikainen. 2013. “Non-Linear Noise Adaptive Kalman Filtering via Variational Bayes.” In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6.

Särkkä, Simo, and Jouni Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics.

Stoica, Petre, and Randolph L. Moses. 2005. Spectral Analysis of Signals. 1 edition. Upper Saddle River, N.J: Prentice Hall.

Wiener, Norbert. 1930. “Generalized Harmonic Analysis.” Acta Mathematica 55: 117–258.

Yaglom, A. M. 1987. Correlation Theory of Stationary and Related Random Functions. Volume II: Supplementary Notes and References. Springer Series in Statistics. New York, NY: Springer Science & Business Media.