Wiener-Khintchine representation

Now with bonus Bochner!

\[ \renewcommand{\lt}{<} \renewcommand{\gt}{>} \renewcommand{\var}{\operatorname{Var}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \]

Consider a real-valued stochastic process \(\{X_{\vv{t}}\}_{\vv{t}\in\mathcal{T}}\) over an index (metric) space \(\mathcal{T}\), i.e. a realisation of such a process is a function \(\mathcal{T}\to\mathbb{R}\). For the sake of concreteness we will take \(\mathcal{T}=\mathbb{R}^{d}\) here.

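Suppose the process has finite second moments. Write \(F_{\vv{t}}\) for the law of \(X_{\vv{t}}\) and \(F_{\vv{t},\vv{s}}\) for the joint law of \((X_{\vv{t}}, X_{\vv{s}})\). Then for \(\vv{t},\vv{s}\in\mathcal{T}\), the process has expectation function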

\[ \mu(\vv{t})=\Ex[X_{\vv{t}}]=\int_{\mathbb{R}} x F_{\vv{t}}(\dd x) \]

and covariance

\[ \begin{aligned} K(\vv{t}, \vv{s}) &=\operatorname{Cov}\left\{X_{\vv{t}}, X_{\vv{s}}\right\}\\ &=\Ex[X_{\vv{t}} X_{\vv{s}}]-\mu(\vv{t}) \mu(\vv{s}) \\ &=\iint_{\mathbb{R}^{2}} x y F_{\vv{t}, \vv{s}}(\dd x \times \dd y)-\mu(\vv{t}) \mu(\vv{s}) \end{aligned} \]

We are concerned with ways to represent this covariance function \(K\) via spectral theorems.

Wiener theorem: Deterministic case

This is also interesting and I wrote it up for a different project: See Wiener theorem.

Wiener-Khintchine theorem: Spectral density of covariance kernels

The Wikipedia introduction is unusually terrible. I recommend the one in, e.g., (Abrahamsen 1997) instead.

Anyway, this theorem governs wide-sense-stationary random processes. Here wide-sense-stationary, a.k.a. weakly stationary or sometimes homogeneous, requires that

  1. the process mean function is constant, \(\mu(\vv{t})=\mu,\) and
  2. the covariance depends only on \(\vv{t}-\vv{s}\), i.e. \(K(\vv{t}, \vv{s})=K(\vv{t}-\vv{s}).\)

That is, the first two moments of the process are stationary, but other moments might be doing something weird. For the wildly popular case of Gaussian processes, the first two moments uniquely determine the process, so weak stationarity and strict stationarity coincide.

In this context, the Wiener-Khintchine theorem tells us that there exists a finite positive measure \(\Psi\) such that

\[ K(\vv{\tau} )=\int \exp(2\pi i\vv{\omega}^{\top}\vv{\tau} )\Psi(\dd \vv{\omega}). \]

If \(\Psi\) has a density \(\psi(\vv{\omega})\) with respect to the dominating Lebesgue measure, then

\[ \psi(\vv{\omega})=\int K(\vv{\tau} )\exp(-2\pi i \vv{\omega}^{\top} \vv{\tau} )\,\dd\vv{\tau} . \]
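As a concrete instance, sketched here rather than taken from the references, consider the squared-exponential kernel with variance \(\sigma^{2}\) and length scale \(\ell\) (both symbols are introduced purely for this illustration). Under the Fourier convention above, a standard Gaussian integral gives

\[ K(\vv{\tau})=\sigma^{2}\exp\left(-\frac{\|\vv{\tau}\|^{2}}{2\ell^{2}}\right) \quad\Longleftrightarrow\quad \psi(\vv{\omega})=\sigma^{2}\left(2\pi\ell^{2}\right)^{d/2}\exp\left(-2\pi^{2}\ell^{2}\|\vv{\omega}\|^{2}\right), \]

so \(\psi\) is non-negative and integrates to \(K(\vv{0})=\sigma^{2}\). That is, the spectral measure is finite and positive, as the theorem demands. Short length scales correspond to a broad spectrum and vice versa.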

What does this mean? Why do I care? It turns out this is useful for many reasons. It relates the power spectral density of the process to its covariance function. It also gives me a means of doing spectral analysis on signals which are not themselves integrable, and so have no Fourier transform in the classical sense.
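A finite-sample cousin of this relationship can be checked numerically. The sketch below is not the theorem itself, which concerns the ensemble covariance. It is an exact algebraic identity for a single sampled signal: the periodogram equals the discrete Fourier transform of the circular sample autocovariance. The AR(1) toy signal, its coefficient 0.8 and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# Hypothetical toy signal: an AR(1) recursion driven by white noise.
noise = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + noise[t]
x = x - x.mean()  # remove the sample mean

# Circular sample autocovariance: c[k] = (1/n) * sum_t x[t] * x[(t + k) mod n].
autocov = np.array([np.mean(x * np.roll(x, -k)) for k in range(n)])

# Periodogram: squared magnitude of the DFT, scaled by 1/n.
periodogram = np.abs(np.fft.fft(x)) ** 2 / n

# DFT of the autocovariance; the imaginary part is zero up to rounding error.
spectrum_from_autocov = np.fft.fft(autocov)

print(np.allclose(spectrum_from_autocov.real, periodogram))
```

The direct autocovariance loop costs \(O(n^{2})\) operations. In practice one would go the other way and recover the autocovariance from the periodogram with an inverse FFT.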

Bochner’s Theorem: stationary spectral kernels

Everyone seems to like the exposition in Yaglom (1987), which I brusquely summarise here:

Bochner’s theorem tells us that \(K:\mathcal{T}\to\mathbb{C}\) is the covariance function of a weakly stationary mean square continuous complex-valued random process on \(\mathbb{R}^{d}\) if and only if it can be represented as

\[ K(\vv{\tau})=\int_{\mathbb{R}^{d}} \exp \left(2 \pi i \vv{\omega}^{\top} \vv{\tau}\right) \Psi(\mathrm{d} \vv{\omega}) \]

where \(\Psi\) is a positive and finite measure on \(\mathbb{R}^{d}.\) If \(\Psi\) has a density \(\psi(\vv{\omega})\) with respect to the dominating Lebesgue measure, then \(\psi\) is called the spectral density or power spectrum of \(K,\) and \(\psi\) and \(K\) are Fourier duals.

This looks similar to the Wiener-Khintchine theorem, no? This one is telling us that the power spectrum represents all possible stationary kernels, i.e. we are not missing out on any by using a spectral representation.
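One practical reading of this: after normalising by \(K(\vv{0})\), the spectral measure is a probability distribution, so the Bochner integral is an expectation that we can approximate by sampling frequencies. Here is a minimal one-dimensional sketch. It assumes a squared-exponential kernel, whose normalised spectral density under this Fourier convention is Gaussian with standard deviation \(1/(2\pi\ell)\). The function names and parameter values are hypothetical, chosen for illustration only.

```python
import numpy as np

def rbf_kernel(tau, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel K(tau) = variance * exp(-tau^2 / (2 l^2))."""
    tau = np.asarray(tau, dtype=float)
    return variance * np.exp(-0.5 * (tau / lengthscale) ** 2)

def bochner_monte_carlo(tau, lengthscale=1.0, variance=1.0,
                        n_samples=100_000, seed=0):
    """Approximate K(tau) by averaging cos(2 pi omega tau) over sampled omega."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the normalised spectral density of the kernel.
    omega = rng.normal(0.0, 1.0 / (2.0 * np.pi * lengthscale), size=n_samples)
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    # The spectral measure is symmetric, so the imaginary (sine) part of the
    # integral vanishes and only the cosine needs averaging.
    return variance * np.cos(2.0 * np.pi * np.outer(tau, omega)).mean(axis=1)

taus = np.linspace(-3.0, 3.0, 13)
approx = bochner_monte_carlo(taus)
exact = rbf_kernel(taus)
max_err = np.max(np.abs(approx - exact))
print(max_err)
```

Swapping in a different sampling distribution for `omega` gives a different stationary kernel, which is the sense in which we lose nothing by working in the spectral domain.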

Yaglom’s theorem

Some of the kernel design literature (Sun et al. 2018; Remes, Heinonen, and Kaski 2017; Kom Samo and Roberts 2015) cites a generalised Bochner Theorem (Yaglom 1987) often called Yaglom’s Theorem, which does not presume stationarity:

A complex-valued bounded continuous function \(K\) on \(\mathbb{R}^{d}\times\mathbb{R}^{d}\) is the covariance function of a mean square continuous complex-valued random process on \(\mathbb{R}^{d}\) if and only if it can be represented as

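\[ K(\vv{s}, \vv{t})=\int_{\mathbb{R}^{d} \times\mathbb{R}^{d}} e^{2 \pi i\left(\vv{\omega}_{1}^{\top} \vv{s}-\vv{\omega}_{2}^{\top} \vv{t}\right)} \Psi\left(\dd \vv{\omega}_{1}\times \dd \vv{\omega}_{2}\right) \]

where \(\Psi\) is the Lebesgue-Stieltjes measure associated with some positive semi-definite function of bounded variation on \(\mathbb{R}^{d}\times\mathbb{R}^{d}.\)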

This is reassuring, but does not constrain kernel designs in an obviously useful way.
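One way to see how this generalises the stationary case is sketched here, under the assumption that \(\Psi\) puts all of its mass on the diagonal \(\vv{\omega}_{1}=\vv{\omega}_{2}\). The measure \(\Psi_{0}\) is notation introduced for this illustration. Writing \(\Psi(\dd\vv{\omega}_{1}\times\dd\vv{\omega}_{2})=\Psi_{0}(\dd\vv{\omega}_{1})\,\delta_{\vv{\omega}_{1}}(\dd\vv{\omega}_{2})\), we get

\[ \begin{aligned} K(\vv{s}, \vv{t}) &=\int e^{2 \pi i\left(\vv{\omega}^{\top} \vv{s}-\vv{\omega}^{\top} \vv{t}\right)} \Psi_{0}(\dd \vv{\omega})\\ &=\int e^{2 \pi i \vv{\omega}^{\top}(\vv{s}-\vv{t})} \Psi_{0}(\dd \vv{\omega}), \end{aligned} \]

which is the Bochner representation with \(\vv{\tau}=\vv{s}-\vv{t}\). Any mass that \(\Psi\) places off the diagonal is what encodes non-stationarity.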


Abrahamsen, Petter. 1997. “A Review of Gaussian Random Fields and Correlation Functions.”
Bochner, Salomon. 1959. Lectures on Fourier Integrals. Princeton University Press.
Broersen, Petrus MT. 2006. Automatic Autocorrelation and Spectral Analysis. Secaucus, NJ, USA: Springer.
Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Khintchine, A. 1934. “Korrelationstheorie der stationären stochastischen Prozesse.” Mathematische Annalen 109 (1): 604–15.
Kom Samo, Yves-Laurent, and Stephen Roberts. 2015. “Generalized Spectral Kernels.” June 7, 2015.
Marple, S. Lawrence, Jr. 1987. Digital Spectral Analysis with Applications.
Priestley, M. B. 2004. Spectral Analysis and Time Series. Repr. Probability and Mathematical Statistics. London: Elsevier.
Remes, Sami, Markus Heinonen, and Samuel Kaski. 2017. “Non-Stationary Spectral Kernels.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc.
Rust, Henning. 2007. “Spectral Analysis of Stochastic Processes.” Lecture Notes for the E2c2/CIACS Summer School, Comorova, Romania, University of Potsdam, 1–76.
Särkkä, S., and J. Hartikainen. 2013. “Non-Linear Noise Adaptive Kalman Filtering via Variational Bayes.” In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6.
Särkkä, Simo, and Jouni Hartikainen. 2012. “Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In Artificial Intelligence and Statistics.
Stoica, Petre, and Randolph L. Moses. 2005. Spectral Analysis of Signals. 1st edition. Upper Saddle River, N.J: Prentice Hall. ps/SAS-new.pdf.
Sun, Shengyang, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. 2018. “Differentiable Compositional Kernel Learning for Gaussian Processes.” 2018.
Wiener, Norbert. 1930. “Generalized Harmonic Analysis.” Acta Mathematica 55: 117–258.
Yaglom, A. M. 1987. Correlation Theory of Stationary and Related Random Functions. Volume II: Supplementary Notes and References. Springer Series in Statistics. New York, NY: Springer Science & Business Media.