# Wiener-Khintchine representations

## Spectral representations of stochastic processes


Consider a real-valued stochastic process $$\{\rv{f}_{\vv{t}}\}_{\vv{t}\in\mathcal{T}}$$ such that a realisation of the process is a function $$\mathcal{T}\to\mathbb{R}$$, where $$\mathcal{T}\subseteq \mathbb{R}^d$$ is either some compact set of non-zero Lebesgue volume, such as a hypercube, or all of $$\mathbb{R}^{d}$$.1 We call $$\mathcal{T}$$ the index set.

Suppose the process is described by probability measures $$\mu_{\vv{t}}, \vv{t}\in\mathcal{T}$$ such that for $$\vv{t},\vv{s}\in\mathcal{T}$$, the process has expectation function $\Ex[\vv{t}]=\Ex[\rv{f}_{\vv{t}}]=\int_{\mathbb{R}} x \,\mu_{\vv{t}}(\dd x)$ and covariance $$\begin{aligned} K(\vv{t}, \vv{s}) &=\operatorname{Cov}\left\{\rv{f}_{\vv{t}}, \rv{f}_{\vv{s}}\right\}\\ &=\Ex[\rv{f}_{\vv{t}} \rv{f}_{\vv{s}}]-\Ex[\rv{f}_{\vv{t}}]\Ex[ \rv{f}_{\vv{s}}] \\ &=\iint_{\mathbb{R}^{2}} x y\, \mu_{\vv{t}, \vv{s}}(\dd x \times \dd y)-\Ex[\rv{f}_{\vv{t}}]\Ex[ \rv{f}_{\vv{s}}] \end{aligned}$$ where $$\mu_{\vv{t},\vv{s}}$$ is the joint law of $$(\rv{f}_{\vv{t}},\rv{f}_{\vv{s}})$$. We are concerned with ways to represent this covariance function $$K$$.
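As a sanity check on these definitions, here is a minimal Monte Carlo sketch. The harmonic toy process is my own illustrative choice, not anything from the literature above: we estimate the covariance at a few index points from independent realisations and compare with the analytic answer.

```python
import numpy as np

# Toy process f_t = A * cos(2*pi*t + Phi) with random amplitude A ~ N(0, 1)
# and random phase Phi ~ Uniform(0, 2*pi), independent. This process has
# mean zero and covariance K(t, s) = E[A^2]/2 * cos(2*pi*(t - s)).
rng = np.random.default_rng(0)
n_draws = 200_000
ts = np.array([0.0, 0.1, 0.3])                       # a few index points
A = rng.normal(size=(n_draws, 1))                    # random amplitude
Phi = rng.uniform(0, 2 * np.pi, size=(n_draws, 1))   # random phase
f = A * np.cos(2 * np.pi * ts + Phi)                 # realisations at ts

# Empirical covariance across draws vs. the analytic kernel.
K_hat = np.cov(f, rowvar=False)
K_true = 0.5 * np.cos(2 * np.pi * (ts[:, None] - ts[None, :]))
print(np.max(np.abs(K_hat - K_true)))                # small Monte Carlo error
```

Note that this toy process is also weakly stationary, since its covariance depends only on $$t-s$$, which will be the important case below.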

I do not want to arse about with this mean function overmuch since it only clutters things up, so hereafter we will assume $$\Ex[\vv{t}]=0$$ unless stated otherwise.

## Wiener theorem: Deterministic case

This is also interesting and I wrote it up for a different project: See Wiener theorem.

## Wiener-Khinchine theorem: Spectral density of covariance kernels

I found the Wikipedia introduction unusually confusing. I recommend a well-written article instead, e.g. Abrahamsen (1997) or Robert J. Adler, Taylor, and Worsley (2016). Anyway, this theorem governs wide-sense-stationary random processes. Here wide-sense-stationary, a.k.a. weakly stationary or sometimes homogeneous, requires that

1. the process mean function is constant, which we take w.l.o.g. to be $$\Ex[\vv{t}]=0,$$ and
2. correlation depends only on $$\vv{t}-\vv{s}$$, i.e. $$K(\vv{t}, \vv{s})=K(\vv{t}-\vv{s}).$$

That is, the first two moments of the process are stationary, but higher moments might do something weird. For the wildly popular case of Gaussian processes, since the first two moments uniquely determine the process, weak and strict stationarity end up being the same. In this context, the Wiener-Khintchine theorem tells us that there exists a finite positive measure $$\nu$$ on the Borel subsets of $$\mathbb{R}^d$$ such that the covariance kernel is given by $K(\vv{\tau} )=\int \exp(2\pi i\vv{\omega}^{\top}\vv{\tau} )\nu(\dd \vv{\omega}).$

If $$\nu$$ has a density $$\psi(\vv{\omega})$$ with respect to the dominating Lebesgue measure, then $\psi(\vv{\omega})=\int K(\vv{\tau} )\exp(-2\pi i \vv{\omega}^{\top} \vv{\tau} )\,\dd\vv{\tau}.$ That is, the power spectral density and the covariance kernel are Fourier dual. Nifty.
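To make the duality concrete, here is a numerical check (a sketch of my own, not from any of the cited texts) that the squared-exponential kernel $$K(\tau)=e^{-\tau^2/2}$$ has spectral density $$\psi(\omega)=\sqrt{2\pi}\,e^{-2\pi^2\omega^2}$$ under the $$2\pi$$-in-the-exponent Fourier convention used above:

```python
import numpy as np

# Numerical check of the Fourier-dual pair for the squared-exponential
# kernel K(tau) = exp(-tau^2 / 2). With the 2*pi-in-the-exponent
# convention, psi(omega) = sqrt(2*pi) * exp(-2*pi^2*omega^2).
tau = np.linspace(-20.0, 20.0, 40001)
dtau = tau[1] - tau[0]
K = np.exp(-tau**2 / 2)

def psi_numeric(omega):
    # psi(omega) = integral of K(tau) * exp(-2*pi*i*omega*tau) d tau,
    # approximated by a Riemann sum on a fine grid; the Gaussian tails
    # beyond |tau| = 20 are negligible.
    return (K * np.exp(-2j * np.pi * omega * tau)).sum().real * dtau

omegas = np.array([0.0, 0.1, 0.25])
psi_true = np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * omegas**2)
psi_hat = np.array([psi_numeric(w) for w in omegas])
print(np.max(np.abs(psi_hat - psi_true)))  # tiny quadrature error
```

The imaginary part of the integral vanishes because $$K$$ is even, which is why taking the real part is harmless here.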

What does this mean? Why do I care? Turns out this is useful for many reasons. It relates the power spectral density to the correlation function, and also to continuity/differentiability.

## Bochner’s Theorem: stationary spectral kernels

Everyone seems to like the exposition in Yaglom (1987), which I brusquely summarise here. Bochner’s theorem tells us that a continuous function $$K:\mathbb{R}^d\to\mathbb{C}$$ is the covariance function of a weakly stationary, mean-square-continuous, complex-valued random process on $$\mathbb{R}^{d}$$ if and only if it can be represented as $K(\vv{\tau})=\int_{\mathbb{R}^{d}} \exp \left(2 \pi i \vv{\omega}^{\top} \vv{\tau}\right) \nu(\mathrm{d} \vv{\omega})$ where $$\nu$$ is a positive and finite measure on (the Borel subsets of) $$\mathbb{R}^d.$$ If $$\nu$$ has a density $$\psi(\vv{\omega})$$ with respect to the dominating Lebesgue measure, then $$\psi$$ is called the spectral density of $$K,$$ and $$\psi$$ and $$K$$ are Fourier duals. This is what Robert J. Adler, Taylor, and Worsley (2016) call the spectral distribution theorem.

This looks similar to the Wiener-Khintchine theorem, no? This one is telling us that the power spectrum represents all possible stationary kernels, i.e. we are not missing out on any by using a spectral representation. Note also that we needed to generalise to complex-valued processes for the statement to work in this generality; real-valued fields arise as the special case where $$\nu$$ is symmetric under $$\vv{\omega}\mapsto-\vv{\omega}$$.
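One practical consequence: if we can sample frequencies from the normalised spectral measure, we can approximate any stationary kernel by Monte Carlo, which is the idea behind random Fourier features. A sketch of my own for the squared-exponential kernel, assuming the same $$2\pi$$ Fourier convention as above:

```python
import numpy as np

# Random-Fourier-feature sketch of Bochner's theorem. For the
# squared-exponential kernel K(tau) = exp(-tau^2 / 2), the normalised
# spectral density is a Gaussian with standard deviation 1/(2*pi), so
# K(tau) = E[cos(2*pi*omega*tau)] for omega drawn from it (the measure
# is symmetric, so the imaginary parts cancel).
rng = np.random.default_rng(1)
omega = rng.normal(scale=1 / (2 * np.pi), size=100_000)

def K_rff(tau):
    # Monte Carlo estimate of E[exp(2*pi*i*omega*tau)], real by symmetry.
    return np.cos(2 * np.pi * omega * tau).mean()

taus = np.array([0.0, 0.5, 1.0, 2.0])
K_true = np.exp(-taus**2 / 2)
K_hat = np.array([K_rff(t) for t in taus])
print(np.max(np.abs(K_hat - K_true)))  # small Monte Carlo error
```

Note that $$K(0)=\nu(\mathbb{R}^d)=1$$ here, which is why the spectral density is already a probability density and no extra normalisation is needed.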

## Yaglom’s theorem

Some of the kernel design literature cites a generalised Bochner-type theorem, Yaglom’s theorem, which does not presume stationarity:

A complex-valued, bounded, continuous function $$K$$ on $$\mathbb{R}^{d}\times\mathbb{R}^{d}$$ is the covariance function of a mean-square-continuous, complex-valued, random process on $$\mathbb{R}^{d}$$ if and only if it can be represented as $K(\vv{s}, \vv{t})=\int_{\mathbb{R}^{d} \times\mathbb{R}^{d}} e^{2 \pi i\left(\vv{\omega}_{1}^{\top} \vv{s}-\vv{\omega}_{2}^{\top} \vv{t}\right)} \nu\left(\dd \vv{\omega}_{1}\times \dd \vv{\omega}_{2}\right)$ where $$\nu$$ is a suitable positive-definite measure on the frequency pairs.

This is reassuring, but does not constrain kernel designs in an obviously useful way to my tiny monkey brain.
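It does at least license a constructive recipe: any Hermitian positive-semidefinite weighting of frequency pairs yields a valid, generally nonstationary, kernel of Yaglom’s form. A toy sketch of my own, using a discrete spectral measure chosen purely for illustration:

```python
import numpy as np

# A discrete spectral measure nu = sum_{j,k} S[j,k] * delta_{(omega_j, omega_k)}
# with S Hermitian positive-semidefinite gives the nonstationary kernel
# K(s, t) = sum_{j,k} S[j,k] * exp(2*pi*i*(omega_j*s - omega_k*t)).
rng = np.random.default_rng(2)
omegas = np.array([0.3, 1.0, 2.5])               # illustrative frequencies
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
S = B @ B.conj().T                               # Hermitian psd weights

def K(s, t):
    e_s = np.exp(2j * np.pi * omegas * s)        # e_j(s) = exp(2*pi*i*omega_j*s)
    e_t = np.exp(2j * np.pi * omegas * t)
    return e_s @ S @ e_t.conj()                  # sum_{j,k} S[j,k] e_j(s) conj(e_k(t))

# The Gram matrix on any finite set of inputs is positive-semidefinite.
ts = np.linspace(0, 1, 20)
G = np.array([[K(s, t) for t in ts] for s in ts])
print(np.min(np.linalg.eigvalsh(G)))             # >= 0 up to float error
```

The positive-semidefiniteness follows because the Gram matrix factorises as $$\Phi S \Phi^{H}$$ with $$\Phi_{mj}=e^{2\pi i \omega_j t_m}$$, so the choice of frequencies is unconstrained; only the weight matrix must be psd.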

## Spectral representation

This is nearly simple, but has the minor complication that the most intuitive route (to my mind at least) requires us to admit complex-valued stochastic processes.

TBD; mention power spectrum, spectral moments.

## References

Abrahamsen, Petter. 1997.
Adler, Robert J. 2010. The Geometry of Random Fields. SIAM ed. Philadelphia: Society for Industrial and Applied Mathematics.
Adler, Robert J., and Jonathan E. Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115. New York: Springer.
Adler, Robert J., Jonathan E. Taylor, and Keith J. Worsley. 2016. Applications of Random Fields and Geometry: Draft.
Bochner, Salomon. 1959. Lectures on Fourier Integrals. Princeton University Press.
Broersen, Petrus MT. 2006. Automatic Autocorrelation and Spectral Analysis. Secaucus, NJ, USA: Springer.
Hartikainen, J., and S. Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE.
Higdon, Dave. 2002. In Quantitative Methods for Current Environmental Issues, edited by Clive W. Anderson, Vic Barnett, Philip C. Chatwin, and Abdel H. El-Shaarawi, 37–56. London: Springer.
Khintchine, A. 1934. Mathematische Annalen 109 (1): 604–15.
Kom Samo, Yves-Laurent, and Stephen Roberts. 2015. arXiv:1506.02236 [Stat], June.
Krapf, Diego, Enzo Marinari, Ralf Metzler, Gleb Oshanin, Xinran Xu, and Alessio Squarcini. 2018. New Journal of Physics 20 (2): 023029.
Loynes, R. M. 1968. Journal of the Royal Statistical Society. Series B (Methodological) 30 (1): 1–30.
Marple, S. Lawrence, Jr. 1987. Digital Spectral Analysis with Applications.
Priestley, M. B. 2004. Spectral analysis and time series. Repr. Probability and mathematical statistics. London: Elsevier.
Remes, Sami, Markus Heinonen, and Samuel Kaski. 2017. In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc.
Rust, Henning. 2007. Lecture Notes for the E2C2/CIACS Summer School, Comorova, Romania, University of Potsdam, 1–76.
Särkkä, S., and J. Hartikainen. 2013. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6.
Särkkä, Simo, and Jouni Hartikainen. 2012. In Artificial Intelligence and Statistics.
Stoica, Petre, and Randolph L. Moses. 2005. Spectral Analysis of Signals. 1 edition. Upper Saddle River, N.J: Prentice Hall.
Sun, Shengyang, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li, and Roger Grosse. 2018. “Differentiable Compositional Kernel Learning for Gaussian Processes.” arXiv Preprint arXiv:1806.04326.
Wiener, Norbert. 1930. Acta Mathematica 55: 117–258.
Yaglom, A. M. 1987. Correlation Theory of Stationary and Related Random Functions. Volume II: Supplementary Notes and References. Springer Series in Statistics. New York, NY: Springer Science & Business Media.

1. We can take it to be a submanifold, but things get more subtle and complex.↩︎
