# Probabilistic spectral analysis

November 13, 2019 — November 25, 2020

Bayes
dynamical systems
linear algebra
probability
sciml
signal processing
state space models
statistics
stochastic processes
time series

Graphical introduction to nonstationary modelling of audio data. The input (bottom) is a sound recording of female speech. We seek to decompose the signal into Gaussian process carrier waveforms (blue block) multiplied by a spectrogram (green block). The spectrogram is learned from the data as a nonnegative matrix of weights times positive modulators (top).

I am interested in probabilistic analogues of time frequency analysis and, what is nearly the same thing, autocorrelation analysis, but for non-stationary signals. This is natural with Gaussian processes.

I am especially interested in this for audio signals, which can be very large, but have certain simplicities — i.e. being scalar functions of a univariate time index, usually regularly sampled.

In signal processing we frequently use Fourier transforms as a notionally nonparametric model for such a system, or a source of features for analysis.

That is classic stuff but it is (for me) always unsatisfying just taking the Fourier transform of something and hoping to have learned stuff about the system. There are a lot of arbitrary tuning parameters and awkward assumptions about, e.g. local stationarity and arbitrary ways of introducing non-local correlation. The same holds for the deterministic autocorrelogram, on which I have recently published a paper. I got good results, but I had no principled way to select the regularisation and interpretation of the methods. Unsatisfying.

I think we can do better by looking at the probabilistic behaviour of Fourier transforms and treating these as Bayesian nonparametric problems. This could solve a few problems at once.

This is an active area, with several approaches.

## 1 Classic: stochastic processes studied via correlation function

I’ve discussed basic stationary signal analysis everywhere, but why not check out some backgrounders in ?

## 2 Non-stationary spectral kernel

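The central tool here in practice is Bochner’s theorem, which states that a continuous stationary kernel is a valid covariance kernel if and only if it is the Fourier transform of a finite non-negative (spectral) measure $\mu$:

$\kappa(\Delta t)=\mathcal{F}_{\Delta t}[\mu]=\int_{\mathbb{R}} e^{i\omega\Delta t}\,\mu(\mathrm{d}\omega).$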
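To convince myself of the Bochner construction numerically, here is a minimal sketch. It assumes a spectral measure with a density made of two Gaussian bumps at $\pm\omega_0$; the names `kernel_from_spectral_density`, `omega0` and `bandwidth` are my own illustrative choices, not from any of the cited papers. It integrates the density against cosines, compares the result with the known closed form, and checks that the resulting Gram matrix has no meaningfully negative eigenvalues.

```python
import numpy as np

def kernel_from_spectral_density(density, omegas, lags):
    """Riemann-sum approximation of kappa(lag) = int cos(omega * lag) S(omega) d omega.

    For a density symmetric in omega the imaginary (sine) part cancels,
    so integrating against the cosine alone is enough.
    """
    d_omega = omegas[1] - omegas[0]
    return (np.cos(np.outer(lags, omegas)) * density).sum(axis=1) * d_omega

# Hypothetical spectral density: Gaussian bumps at +/- omega0 with total mass 1.
omega0, bandwidth = 5.0, 0.5
omegas = np.linspace(-20.0, 20.0, 4001)
bump = lambda centre: np.exp(-0.5 * ((omegas - centre) / bandwidth) ** 2)
density = 0.5 * (bump(omega0) + bump(-omega0)) / (bandwidth * np.sqrt(2 * np.pi))

# Evaluate the kernel on the lags of a regular time grid.
times = np.linspace(0.0, 3.0, 40)
lag_values = times - times[0]
kappa = kernel_from_spectral_density(density, omegas, lag_values)

# Closed form for this density: a cosine under a squared-exponential envelope.
closed_form = np.cos(omega0 * lag_values) * np.exp(-0.5 * (bandwidth * lag_values) ** 2)

# Stationarity means the Gram matrix depends only on |i - j|.
index = np.arange(len(times))
gram = kappa[np.abs(index[:, None] - index[None, :])]
min_eigenvalue = np.linalg.eigvalsh(gram).min()
```

The smallest eigenvalue should be non-negative up to floating-point error, which is the positive-semidefiniteness that Bochner’s theorem promises.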

Taking this insight and running with it you can do lots of fun stuff. Turner and Sahani (2014) is sometimes mentioned as ground-zero of this kind of research, although the connections are certainly much older, e.g. Curtain (1975). Wiener and Khintchine approaches were not far from this and it is implicit in Kalman-Bucy filtering. There are natural extensions of classic results, e.g. a Shannon-Nyquist theorem (Tobar 2019). In modern times we have related but more specialised techniques such as the probabilistic phase vocoder. See also the connections to time series state models of Hartikainen and Särkkä (2010), Lindgren, Rue, and Lindström (2011), Reece and Roberts (2010) and Liutkus, Badeau, and Richard (2011).

There are nice introductions in some papers, which unite various pieces I was discussing above with actual applications. I will work through these methods here for my own edification.

🏗

The basic setting is the same as for typical audio signal analysis; we begin with a (random) signal $$f:\mathbb{R}\to\mathbb{R}$$, where the argument is a continuous time index. We do not know this signal, but will infer its properties from a finite number of discrete observations, $$\mathbf{f}:=\{f(t_k);k=1,2,\dots,K\}.$$

We imagine observations from this signal are modelled by a Gaussian process, giving us the same setup as Gaussian process regression. We introduce the additional assumption here that the scalar index $$\mathcal{I}:=\mathbb{R}$$ represents time.

I suppose what we are doing here is requiring that there be some model for sampling error and that it may as well be the most convenient possible model to work with, which is additive Gaussian. More general noise models are indeed possible, and if we allow other Gaussian processes as additive noise models then we are on the way to constructing a source separation model. That is indeed what do.¹

Anyway, with these choices, this becomes absolutely the classic Gaussian process regression with some specialisation (univariate index, zero mean).
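For concreteness, here is a minimal sketch of that classic setup: zero-mean GP regression over a scalar time index with additive Gaussian observation noise. The kernel (`quasi_periodic_kernel`, a cosine under an exponential envelope), the toy sinusoidal data and all parameter values are assumptions of mine for illustration; nothing here is specific to any of the cited methods.

```python
import numpy as np

def quasi_periodic_kernel(t, t_prime, variance=1.0, omega=2 * np.pi, lengthscale=2.0):
    """Illustrative stationary kernel: cosine carrier times exponential envelope."""
    lag = t[:, None] - t_prime[None, :]
    return variance * np.cos(omega * lag) * np.exp(-np.abs(lag) / lengthscale)

def gp_posterior(t_obs, y_obs, t_new, kernel, noise_variance):
    """Posterior mean and covariance of a zero-mean GP at the times t_new."""
    gram = kernel(t_obs, t_obs) + noise_variance * np.eye(len(t_obs))
    cross = kernel(t_new, t_obs)
    chol = np.linalg.cholesky(gram)
    # Solve gram @ alpha = y_obs via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = cross @ alpha
    v = np.linalg.solve(chol, cross.T)
    cov = kernel(t_new, t_new) - v.T @ v
    return mean, cov

# Toy data: irregularly sampled noisy sinusoid (an assumption, not real audio).
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 4.0, size=80))
y_obs = np.sin(2 * np.pi * t_obs) + 0.1 * rng.standard_normal(80)
t_new = np.linspace(0.0, 4.0, 200)

mean, cov = gp_posterior(t_obs, y_obs, t_new, quasi_periodic_kernel, noise_variance=0.01)
rmse = np.sqrt(np.mean((mean - np.sin(2 * np.pi * t_new)) ** 2))
```

The dense Cholesky solve costs $$O(K^3)$$ in the number of observations, which is exactly why audio-scale problems push us towards the state space formulations mentioned above.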

It is also not far from the classic time frequency spectral analysis setup, where we take Fourier transforms over fixed size windows to estimate a kind of deterministic approximation to $$\kappa$$ (thanks Wiener-Khintchine theorem); in that context we are effectively assuming that for each window we have an independent estimation problem, and a periodic kernel. I should make that relationship precise. 🏗
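Until I make that precise, here is a small sketch of the finite-sample fact underneath it. It assumes a single analysis window `x` and uses the unnormalised convention $$|X_j|^2/N$$ for the periodogram; the function names are mine. The inverse DFT of the periodogram is exactly the circular sample autocovariance of the window, which is where the implicit periodic kernel comes from.

```python
import numpy as np

def circular_autocovariance(x):
    """Direct computation of r[k] = (1/N) * sum_n x[n] * x[(n + k) mod N]."""
    n = len(x)
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(n)]) / n

def autocovariance_from_periodogram(x):
    """The same quantity, obtained as the inverse DFT of the periodogram."""
    periodogram = np.abs(np.fft.fft(x)) ** 2 / len(x)
    return np.fft.ifft(periodogram).real

# One hypothetical analysis window of white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)

direct = circular_autocovariance(x)
via_fft = autocovariance_from_periodogram(x)
max_discrepancy = np.max(np.abs(direct - via_fft))
```

The two computations agree to floating-point precision, and the wrap-around in `np.roll` is the periodicity assumption made explicit.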

There is clearly a lot wrapped up in the kernel, $$\kappa(t, t';\mathbf{\theta}).$$

Typically this is some kind of spectral mixture kernel. W. J. Wilkinson et al. (2019a) summarizes these as:

$$\begin{aligned} \kappa_{\mathrm{sm}}\left(t, t^{\prime}\right) &=\sum_{d=1}^{D} \kappa_{z}^{(d)}\left(t, t^{\prime}\right) \\ \kappa_{z}^{(d)}\left(t, t^{\prime}\right) &=\sigma_{d}^{2} \cos \left(\omega_{d}\left(t-t^{\prime}\right)\right) \kappa_{d}\left(t, t^{\prime}\right) \end{aligned}$$

$$\kappa_{d}$$ is free to be chosen, but is typically from the Matérn class of kernel functions. Parameters $$\omega_{d}$$ determine the periodicity of the kernel components, which can be interpreted as the centre frequencies of the filters in a probabilistic filter bank. By choosing the exponential kernel $$\kappa_{d}\left(t, t^{\prime}\right)=\exp \left(-\left|t-t^{\prime}\right| / \ell_{d}\right)$$ we recover exactly the probabilistic phase vocoder (Cemgil & Godsill, 2005), and the lengthscales $$\ell_{d}$$ control the filter bandwidths.
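A minimal sketch of this stationary spectral mixture kernel, assuming the decaying exponential envelope $$\exp(-|t-t'|/\ell_d)$$ for every component. The function name, the three harmonically related centre frequencies and the other parameter values are hypothetical choices of mine, picked only to make something vaguely note-like.

```python
import numpy as np

def spectral_mixture_kernel(t, t_prime, variances, omegas, lengthscales):
    """Sum over d of sigma_d^2 * cos(omega_d * lag) * exp(-|lag| / ell_d)."""
    lag = t[:, None] - t_prime[None, :]
    kernel = np.zeros_like(lag)
    for variance, omega, lengthscale in zip(variances, omegas, lengthscales):
        kernel += variance * np.cos(omega * lag) * np.exp(-np.abs(lag) / lengthscale)
    return kernel

# Hypothetical filter bank: harmonics at 110, 220 and 330 Hz.
variances = np.array([1.0, 0.5, 0.25])
omegas = 2 * np.pi * np.array([110.0, 220.0, 330.0])  # rad / s
lengthscales = np.array([0.02, 0.02, 0.02])           # seconds

# 50 ms of signal on a regular grid.
times = np.linspace(0.0, 0.05, 200)
gram = spectral_mixture_kernel(times, times, variances, omegas, lengthscales)

# Draw one prior sample; the small jitter is only for numerical safety.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(gram + 1e-8 * np.eye(len(times)))
sample = chol @ rng.standard_normal(len(times))
```

Shorter lengthscales widen each component's band around its centre frequency, which is the filter bandwidth interpretation above.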

More generally we would like this to be a non-stationary kernel, which requires a model for the density of these kernels. W. J. Wilkinson et al. (2019a) use an NMF model with a GP prior on some matrix rows, to which they apply a softmax link. (Remes, Heinonen, and Kaski (2018) seem to get a similar structure?)

## 3 Locally stationary

Connection to the short time Fourier transform, where signals are assumed stationary within each analysis window. Change point detection version TBD.

There is an alternative approach which looks at switching between covariance kernels/spectrograms. One strand is the AdaptSpect family of methods, which develop fast MCMC samplers by using Whittle likelihood approaches over randomised change points. Disclosure of bias: I just enjoyed a seminar by my colleague Michael Bertolacci on this theme, and Sally Cripps née Wood works 20 m from me and was a co-author of these.
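To fix ideas about the Whittle likelihood ingredient only, here is a sketch for a single stationary segment. This is not AdaptSpect itself: there are no change points and no MCMC, and the AR(1) spectral density, the grid search and all names are assumptions of mine. The Whittle approximation treats periodogram ordinates at the Fourier frequencies as independent, giving the log-likelihood $$-\sum_j \left[\log S(\omega_j) + I(\omega_j)/S(\omega_j)\right]$$ up to a constant.

```python
import numpy as np

def periodogram(x):
    """Periodogram |DFT|^2 / N at the positive Fourier frequencies below Nyquist."""
    n = len(x)
    keep = slice(1, (n - 1) // 2 + 1)
    freqs = 2 * np.pi * np.arange(n)[keep] / n
    return freqs, np.abs(np.fft.fft(x)[keep]) ** 2 / n

def ar1_spectral_density(freqs, phi, innovation_variance):
    """Spectral density of an AR(1) process, in the same scaling as the periodogram."""
    return innovation_variance / np.abs(1.0 - phi * np.exp(-1j * freqs)) ** 2

def whittle_log_likelihood(x, phi, innovation_variance):
    freqs, pgram = periodogram(x)
    density = ar1_spectral_density(freqs, phi, innovation_variance)
    return -np.sum(np.log(density) + pgram / density)

# Simulate one stationary AR(1) segment with a known coefficient.
rng = np.random.default_rng(1)
true_phi, n = 0.8, 4096
x = np.zeros(n)
for t in range(1, n):
    x[t] = true_phi * x[t - 1] + rng.standard_normal()

# Crude grid search over phi, with the innovation variance held at its true value.
phi_grid = np.linspace(-0.95, 0.95, 191)
scores = np.array([whittle_log_likelihood(x, phi, 1.0) for phi in phi_grid])
phi_hat = phi_grid[np.argmax(scores)]
```

Each evaluation costs one FFT plus an $$O(N)$$ sum, with no covariance matrix to factorise, which is what makes sampling over many candidate segmentations affordable.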

Russell Tsuchida has made me aware of a parallel body of work which keeps the spectrogram implicit and changes the covariance kernel. This is still reasonably fast thanks to Lattice GP tricks.

TODO: compare and contrast these methods. I suspect a major difference is that the former targets statisticians and the latter ML people, but they can probably be combined, or at least a neat cherry-picked method leveraging both should be feasible.

## 4 Non-Gaussian approaches

For now, see sparse stochastic processes.

## 5 References

Adams, and MacKay. 2007. arXiv:0710.3742 [Stat].
Alvarado, Alvarez, and Stowell. 2019. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Alvarado, and Stowell. 2018. arXiv:1705.07104 [Cs, Stat].
Bertolacci. 2019. “Hierarchical Bayesian Mixture Models for Spatiotemporal Data with Nonstandard Features.”
Bertolacci, Cripps, Rosen, et al. 2019. Annals of Applied Statistics.
Bertolacci, Rosen, Cripps, et al. 2020. arXiv:1908.06622 [Stat].
Bruinsma, and Turner. 2018. arXiv:1802.08167 [Stat].
Cemgil. 2009. Computational Intelligence and Neuroscience.
Cheng, Sa-Ngasoongsong, Beyca, et al. 2015. IIE Transactions.
Choudhuri, Ghosal, and Roy. 2004a. Biometrika.
———. 2004b. Journal of the American Statistical Association.
Cunningham, Shenoy, and Sahani. 2008. In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Curtain. 1975. SIAM Journal on Control.
Duvenaud, Nickisch, and Rasmussen. 2011. In Advances in Neural Information Processing Systems.
Dym, and McKean. 2008. Gaussian Processes, Function Theory, and the Inverse Spectral Problem. Dover Books on Mathematics.
Edwards, Meyer, and Christensen. 2015.
———. 2019. Statistics and Computing.
Févotte, Bertin, and Durrieu. 2008. Neural Computation.
Girolami, and Rogers. 2005. In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05.
Godsill, and Cemgil. 2005. In 2005 13th European Signal Processing Conference.
Hartikainen, and Särkkä. 2010. In 2010 IEEE International Workshop on Machine Learning for Signal Processing.
Hensman, Durrande, and Solin. 2018. Journal of Machine Learning Research.
Jesus, and Chandler. 2017. Journal of Time Series Analysis.
Kailath. 1971. “The Structure of Radon-Nikodym Derivatives with Respect to Wiener and Related Measures.” The Annals of Mathematical Statistics.
Kalman, R. 1959. IRE Transactions on Automatic Control.
Kalman, R. E. 1960. Journal of Basic Engineering.
Karvonen, and Särkkä. 2016. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
Kirch, Edwards, Meier, et al. 2019. Bayesian Analysis.
Lindgren, Rue, and Lindström. 2011. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Liutkus, Badeau, and Richard. 2011. IEEE Transactions on Signal Processing.
Liutkus, Rafii, Pardo, et al. 2014. In.
Meier, Kirch, and Meyer. 2020. Journal of Multivariate Analysis.
Meyer, Edwards, Maturana-Russel, et al. 2020. WIREs Computational Statistics.
Nickisch, Solin, and Grigorevskiy. 2018. In International Conference on Machine Learning.
Rasmussen, and Nickisch. 2010. Journal of Machine Learning Research.
Reece, and Roberts. 2010. In 2010 13th International Conference on Information Fusion.
Remes, Heinonen, and Kaski. 2017. In Advances in Neural Information Processing Systems 30.
———. 2018. arXiv:1811.10978 [Cs, Stat].
Roberts, Osborne, Ebden, et al. 2013. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Rosen, Stoffer, and Wood. 2009. Journal of the American Statistical Association.
Rosen, Wood, and Stoffer. 2012. Journal of the American Statistical Association.
Saatçi, Turner, and Rasmussen. 2010. In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.
Särkkä, Simo. 2007. IEEE Transactions on Automatic Control.
Särkkä, S., and Hartikainen. 2013. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
Särkkä, Simo, and Nummenmaa. 2009. IEEE Transactions on Automatic Control.
Särkkä, Simo, and Solin. 2019. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks 10.
Särkkä, Simo, Solin, and Hartikainen. 2013. IEEE Signal Processing Magazine.
Solin, and Särkkä. 2013. Physical Review E.
———. 2014. In Artificial Intelligence and Statistics.
Sykulski, Olhede, Guillaumin, et al. 2019. Biometrika.
Tobar. 2019. Advances in Neural Information Processing Systems.
Tobar, Araya-Hernández, Huijse, et al. 2020. arXiv:2011.04585 [Eess, Stat].
Turner, and Sahani. 2014. IEEE Transactions on Signal Processing.
Valenzuela, and Tobar. 2019. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Virtanen, Cemgil, and Godsill. 2008. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Wiener, and Masani. 1957. Acta Mathematica.
———. 1958. Acta Mathematica.
Wilkinson, W. 2019.
Wilkinson, William J., Andersen, Reiss, et al. 2019a. arXiv:1901.11436 [Cs, Eess, Stat].
Wilkinson, William J., Andersen, Reiss, et al. 2019b. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Wilson, and Adams. 2013. In International Conference on Machine Learning.
Yaglom. 1987. Correlation Theory of Stationary and Related Random Functions. Volume II: Supplementary Notes and References. Springer Series in Statistics.
Zheng, Zhu, and Roy. 2010. Biometrika.

## Footnotes

1. We might more generally consider a sampling problem where we observe the signal through inner products with some sampling kernel, possibly even a stochastic one, but that sounds complicated.↩︎