Probabilistic spectral analysis


Figure 1. Graphical introduction to nonstationary modelling of audio data. The input (bottom) is a sound recording of female speech. We seek to decompose the signal into Gaussian process carrier waveforms (blue block) multiplied by a spectrogram (green block). The spectrogram is learned from the data as a nonnegative matrix of weights times positive modulators (top). (William J. Wilkinson et al. 2019)

I am interested in probabilistic analogues of time-frequency analysis and, what is nearly the same thing, autocorrelation analysis. I am especially interested in this for audio signals, which can be very large but are structurally simple: they are scalar functions of a univariate time index, usually regularly sampled.

In signal processing we frequently use Fourier transforms as a notionally nonparametric model for such a system, or a source of features for analysis.

That is classic stuff, but it is (for me) always unsatisfying to just take the Fourier transform of something and hope to have learned something about the system. There are a lot of arbitrary tuning parameters, awkward assumptions about, e.g., local stationarity, and arbitrary ways of introducing non-local correlation. The same holds for the deterministic autocorrelogram, on which I have recently published a paper. I got good results, but I had no principled way to select the regularisation or to interpret the methods. Unsatisfying.

I think we can do better by looking at the probabilistic behaviour of Fourier transforms and treating these as Bayesian nonparametric problems. This could solve a few problems at once. The theory of this will likely involve Gaussian process regression and state filters, and/or tricks for fast Gaussian process calculations on grids, and nonnegative matrix factorisation, plus artful design of covariance functions.

In practice we will probably need some cunning computational tricks to manage Gaussian processes over long time series, such as filtering Gaussian processes.
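To make the filtering trick concrete, here is a minimal sketch under the simplest assumptions I can think of: an exponential (Ornstein-Uhlenbeck, or Matérn-1/2) covariance and i.i.d. Gaussian observation noise. For that particular kernel the Gaussian process is exactly a scalar linear-Gaussian state space model, so a Kalman filter should reproduce the dense matrix computation in time linear, rather than cubic, in the number of samples. The function names, hyperparameter values and toy data are illustrative assumptions, not taken from any of the papers cited here.

```python
import numpy as np


def ou_kernel(t1, t2, variance=1.0, lengthscale=0.5):
    """Exponential covariance: variance * exp(-|t - t'| / lengthscale)."""
    return variance * np.exp(-np.abs(t1[:, None] - t2[None, :]) / lengthscale)


def gp_last_mean_and_loglik(t, y, noise_var, variance=1.0, lengthscale=0.5):
    """Dense O(K^3) GP regression: posterior mean at t[-1] and log marginal likelihood."""
    K = ou_kernel(t, t, variance, lengthscale) + noise_var * np.eye(t.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise I)^{-1} y
    k_last = ou_kernel(t[-1:], t, variance, lengthscale)[0]
    loglik = (-0.5 * y @ alpha
              - np.log(np.diag(L)).sum()
              - 0.5 * t.size * np.log(2.0 * np.pi))
    return k_last @ alpha, loglik


def kalman_last_mean_and_loglik(t, y, noise_var, variance=1.0, lengthscale=0.5):
    """O(K) Kalman filter for the same model; t must be sorted increasing."""
    m, P = 0.0, variance  # stationary prior on the state at the first sample
    loglik = 0.0
    for k in range(t.size):
        if k > 0:
            # Exact discretisation of the OU process over the gap t[k] - t[k-1]
            a = np.exp(-(t[k] - t[k - 1]) / lengthscale)
            m = a * m
            P = a * a * P + variance * (1.0 - a * a)
        S = P + noise_var            # predictive variance of y[k]
        resid = y[k] - m
        loglik += -0.5 * (np.log(2.0 * np.pi * S) + resid ** 2 / S)
        gain = P / S
        m = m + gain * resid         # measurement update
        P = (1.0 - gain) * P
    return m, loglik


rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=200))  # irregular sampling is fine
y = np.sin(2.0 * t) + 0.3 * rng.standard_normal(t.size)

gp_mean, gp_ll = gp_last_mean_and_loglik(t, y, noise_var=0.09)
kf_mean, kf_ll = kalman_last_mean_and_loglik(t, y, noise_var=0.09)
print(gp_mean, kf_mean)  # the two means should agree to numerical precision
print(gp_ll, kf_ll)      # and so should the log marginal likelihoods
```

Only the filtered estimate at the final sample is compared, because that is where filtering and full GP regression condition on the same data; matching the posterior at earlier times would additionally need a backward smoothing pass. Richer kernels need a vector-valued state, but the recursion has the same shape.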

Turner and Sahani (2014) is sometimes mentioned as the ground zero of this kind of research, although the connections are certainly much older. Wiener and Khintchine approaches were not far from this (Wiener and Masani 1957, 1958), and it is implicit in Kalman-Bucy filtering (Kalman 1959, 1960; Kailath 1971) and in related but more specialised techniques such as the probabilistic phase vocoder (Godsill and Cemgil 2005). See also the connections to time series state models (Hartikainen and Särkkä 2010; Lindgren, Rue, and Lindström 2011; Reece and Roberts 2010; Liutkus, Badeau, and Richard 2011).

The nicest introductions are some papers I am reading through right now (Solin 2016; Alvarado, Alvarez, and Stowell 2019; William J. Wilkinson et al. 2019), which unite all the pieces I was discussing above with actual applications. I will work through these methods here for my own edification.

🏗

The basic setting here is the same as for typical audio signal analysis; we begin with a (random) signal \(f:\mathbb{R}\to\mathbb{R}\), where the argument is a continuous time index. We do not know this signal, but will infer its properties from some countable number of discrete observations, \(\mathbf{f}:=\{f(t_k);k=1,2,\dots,K\}.\)

We imagine observations from this signal are modelled by a Gaussian process, giving us the same setup as Gaussian process regression. We introduce the additional assumption here that the scalar index set \(\mathcal{I}:=\mathbb{R}\) represents time.
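Written out, the setup I have in mind is roughly the following. The symbols \(y_k\), \(\varepsilon_k\) and \(\sigma^2\) are notation assumed here for the noisy observations, the noise and its variance; \(\kappa\) is the covariance kernel with parameters \(\mathbf{\theta}\), and the zero mean is a simplifying choice.

\[
f \sim \mathcal{GP}\left(0, \kappa(t, t';\mathbf{\theta})\right), \qquad
y_k = f(t_k) + \varepsilon_k, \qquad
\varepsilon_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \qquad
k = 1, 2, \dots, K.
\]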

If the noise model from that setup looks weird to you for time series… well, it looks weird to me too. I.i.d. additive noise is not a natural noise model for digital audio signals; I suppose what we are doing here is requiring that there be some model for sampling error, and that it may as well be the most convenient possible model to work with, which is additive Gaussian. More general noise models are indeed possible, and if we allow other Gaussian processes as additive noise models then we are on the way to constructing a source separation model. That is indeed what Liutkus, Badeau, and Richard (2011) do.[1]

Anyway, with these choices, this becomes absolutely classic Gaussian process regression with some specialisation (univariate index, zero mean).
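As a reminder of what that classic regression looks like in this specialised setting, here is a minimal sketch of the zero-mean posterior over a univariate time index, computed by the usual Cholesky route. The squared-exponential kernel, its hyperparameters and the toy sinusoid are placeholder assumptions chosen only to keep the example short; nothing here is specific to audio yet.

```python
import numpy as np


def sq_exp_kernel(t1, t2, variance=1.0, lengthscale=0.1):
    """Squared-exponential covariance (a placeholder choice of kernel)."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(t_obs, y_obs, t_new, noise_var, kernel=sq_exp_kernel):
    """Posterior mean and pointwise variance of f at t_new for a zero-mean GP."""
    K = kernel(t_obs, t_obs) + noise_var * np.eye(t_obs.size)
    K_s = kernel(t_new, t_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^{-1} y
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(kernel(t_new, t_new)) - np.sum(v ** 2, axis=0)
    return mean, var


rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, 1.0, 50)           # regularly sampled times
y_obs = np.sin(2 * np.pi * 3 * t_obs) + 0.1 * rng.standard_normal(t_obs.size)
t_new = np.linspace(0.0, 1.0, 200)          # finer grid to interpolate onto

mean, var = gp_posterior(t_obs, y_obs, t_new, noise_var=0.01)
```

The cost is cubic in the number of observations, which is exactly why this direct form is hopeless for long recordings.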

It is also not far from the classic time-frequency spectral analysis setup, where we take Fourier transforms over fixed-size windows to estimate a kind of deterministic approximation to the kernel \(\kappa\) (thanks, Wiener-Khintchine theorem); in that context we are effectively assuming that for each window we have an independent estimation problem, and a periodic kernel. I should make that relationship precise. 🏗
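Short of making it precise, one small numerical check shows where the periodic kernel comes from: within a single window, the periodogram is exactly the discrete Fourier transform of the circular (wrap-around) sample autocovariance. This is only a sketch of the finite-sample identity, assuming a rectangular window and no mean removal; the variable names and the white-noise test signal are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)  # one analysis window of a toy signal

# Periodogram of the window
periodogram = np.abs(np.fft.fft(x)) ** 2 / n

# Circular sample autocovariance at lags 0..n-1: the window is treated as
# if it repeated periodically, which is the "periodic kernel" assumption.
acov = np.array([np.dot(x, np.roll(x, -lag)) for lag in range(n)]) / n

# Wiener-Khintchine in its finite, circular form: the DFT of the
# autocovariance gives back the periodogram.
spectrum_from_acov = np.fft.fft(acov)

print(np.allclose(periodogram, spectrum_from_acov.real))
```

The imaginary part of `spectrum_from_acov` vanishes up to rounding because the circular autocovariance of a real signal is symmetric in the lag.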

There is clearly a lot wrapped up in the kernel, \(\kappa(t, t';\mathbf{\theta}).\) We will come back to that.

Alvarado, Pablo A., Mauricio A. Alvarez, and Dan Stowell. 2019. “Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 995–99. https://doi.org/10.1109/ICASSP.2019.8683287.

Alvarado, Pablo A., and Dan Stowell. 2018. “Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music,” November. http://arxiv.org/abs/1705.07104.

Cemgil, Ali Taylan. 2009. “Bayesian Inference for Nonnegative Matrix Factorisation Models.” Computational Intelligence and Neuroscience. https://doi.org/10.1155/2009/785152.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press. https://doi.org/10.1145/1390156.1390181.

Duvenaud, David K., Hannes Nickisch, and Carl E. Rasmussen. 2011. “Additive Gaussian Processes.” In Advances in Neural Information Processing Systems, 226–34. http://papers.nips.cc/paper/4221-additive-gaussian-processes.pdf.

Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. 2008. “Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis.” Neural Computation 21 (3): 793–830. https://doi.org/10.1162/neco.2008.04-08-771.

Girolami, Mark, and Simon Rogers. 2005. “Hierarchic Bayesian Models for Kernel Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 241–48. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102382.

Godsill, Simon J, and Ali Taylan Cemgil. 2005. “Probabilistic Phase Vocoder and Its Application to Interpolation of Missing Values in Audio Signals.” In 2005 13th European Signal Processing Conference, 4. https://www.eurasip.org/Proceedings/Eusipco/Eusipco2005/defevent/papers/cr1319.pdf.

Hartikainen, J., and S. Särkkä. 2010. “Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In 2010 IEEE International Workshop on Machine Learning for Signal Processing, 379–84. Kittila, Finland: IEEE. https://doi.org/10.1109/MLSP.2010.5589113.

Kailath, Thomas. 1971. “The Structure of Radon-Nikodym Derivatives with Respect to Wiener and Related Measures.” The Annals of Mathematical Statistics 42 (3): 1054–67.

Kalman, R. 1959. “On the General Theory of Control Systems.” IRE Transactions on Automatic Control 4 (3): 110. https://doi.org/10.1109/TAC.1959.1104873.

Kalman, R. E. 1960. “A New Approach to Linear Filtering and Prediction Problems.” Journal of Basic Engineering 82 (1): 35. https://doi.org/10.1115/1.3662552.

Karvonen, Toni, and Simo Särkkä. 2016. “Approximate State-Space Gaussian Processes via Spectral Transformation.” In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. Vietri sul Mare, Salerno, Italy: IEEE. https://doi.org/10.1109/MLSP.2016.7738812.

Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98. https://doi.org/10.1111/j.1467-9868.2011.00777.x.

Liutkus, Antoine, Roland Badeau, and Gaël Richard. 2011. “Gaussian Processes for Underdetermined Source Separation.” IEEE Transactions on Signal Processing 59 (7): 3155–67. https://doi.org/10.1109/TSP.2011.2119315.

Liutkus, Antoine, Zafar Rafii, Bryan Pardo, Derry Fitzgerald, and Laurent Daudet. 2014. “Kernel Spectrogram Models for Source Separation.” In, 6–10. IEEE. https://doi.org/10.1109/HSCMA.2014.6843240.

Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018. “State Space Gaussian Processes with Non-Gaussian Likelihood.” In International Conference on Machine Learning, 3789–98. http://proceedings.mlr.press/v80/nickisch18a.html.

Rasmussen, Carl Edward, and Hannes Nickisch. 2010. “Gaussian Processes for Machine Learning (GPML) Toolbox.” Journal of Machine Learning Research 11 (Nov): 3011–5. http://www.jmlr.org/papers/v11/rasmussen10a.html.

Reece, S., and S. Roberts. 2010. “An Introduction to Gaussian Processes for the Kalman Filter Expert.” In 2010 13th International Conference on Information Fusion, 1–9. https://doi.org/10.1109/ICIF.2010.5711863.

Remes, Sami, Markus Heinonen, and Samuel Kaski. 2017. “Non-Stationary Spectral Kernels.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4642–51. Curran Associates, Inc. http://papers.nips.cc/paper/7050-non-stationary-spectral-kernels.pdf.

Särkkä, S., and J. Hartikainen. 2013. “Non-Linear Noise Adaptive Kalman Filtering via Variational Bayes.” In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. https://doi.org/10.1109/MLSP.2013.6661935.

Särkkä, Simo. 2007. “On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems.” IEEE Transactions on Automatic Control 52 (9): 1631–41. https://doi.org/10.1109/TAC.2007.904453.

Särkkä, Simo, and A. Nummenmaa. 2009. “Recursive Noise Adaptive Kalman Filtering by Variational Bayesian Approximations.” IEEE Transactions on Automatic Control 54 (3): 596–600. https://doi.org/10.1109/TAC.2008.2008348.

Särkkä, Simo, A. Solin, and J. Hartikainen. 2013. “Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.” IEEE Signal Processing Magazine 30 (4): 51–61. https://doi.org/10.1109/MSP.2013.2246292.

Särkkä, Simo, and Arno Solin. 2019. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks 10. Cambridge ; New York, NY: Cambridge University Press.

Solin, Arno. 2016. “Stochastic Differential Equation Methods for Spatio-Temporal Gaussian Process Regression.” Aalto University. https://aaltodoc.aalto.fi:443/handle/123456789/19842.

Solin, Arno, and Simo Särkkä. 2013. “Infinite-Dimensional Bayesian Filtering for Detection of Quasiperiodic Phenomena in Spatiotemporal Data.” Physical Review E 88 (5): 052909. https://doi.org/10.1103/PhysRevE.88.052909.

———. 2014. “Explicit Link Between Periodic Covariance Functions and State Space Models.” In Artificial Intelligence and Statistics, 904–12. http://proceedings.mlr.press/v33/solin14.html.

Turner, Richard E., and Maneesh Sahani. 2014. “Time-Frequency Analysis as Probabilistic Inference.” IEEE Transactions on Signal Processing 62 (23): 6171–83. https://doi.org/10.1109/TSP.2014.2362100.

Virtanen, T., A. Taylan Cemgil, and S. Godsill. 2008. “Bayesian Extensions to Non-Negative Matrix Factorisation for Audio Signal Modelling.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 1825–8. https://doi.org/10.1109/ICASSP.2008.4517987.

Wiener, N, and P Masani. 1957. “The Prediction Theory of Multivariate Stochastic Processes.” Acta Mathematica 98 (1): 111–50. https://doi.org/10.1007/BF02404472.

———. 1958. “The Prediction Theory of Multivariate Stochastic Processes, II.” Acta Mathematica 99 (1): 93–137. https://doi.org/10.1007/BF02392423.

Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019. “End-to-End Probabilistic Inference for Nonstationary Audio Analysis,” January. https://arxiv.org/abs/1901.11436v1.

Wilkinson, W. J. 2019. “Gaussian Process Modelling for Audio Signals.” Doctoral, London: Queen Mary University of London. https://theses.eurasip.org/theses/838/gaussian-process-modelling-for-audio-signals/.

Wilkinson, W. J., M. Riis Andersen, J. D. Reiss, D. Stowell, and A. Solin. 2019. “Unifying Probabilistic Models for Time-Frequency Analysis.” In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3352–6. https://doi.org/10.1109/ICASSP.2019.8682306.

Yaglom, A. M. 1987. Correlation Theory of Stationary and Related Random Functions: Supplementary Notes and References. Springer Series in Statistics. New York, NY: Springer Science & Business Media.


  1. We might more generally consider a sampling problem where we observe the signal through inner products with some sampling kernel, possibly even a stochastic one, but that sounds complicated.