kernel_tricks on Dan MacKinlay
https://danmackinlay.name/tags/kernel_tricks.html
Recent content in kernel_tricks on Dan MacKinlay
Tue, 13 Apr 2021 14:40:14 +0800

Gaussian process regression
https://danmackinlay.name/notebook/gp_regression.html
Tue, 13 Apr 2021 14:40:14 +0800
Contents: Quick intro · Density estimation · Kernels · Using state filtering · On lattice observations · On manifolds · By variational inference · With inducing variables · By variational inference with inducing variables · With vector output · Approximation with dropout · For dimension reduction · Readings · Implementations (Geostat Framework, GPy, Stheno, GPyTorch, GPFlow, misc Python, Stan, AutoGP, scikit-learn, misc Julia, MATLAB)
Chi Feng’s GP regression demo.
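The conditioning step at the heart of GP regression is compact enough to sketch. This is toy NumPy code, not code from the notebook; the kernel, noise level, and data are all made up for illustration:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 ell^2))
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_posterior(x, y, x_star, noise=1e-2):
    # Standard GP conditioning: posterior mean and covariance at x_star
    K = rbf(x, x) + noise * np.eye(len(x))
    Ks = rbf(x, x_star)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(x_star, x_star) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov

x = np.array([-1.0, 0.0, 1.0])
y = np.sin(x)
mu, cov = gp_posterior(x, y, np.array([0.0]))
# the posterior mean at an observed point is pulled close to the observation,
# and the posterior variance there shrinks well below the prior variance of 1
```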
Gaussian random fields are stochastic processes/fields with jointly Gaussian distributions of observations.

Dynamical systems via Koopman operators
https://danmackinlay.name/notebook/koopmania.html
Fri, 09 Apr 2021 11:46:21 +0800
NB: Koopman here is B. O. Koopman (Koopman 1931), not S. J. Koopman, who also works in dynamical systems.
I do not know how this works, but maybe this fragment of an abstract will do for now (Budišić, Mohr, and Mezić 2012):
A majority of methods from dynamical system analysis, especially those in applied settings, rely on Poincaré’s geometric picture that focuses on “dynamics of states.”

Kernel zoo
https://danmackinlay.name/notebook/kernel_zoo.html
Tue, 30 Mar 2021 14:20:40 +1100
Contents: Stationary · Dot-product · NN kernels · NN Erf kernel · Arc-cosine kernel · Causal kernels · Wiener process kernel · Squared exponential · Rational Quadratic · Matérn · Periodic · Locally periodic · “Integral” kernel · Composed kernels · Stationary spectral kernels · Nonstationary spectral kernels · Compactly supported · Markov kernels · Genton kernels · Kernels with desired symmetry · Stationary reducible kernels · Other nonstationary kernels
What follows are some useful kernels to have in my toolkit, mostly over \(\mathbb{R}^n\) or at least some space with a metric.

Stochastic processes which represent measures over the reals
https://danmackinlay.name/notebook/measure_priors.html
Mon, 08 Mar 2021 16:44:16 +1100
Contents: Subordinators · Other measure priors
Often I need to have a nonparametric representation for a measure over some non-finite index set. We might want to represent a probability, or mass, or a rate. I might want this representation to be something flexible and low-assumption, like a Gaussian process. If I want a nonparametric representation of functions this is not hard; I can simply use a Gaussian process.

Convolutional subordinator processes
https://danmackinlay.name/notebook/subordinator_convolution.html
Mon, 08 Mar 2021 15:29:19 +1100
Stochastic processes by convolution of noise with smoothing kernels, where the driving noise is a Lévy subordinator.
Why would we want this? One reason is that this gives us a way to create nonparametric distributions over measures.
Combining kernels
https://danmackinlay.name/notebook/kernel_combo.html
Mon, 01 Mar 2021 19:53:09 +1100
Contents: Locally stationary kernels · Stationary reducible kernels · Other nonstationary kernels
A sum or product (or outer sum, or tensor product) of kernels is still a kernel. For other transforms, YMMV.
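The closure claim is easy to check numerically. A toy sketch (kernels and grid are arbitrary): sums and elementwise (Schur) products of Gram matrices stay positive semi-definite.

```python
import numpy as np

def k_rbf(x, ell=1.0):
    # Squared-exponential Gram matrix on a 1-d grid
    return np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))

def k_lin(x):
    # Linear (dot-product) Gram matrix
    return np.outer(x, x)

x = np.linspace(-2.0, 2.0, 20)
K_sum = k_rbf(x) + k_lin(x)    # sum of kernels is a kernel
K_prod = k_rbf(x) * k_lin(x)   # elementwise (Schur) product is a kernel
for K in (K_sum, K_prod):
    # positive semi-definite up to floating-point round-off
    assert np.linalg.eigvalsh(K).min() > -1e-9
```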
For example, in the case of Gaussian processes, suppose that, independently,
\[\begin{aligned} f_{1} &\sim \mathcal{GP}\left(\mu_{1}, k_{1}\right)\\ f_{2} &\sim \mathcal{GP}\left(\mu_{2}, k_{2}\right) \end{aligned}\] then
\[ f_{1}+f_{2} \sim \mathcal{GP} \left(\mu_{1}+\mu_{2}, k_{1}+k_{2}\right) \] so \(k_{1}+k_{2}\) is also a kernel.

Convolutional Gaussian processes
https://danmackinlay.name/notebook/gp_convolution.html
Mon, 01 Mar 2021 17:08:51 +1100
Contents: Convolutions with respect to a non-stationary driving noise · Varying convolutions with respect to a stationary white noise
Gaussian processes by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) and can accommodate non-stationarity.
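A discretised sketch on a 1-d grid (toy code with made-up lengthscale and grid, not from the notebook): convolving white noise with a Gaussian bump yields approximate draws from a stationary GP whose covariance is the bump's self-convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, n, ell = 0.01, 4000, 0.1
u = np.arange(-300, 301) * dx
g = np.exp(-u ** 2 / (2 * ell ** 2))        # smoothing kernel
g /= np.sqrt(dx * (g ** 2).sum())           # normalise so Var f(x) ~ 1
w = rng.standard_normal(n) * np.sqrt(dx)    # discretised white-noise increments
f = np.convolve(w, g, mode="same")          # one approximate draw from the GP
# f is smooth on the scale of ell, with empirical variance near 1
```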
Convolutions with respect to a non-stationary driving noise H. K.

Convolutional stochastic processes
https://danmackinlay.name/notebook/stochastic_convolution.html
Mon, 01 Mar 2021 16:13:24 +1100
Stochastic processes generated by convolution of white noise with smoothing kernels, which is not unlike kernel density estimation where the “data” is random.
For now, I am mostly interested in certain special cases: Gaussian process convolutions and subordinator convolutions.
patrick-kidger/Deep-Signature-Transforms: Code for “Deep Signature Transforms”
patrick-kidger/signatory: Differentiable computations of the signature and logsignature transforms, on both CPU and GPU.

Covariance functions
https://danmackinlay.name/notebook/kernel_learning.html
Mon, 01 Mar 2021 13:25:10 +1100
Contents: Learning kernel hyperparameters · Learning kernel composition · Hyperkernels
This is usually in the context of Gaussian processes where everything can work out nicely if you are lucky, but other kernel machines are OK too. The goal for most of these is to maximise the marginal posterior likelihood, a.k.a. model evidence, as is conventional in Bayesian ML.
Learning kernel hyperparameters 🏗
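As a concrete toy (assumed code, not from the notebook; data and noise level are invented): fit an RBF lengthscale by minimising the negative log marginal likelihood with SciPy's scalar optimiser.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(40)

def nlml(log_ell, noise_var=0.01):
    # Negative log marginal likelihood of an RBF-kernel GP (up to a constant)
    ell = np.exp(log_ell)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))
    L = np.linalg.cholesky(K + noise_var * np.eye(len(x)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum()

res = minimize_scalar(nlml, bounds=(-3.0, 2.0), method="bounded")
ell_hat = np.exp(res.x)  # learned lengthscale; order 1 for this sine data
```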
Learning kernel composition
Automating kernel design by some composition of simpler atomic kernels.

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_vector.html
Tue, 23 Feb 2021 12:09:36 +1100
Contents: Co-regionalization · Multi-task · Multi Output Spectral Mixture Kernel
In which I discover for myself whether “multi-task” and “co-regionalized” approaches are different. Álvarez, Rosasco, and Lawrence (2012)
Overview from Invenia: Gaussian Processes: from one to many outputs
Co-regionalization
[the] community has begun to turn its attention to covariance functions for multiple outputs. One of the paradigms that has been considered (Bonilla, Chai, and Williams 2007; Osborne et al.

Kernel warping
https://danmackinlay.name/notebook/kernel_warping.html
Thu, 21 Jan 2021 10:55:36 +1100
Contents: Stationary reducible kernels · Classic deformations · MacKay warping · As a function of input · Learning transforms
A nonlinear way of transforming stationary kernels into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).
This is of interest for composing kernels with known desirable properties via known transforms, and also for learning (somewhat) arbitrary transforms to attain stationarity.

Miscellaneous nonstationary kernels
https://danmackinlay.name/notebook/kernel_nonstationary.html
Thu, 21 Jan 2021 10:55:36 +1100
Nonstationary kernels constructed by means other than warping stationary ones.
Maybe start with Jun and Stein (2008); Fuglstad et al. (2015); Fuglstad et al. (2013)?
Covariance functions
https://danmackinlay.name/notebook/covariance_kernels.html
Tue, 05 Jan 2021 15:07:38 +1100
Contents: Covariance kernels of some example processes · A simple Markov chain · The Hawkes process · Gaussian processes · General real covariance kernels · Bonus: complex covariance kernels · Kernel zoo · Learning kernels · Non-positive kernels
Figure: a realisation of a nonstationary rough covariance process (partially observed).
On the interpretation of kernels as the covariance functions of stochastic processes, which is one way to define stochastic processes.
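Concretely, a function can serve as a covariance kernel only if every Gram matrix it generates is symmetric positive semi-definite. A toy check (not from the notebook) for the exponential covariance, i.e. the covariance function of an Ornstein-Uhlenbeck process:

```python
import numpy as np

def k_exp(s, t, ell=1.0):
    # Exponential covariance k(s, t) = exp(-|s - t| / ell),
    # the covariance function of an Ornstein-Uhlenbeck process
    return np.exp(-np.abs(s[:, None] - t[None, :]) / ell)

t = np.sort(np.random.default_rng(2).uniform(0.0, 10.0, 30))
K = k_exp(t, t)
assert np.allclose(K, K.T)                  # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-9  # positive semi-definite
```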
Suppose we have a real-valued stochastic process

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_functional.html
Mon, 07 Dec 2020 20:43:06 +1100
In which I discover… Learning operators via GPs.
Hidden Markov Model inference for Gaussian Process regression
https://danmackinlay.name/notebook/gp_filtering.html
Wed, 25 Nov 2020 11:28:43 +1100
Contents: Spatio-temporal usage · Miscellaneous notes towards implementation
Classic flavours together: Gaussian processes and state filters/stochastic differential equations, and random fields as stochastic differential equations.
I am interested here in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering. This means you get to solve a GP as an SDE.

Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100
Contents: Basic sparsity via inducing variables · SVI for Gaussian processes · Latent Gaussian Process models
There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
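One such slice, as toy code (the inducing-point count, kernel, and grid are all invented): a Nyström-style low-rank factorisation \(K \approx K_{xz} K_{zz}^{-1} K_{zx}\) through \(m\) inducing inputs, the algebra behind basic sparse GP approximations.

```python
import numpy as np

rng = np.random.default_rng(0)
rbf = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / 2.0)

x = np.sort(rng.uniform(-3.0, 3.0, 200))   # "data" inputs
z = np.linspace(-3.0, 3.0, 15)             # m = 15 inducing inputs
Kxz = rbf(x, z)
Kzz = rbf(z, z) + 1e-8 * np.eye(len(z))    # jitter for numerical stability
K_nystrom = Kxz @ np.linalg.solve(Kzz, Kxz.T)  # rank-m surrogate for K
err = np.abs(rbf(x, x) - K_nystrom).max()
# err is small: 15 inducing points capture this smooth kernel well,
# and downstream solves cost O(n m^2) rather than O(n^3)
```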
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.

Non-Gaussian Bayesian functional regression
https://danmackinlay.name/notebook/stochastic_process_regression.html
Wed, 16 Sep 2020 14:07:32 +1000
Regression using non-Gaussian random fields. Generalised Gaussian process regression.
Is there ever an actual need for this? Or can we just use a mostly-Gaussian process with some non-Gaussian marginal distribution and pretend, via GP quantile regression, some variational GP approximation, or a non-Gaussian likelihood over Gaussian latents? Presumably if we suspect moments higher than the second are important, or that there is some actual stochastic process that we know matches our phenomenon, we might bother with this, but oh my, it can get complicated.

Gaussian process quantile regression
https://danmackinlay.name/notebook/gp_quantile_regression.html
Wed, 16 Sep 2020 13:44:32 +1000
How to do quantile regression with GPs.
Statistics of spatio-temporal processes
https://danmackinlay.name/notebook/spatio_temporal.html
Fri, 11 Sep 2020 13:30:12 +1000
Contents: Tools
The dynamics of spatial processes evolving in time.
Clearly there are many different problems one might wonder about here. I am thinking in particular of the kind of problem whose discretisation might look like this, as a graphical model.
This is highly stylized: I’ve imagined there is one spatial dimension, but usually there would be two or three. The observed nodes are where we have sensors that can measure the state of some parameter of interest \(w\) which evolves in time \(t\).

(Reproducing) kernel tricks
https://danmackinlay.name/notebook/kernel_methods.html
Mon, 20 Jan 2020 13:55:43 +1100
Contents: Introductions · Kernel approximation · RKHS distribution embedding · Specific kernels · Non-scalar-valued “kernels”
Kernel in the sense of the “kernel trick”. Not to be confused with smoothing-type convolution kernels, nor the dozens of related-but-slightly-different clashing definitions of kernel; those can have their own respective pages.
Kernel tricks use “reproducing” kernels, a.k.a. Mercer kernels (Mercer 1909), as inner products between functions. The classic machine learning explanation is that this induces a particularly tasty flavour of Hilbert space to work with.

Gaussian processes
https://danmackinlay.name/notebook/gaussian_processes.html
Tue, 03 Dec 2019 10:11:26 +1100
Contents: Relationship between addition of covariance kernels and of processes
“Gaussian Processes” are stochastic processes/fields with jointly Gaussian distributions of observations. The most familiar of these to many of us is the Gauss-Markov process, a.k.a. the Wiener process, but there are many others. These processes are convenient due to certain useful properties of the multivariate Gaussian distribution, e.g. being uniquely specified by first and second moments, nice behaviour under various linear operations, kernel tricks…

Nonparametric state filters via Gaussian Processes
https://danmackinlay.name/notebook/gp_state_filters.html
Wed, 18 Sep 2019 10:21:15 +1000
Two classic flavours together: Gaussian processes and state filters. There are other nonparametric state filters, e.g. variational filters and particle filters.
This is a kind of dual to using a state filter to calculate a Gaussian process regression as a computational shorthand.
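That shorthand direction is easy to sketch for the Matérn-1/2 (exponential) kernel, whose GP is an Ornstein-Uhlenbeck process, so exact filtering is a scalar Kalman recursion. Toy code with made-up hyperparameters, not from the notebook:

```python
import numpy as np

def ou_kalman_means(t, y, ell=1.0, s2=1.0, noise=0.01):
    # Filtering means for a GP with kernel k(t, t') = s2 * exp(-|t - t'| / ell),
    # run as an Ornstein-Uhlenbeck state-space model: O(n) instead of O(n^3).
    m, P = 0.0, s2                 # stationary prior on the latent state
    means, t_prev = [], t[0]
    for tk, yk in zip(t, y):
        a = np.exp(-(tk - t_prev) / ell)              # transition coefficient
        m, P = a * m, a * a * P + s2 * (1.0 - a * a)  # predict
        gain = P / (P + noise)                        # Kalman gain
        m += gain * (yk - m)                          # update
        P *= 1.0 - gain
        means.append(m)
        t_prev = tk
    return np.array(means)

t = np.arange(0.0, 2.0, 0.1)
means = ou_kalman_means(t, np.ones_like(t))
# with small observation noise the filter tracks the constant signal closely
```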
Here we use Gaussian processes to define the filter, in particular to learn nonparametric transition, observation, or state densities for a generalized Kalman filter.

Representer theorems
https://danmackinlay.name/notebook/representer_theorems.html
Mon, 16 Sep 2019 12:27:34 +1000
In spatial statistics, Gaussian processes, kernel machines and covariance functions, regularisation.
🏗
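A minimal instance of a representer theorem in action (toy code with arbitrary data and kernel): kernel ridge regression, where the RKHS minimiser is guaranteed to have the form \(f(\cdot)=\sum_i \alpha_i k(x_i,\cdot)\), so fitting reduces to a linear solve against the Gram matrix.

```python
import numpy as np

def rbf(a, b):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / 2.0)

def krr_fit(x, y, lam=1e-3):
    # Representer theorem: the minimiser is sum_i alpha_i k(x_i, .),
    # so we only need the coefficient vector alpha
    return np.linalg.solve(rbf(x, x) + lam * np.eye(len(x)), y)

def krr_predict(x_new, x, alpha):
    return rbf(x_new, x) @ alpha

x = np.linspace(-3.0, 3.0, 30)
y = np.tanh(x)
alpha = krr_fit(x, y)
pred = krr_predict(x, x, alpha)
# pred nearly interpolates y because the ridge penalty lam is small
```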