kernel_tricks on Dan MacKinlayhttps://danmackinlay.name/tags/kernel_tricks.htmlRecent content in kernel_tricks on Dan MacKinlayHugo -- gohugo.ioen-usWed, 26 Jan 2022 08:45:51 +1100Gaussian process regressionhttps://danmackinlay.name/notebook/gp_regression.htmlWed, 26 Jan 2022 08:45:51 +1100https://danmackinlay.name/notebook/gp_regression.htmlQuick intro Density estimation Kernels Using state filtering On lattice observations On manifolds By variational inference With inducing variables By variational inference with inducing variables With vector output Deep Approximation with dropout For dimension reduction Pathwise/Matheron updates Readings Implementations Geostat Framework GPy Stheno GPyTorch Plain pyro George GPFlow Misc python Stan AutoGP scikit-learn Misc julia MATLAB References Chi Feng’s GP regression demo.Measure-valued stochastic processeshttps://danmackinlay.name/notebook/measure_priors.htmlMon, 10 Jan 2022 14:17:00 +1100https://danmackinlay.name/notebook/measure_priors.htmlCompletely random measures Random coefficient polynomials For categorical variables Pitman-Yor Indian Buffet process Beta process Other measure priors References Often I need to have a nonparametric representation for a measure over some non-finite index set. We might want to represent a probability, or mass, or a rate. I might want this representation to be something flexible and low-assumption, like a Gaussian process. If I want a nonparametric representation of functions this is not hard; I can simply use a Gaussian process.Convolutional subordinator processeshttps://danmackinlay.name/notebook/subordinator_convolution.htmlWed, 01 Dec 2021 16:51:37 +1100https://danmackinlay.name/notebook/subordinator_convolution.htmlReferences Defining stochastic processes by convolution of noise with smoothing kernels, where the driving noise is a Lévy subordinator.
Why would we want this? One reason is that this gives us a way to create nonparametric distributions over measures.
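A minimal numerical sketch of the construction: here I crudely discretise a gamma subordinator as independent non-negative gamma increments and convolve them with a Gaussian bump. All the specific parameter choices (grid spacing, gamma shape/scale, bump width) are illustrative assumptions, not anything from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Crude discretisation of a Lévy subordinator: a gamma process has
# independent Gamma(a * dt, b) increments, all non-negative.
dt, a, b = 0.01, 5.0, 1.0
increments = rng.gamma(shape=a * dt, scale=b, size=2000)

# Smoothing kernel: a Gaussian bump sampled on a grid of the same spacing.
u = np.arange(-100, 101) * dt
phi = np.exp(-0.5 * (u / 0.1) ** 2)
phi /= phi.sum()

# The convolution process: non-negative because both factors are,
# so it could stand in for a random measure density or rate.
f = np.convolve(increments, phi, mode="same")
assert (f >= 0).all()
```

Because the driving noise is a subordinator, the resulting path is non-negative, which is what makes this usable as a random measure or intensity.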
A related but distinct technique is that we can create interesting Generalized Gamma convolutions,
References Barndorff-Nielsen, O. E., and J. Schmiegel. 2004. “Lévy-Based Spatial-Temporal Modelling, with Applications to Turbulence.Stationarity in stochastic processeshttps://danmackinlay.name/notebook/stationarity.htmlMon, 29 Nov 2021 09:33:44 +1100https://danmackinlay.name/notebook/stationarity.htmlChange point methods Nonparametric methods References Non-stationary dynamics turn out to be important in practice.
Notes on detecting non-stationarity in stochastic processes/random fields.
Related: ensuring stability. If we have a system with stable dynamics and keep the distribution of the inputs the same, then it will end up stationary.
Shay Palachy, Detecting stationarity in time series data
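One crude but instructive detector, sketched in numpy under my own assumptions (window count, sample size, the choice of a random walk as the canonical non-stationary example): compare summary statistics across windows, since a stationary process should have windows that look alike.

```python
import numpy as np

rng = np.random.default_rng(1)

def window_means(x, n_windows=5):
    """Means of equal-length windows; a large spread hints at non-stationarity."""
    return np.array([w.mean() for w in np.array_split(x, n_windows)])

stationary = rng.normal(size=5000)               # i.i.d. noise
random_walk = np.cumsum(rng.normal(size=5000))   # classic non-stationary case

spread = lambda x: window_means(x).std()
assert spread(random_walk) > spread(stationary)
```

Real tests (e.g. unit-root tests such as augmented Dickey-Fuller) formalise this intuition; the window comparison is just the back-of-envelope version.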
Change point methods Nonparametric methods References Adams, Ryan Prescott, and David J.Multi-output Gaussian process regressionhttps://danmackinlay.name/notebook/gp_regression_vector.htmlFri, 26 Nov 2021 11:42:17 +1100https://danmackinlay.name/notebook/gp_regression_vector.htmlTooling References Multi-task learning in GP regression by assuming the model is distributed as a multivariate Gaussian process.
WARNING: Under heavy construction ATM; it does not yet make sense.
My favourite introduction is by Eric Perim, Wessel Bruinsma, and Will Tebbutt, in a series of blog posts spun off a paper (Bruinsma et al. 2020) which attempts to unify various approaches to defining vector Gaussian processes, and thereby derives an efficient method incorporating the good features of all of them.Combining kernelshttps://danmackinlay.name/notebook/kernel_combo.htmlThu, 25 Nov 2021 11:10:29 +1100https://danmackinlay.name/notebook/kernel_combo.htmlOperations preserving positivity Applying kernels over different axes Locally stationary kernels Stationary reducible kernels Other nonstationary kernels References A sum or product (or outer sum, or tensor product) of Mercer kernels is still a kernel. For other operations YMMV.
Operations preserving positivity For example, in the case of Gaussian processes, suppose that, independently,
\[\begin{aligned} f_{1} &\sim \mathcal{GP}\left(\mu_{1}, k_{1}\right)\\ f_{2} &\sim \mathcal{GP}\left(\mu_{2}, k_{2}\right) \end{aligned}\] thenGaussian Process regression via state filteringhttps://danmackinlay.name/notebook/gp_filtering.htmlThu, 25 Nov 2021 09:22:29 +1100https://danmackinlay.name/notebook/gp_filtering.htmlSpatio-temporal usage Latent force models Miscellaneous notes towards implementation References 🏗️🏗️🏗️ Under heavy construction 🏗️🏗️🏗️
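The kernel-closure claims in the combining-kernels notes above are easy to sanity-check numerically: the sum of two Gram matrices, and their elementwise (Schur) product, should both stay positive semidefinite. The particular kernels and inputs here are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=40)

# Two Mercer kernels on R: squared-exponential and Matern-1/2 (Laplacian).
d = np.abs(x[:, None] - x[None, :])
K1 = np.exp(-0.5 * d ** 2)
K2 = np.exp(-d)

min_eig = lambda K: np.linalg.eigvalsh(K).min()

# Sum and elementwise (Schur) product of Gram matrices stay PSD,
# up to floating-point jitter.
assert min_eig(K1 + K2) > -1e-9
assert min_eig(K1 * K2) > -1e-9
```

The product case is the Schur product theorem in action; no such guarantee holds for, say, differences of kernels, which is where the YMMV comes in.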
Two classic flavours together: Gaussian processes and state filters/stochastic differential equations, i.e. random fields represented as stochastic differential equations.
I am interested in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering.t-processeshttps://danmackinlay.name/notebook/t_process.htmlWed, 24 Nov 2021 14:01:28 +1100https://danmackinlay.name/notebook/t_process.htmlt-processes regression Markov t-process References Stochastic processes with Student-t marginals. Much as Student-\(t\) distributions generalise Gaussian distributions, \(t\)-processes generalise Gaussian processes.
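The filtering trick above can be sketched for the simplest case, the OU (Matérn-1/2) kernel \(k(s,t)=\sigma^2 e^{-|s-t|/\ell}\), whose GP regression posterior mean can be computed by a scalar Kalman filter in \(O(n)\). The test signal, hyperparameters, and grid below are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# OU (Matern-1/2) GP: k(s, t) = s2 * exp(-|s - t| / ell), noisy observations.
ell, s2, noise = 1.0, 1.0, 0.1   # lengthscale, signal var, noise var
t = np.linspace(0, 10, 200)
y = np.sin(t) + rng.normal(scale=noise ** 0.5, size=t.size)

# Equivalent linear-Gaussian state-space model, filtered in O(n).
m, P = 0.0, s2
means = []
for k in range(t.size):
    if k > 0:
        A = np.exp(-(t[k] - t[k - 1]) / ell)          # OU transition
        m, P = A * m, A * A * P + s2 * (1 - A * A)    # predict
    S = P + noise
    gain = P / S
    m, P = m + gain * (y[k] - m), (1 - gain) * P      # update
    means.append(m)

means = np.array(means)
# The causal filter tracks the signal to well within the noise scale.
assert np.abs(means[50:] - np.sin(t[50:])).mean() < 0.5
```

A smoothing (backward) pass would recover the full GP posterior at every point; the forward filter alone already matches the GP predictive at the final time.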
t-processes regression There are a couple of classic cases in ML where \(t\)-processes arise, e.g. in Bayes NNs (Neal 1996) or GP literature (9.9 Rasmussen and Williams 2006). Recently there has been an uptick in actual applications of these processes in regression (Chen, Wang, and Gorban 2020; Shah, Wilson, and Ghahramani 2014; Tang et al.Karhunen-Loève expansionshttps://danmackinlay.name/notebook/karhunen_loeve.htmlFri, 27 Aug 2021 10:00:45 +1000https://danmackinlay.name/notebook/karhunen_loeve.htmlGaussian References Suppose we have a collection \(\{\varphi_n\}\) of real valued functions on our index space \(T\), and a collection \(\{\xi_n\}\) of uncorrelated random variables. Now we define the random process \[ f(t)=\sum_{n=1}^{\infty} \xi_{n} \varphi_{n}(t). \] We might care about the first two moments of \(f,\) i.e. \[ \mathbb{E}\{f(s) f(t)\}=\sum_{n=1}^{\infty} \sigma_{n}^{2} \varphi_{n}(s) \varphi_{n}(t) \] and variance function \[ \mathbb{E}\left\{f^{2}(t)\right\}=\sum_{n=1}^{\infty} \sigma_{n}^{2} \varphi_{n}^{2}(t) \]
Now suppose that we have a stochastic process where the index \(T\) is a compact domain in \(\mathbb{R}^{N}\).Vector Gaussian processeshttps://danmackinlay.name/notebook/gp_vector.htmlMon, 16 Aug 2021 17:49:54 +1000https://danmackinlay.name/notebook/gp_vector.htmlReferences Adjusting the cross-covariance
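The Karhunen-Loève construction above can be checked numerically: draw the \(\xi_n\), sum the series truncated at \(N\) terms, and compare the Monte Carlo variance function against \(\sum_n \sigma_n^2 \varphi_n^2(t)\). The sine basis and the \(\sigma_n = 1/n\) decay are assumptions for illustration, not canonical choices.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)

# Truncated Karhunen-Loeve sum: f(t) = sum_n xi_n * phi_n(t), with
# uncorrelated xi_n of standard deviation sigma_n.
N = 50
n = np.arange(1, N + 1)
sigma = 1.0 / n                                              # assumed decay
phi = np.sqrt(2) * np.sin(np.pi * n[:, None] * t[None, :])   # assumed basis
xi = rng.normal(scale=sigma)
f = xi @ phi                                                 # one sample path

# Check the variance function E f(t)^2 = sum_n sigma_n^2 phi_n(t)^2
# by Monte Carlo over many independent draws.
draws = rng.normal(scale=sigma, size=(4000, N)) @ phi
target = (sigma[:, None] ** 2 * phi ** 2).sum(axis=0)
assert np.abs(draws.var(axis=0) - target).max() < 0.3
```

The same machinery with Gaussian \(\xi_n\) gives a cheap GP sampler whenever the eigenpairs of the covariance operator are known.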
As scalar Gaussian processes are to GP regression so are vector Gaussian processes to vector GP regression.
We recall that a classic Gaussian random process/field over some index set \(\mathcal{T}\) is a random function \(f:\mathcal{T}\to\mathbb{R},\) specified by the (deterministic) functions giving its mean \[ m(t)=\mathbb{E}\{f(t)\} \] and covariance \[ K(s, t)=\mathbb{E}\{(f(s)-m(s))(f(t)-m(t))\}. \]
We can extend this to multivariate Gaussian fields \(f:\mathcal{T}\to\mathbb{R}^d\).Convolutional stochastic processeshttps://danmackinlay.name/notebook/stochastic_convolution.htmlMon, 16 Aug 2021 09:28:19 +1000https://danmackinlay.name/notebook/stochastic_convolution.htmlReferences Stochastic processes generated by convolution of white noise with smoothing kernels, which is not unlike kernel smoothing where the “data” is random. Or, to put it another way, these are processes defined as moving averages of some stochastic noise.
For now, I am mostly interested in certain special cases: Gaussian convolutions and subordinator convolutions.
C&C Karhunen-Loeve expansion.
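The moving-average view above implies a specific covariance: convolving white noise with \(\varphi\) yields \(k(h)=\sum_u \varphi(u)\varphi(u+h)\), the autocorrelation of the smoothing kernel. A quick numpy check, with an arbitrary Gaussian bump of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)

# Moving average of white noise: f = phi * w. The induced covariance is
# k(h) = sum_u phi(u) phi(u + h), the autocorrelation of the kernel.
phi = np.exp(-0.5 * (np.arange(-30, 31) / 8.0) ** 2)
phi /= np.sqrt((phi ** 2).sum())   # normalise to unit marginal variance

w = rng.normal(size=200_000)
f = np.convolve(w, phi, mode="same")

lag = 5
empirical = np.mean(f[:-lag] * f[lag:])
theoretical = (phi[:-lag] * phi[lag:]).sum()
assert abs(empirical - theoretical) < 0.05
```

Choosing \(\varphi\) thus amounts to choosing the covariance, which is the sense in which convolution is dual to specifying a kernel directly.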
References Adler, Robert J. 2010. The Geometry of Random Fields.Gaussian processeshttps://danmackinlay.name/notebook/gp.htmlWed, 23 Jun 2021 15:19:40 +1000https://danmackinlay.name/notebook/gp.htmlDerivatives and integrals Integral of a Gaussian process Derivative of a Gaussian process References “Gaussian Processes” are stochastic processes/fields with jointly Gaussian distributions over all finite sets of observation points. The most familiar of these to finance and physics people is usually the Gauss-Markov process, a.k.a. the Wiener process, but there are many others. These processes are convenient due to certain useful properties of the multivariate Gaussian distribution e.Neural net kernelshttps://danmackinlay.name/notebook/kernel_nn.htmlMon, 24 May 2021 14:56:57 +1000https://danmackinlay.name/notebook/kernel_nn.htmlErf kernel Arc-cosine kernel Absolutely homogeneous References How I imagine the hyperspherical regularity of an NN kernel.
Random infinite-width NNs induce covariances which are nearly dot product kernels in the input parameters. Say we wish to compare the outputs given two input examples \(\mathbf{x}\) and \(\mathbf{y}\). They depend on several dot products, \(\mathbf{x}^{\top} \mathbf{x}\), \(\mathbf{x}^{\top} \mathbf{y}\) and \(\mathbf{y}^{\top} \mathbf{y}\). Often it is convenient to discuss the angle \(\theta\) between the inputs: \[ \theta=\cos ^{-1}\left(\frac{\mathbf{x} ^{\top} \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}\right) \]Deep Gaussian process regressionhttps://danmackinlay.name/notebook/gp_deep.htmlThu, 13 May 2021 08:21:29 +1000https://danmackinlay.name/notebook/gp_deep.htmlPlatonic ideal Approximation with dropout References Gaussian process layer cake.
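The angle dependence above is easy to see for a single random step-function unit \(h(\mathbf{v})=\mathbf{1}[\mathbf{w}^\top\mathbf{v}>0]\) with \(\mathbf{w}\sim\mathcal{N}(0,I)\): the classic orthant-probability result gives \(\mathbb{E}[h(\mathbf{x})h(\mathbf{y})]=(\pi-\theta)/(2\pi)\), a function of the angle alone (the order-0 arc-cosine kernel up to scaling). A Monte Carlo sanity check with inputs of my choosing:

```python
import numpy as np

rng = np.random.default_rng(6)

x = np.array([1.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# One random step-function unit: h(v) = 1[w . v > 0], w ~ N(0, I).
# The induced covariance E[h(x) h(y)] depends on the inputs only
# through the angle theta between them.
W = rng.normal(size=(200_000, 3))
mc = np.mean((W @ x > 0) & (W @ y > 0))
assert abs(mc - (np.pi - theta) / (2 * np.pi)) < 0.01
```

Smoother nonlinearities (erf, ReLU) give the other members of the arc-cosine family, but all inherit this hyperspherical structure.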
Platonic ideal TBD.
Approximation with dropout See NN ensembles.
References Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR. Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In Artificial Intelligence and Statistics, 207–15. Domingos, Pedro. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.Dynamical systems via Koopman operatorshttps://danmackinlay.name/notebook/koopmania.htmlFri, 09 Apr 2021 11:46:21 +0800https://danmackinlay.name/notebook/koopmania.htmlReferences NB: Koopman here is B.O. Koopman (Koopman 1931) not S.J. Koopman, who also works in dynamical systems.
I do not know how this works, but maybe this fragment of abstract will do for now (Budišić, Mohr, and Mezić 2012):
A majority of methods from dynamical system analysis, especially those in applied settings, rely on Poincaré’s geometric picture that focuses on “dynamics of states.Kernel zoohttps://danmackinlay.name/notebook/kernel_zoo.htmlTue, 30 Mar 2021 14:20:40 +1100https://danmackinlay.name/notebook/kernel_zoo.htmlStationary Dot-product NN kernels Causal kernels Wiener process kernel Squared exponential Rational Quadratic Matérn Periodic Locally periodic “Integral” kernel Composed kernels Stationary spectral kernels Nonstationary spectral kernels Compactly supported Markov kernels Genton kernels Kernels with desired symmetry Stationary reducible kernels Other nonstationary kernels References What follows are some useful kernels to have in my toolkit, mostly over \(\mathbb{R}^n\) or at least some space with a metric.Learning on manifoldshttps://danmackinlay.name/notebook/learning_on_manifolds.htmlWed, 03 Mar 2021 12:29:39 +1100https://danmackinlay.name/notebook/learning_on_manifolds.htmlLearning on a given manifold Information Geometry Hamiltonian Monte Carlo Langevin Monte Carlo Natural gradient Homogeneous probability References Abraham Bosse, Moyen vniuersel de pratiquer la perspectiue sur les tableaux, ou surfaces irregulieres : ensemble quelques particularitez concernant cet art, & celuy de la graueure en taille-douce (1653)
A placeholder for learning on curved spaces. Not discussed: learning OF the curvature of spaces.
AFAICT this usually boils down to defining an appropriate stochastic process on a manifold.Convolutional Gaussian processeshttps://danmackinlay.name/notebook/gp_convolution.htmlMon, 01 Mar 2021 17:08:51 +1100https://danmackinlay.name/notebook/gp_convolution.htmlConvolutions with respect to a non-stationary driving noise Varying convolutions with respect to a stationary white noise References Gaussian processes by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) and can encode non-stationarity.
Convolutions with respect to a non-stationary driving noise H. K.Stochastic processes on manifoldshttps://danmackinlay.name/notebook/stochastic_processes_on_manifolds.htmlMon, 01 Mar 2021 16:13:24 +1100https://danmackinlay.name/notebook/stochastic_processes_on_manifolds.htmlReferences TBD.
References Adler, Robert J. 2010. The Geometry of Random Fields. SIAM ed. Philadelphia: Society for Industrial and Applied Mathematics. Adler, Robert J., and Jonathan E. Taylor. 2007. Random Fields and Geometry. Springer Monographs in Mathematics 115. New York: Springer. Adler, Robert J, Jonathan E Taylor, and Keith J Worsley. 2016. Applications of Random Fields and Geometry Draft. Bhattacharya, Abhishek, and Rabi Bhattacharya.Learning covariance functionshttps://danmackinlay.name/notebook/kernel_learning.htmlMon, 01 Mar 2021 13:25:10 +1100https://danmackinlay.name/notebook/kernel_learning.htmlLearning kernel hyperparameters Learning kernel composition Via neural nets Hyperkernels References This is usually in the context of Gaussian processes where everything can work out nicely if you are lucky, but other kernel machines are OK too. The goal for most of these is to maximise the marginal posterior likelihood, a.k.a. model evidence, as is conventional in Bayesian ML.
Learning kernel hyperparameters 🏗
Learning kernel composition Automating kernel design by some composition of simpler atomic kernels.Miscellaneous nonstationary kernelshttps://danmackinlay.name/notebook/kernel_nonstationary.htmlThu, 21 Jan 2021 10:55:36 +1100https://danmackinlay.name/notebook/kernel_nonstationary.htmlReferences Kernels that are nonstationary constructed by other means than warping stationary ones.
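A minimal sketch of the evidence-maximisation idea from the kernel-learning notes above: compute the GP log marginal likelihood over a grid of lengthscales and pick the best. Everything here (data, kernel, noise level, grid) is an illustrative assumption; in practice one would use gradients rather than a grid.

```python
import numpy as np

rng = np.random.default_rng(7)

# Data from a smooth function, observed with noise.
t = np.linspace(0, 6, 60)
y = np.sin(t) + rng.normal(scale=0.1, size=t.size)

def log_marginal(ell, noise=0.01):
    """GP log marginal likelihood under a squared-exponential kernel."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / ell) ** 2)
    K += noise * np.eye(t.size)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + t.size * np.log(2 * np.pi))

# Crude hyperparameter learning: pick the lengthscale with the best evidence.
grid = np.array([0.05, 0.2, 0.5, 1.0, 2.0, 5.0])
best = grid[np.argmax([log_marginal(ell) for ell in grid])]
assert 0.2 <= best <= 2.0   # some intermediate scale should win
```

The data-fit term \(y^\top K^{-1} y\) and the complexity term \(\log\det K\) trade off automatically, which is why the evidence penalises both too-rough and too-smooth kernels.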
Maybe start with Jun and Stein (2008); Fuglstad et al. (2015); Fuglstad et al. (2013)?
References Bolin, David, and Kristin Kirchner. 2020. “The Rational SPDE Approach for Gaussian Random Fields With General Smoothness.” Journal of Computational and Graphical Statistics 29 (2): 274–85. Bolin, David, and Finn Lindgren. 2011. “Spatial Models Generated by Nested Stochastic Partial Differential Equations, with an Application to Global Ozone Mapping.Warping of stationary stochastic processeshttps://danmackinlay.name/notebook/stationary_warping.htmlThu, 21 Jan 2021 10:55:36 +1100https://danmackinlay.name/notebook/stationary_warping.htmlStationary reducible kernels Classic deformations MacKay warping As a function of input Learning transforms References Transforming stationary processes into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).
This is of interest in the context of composing kernels to have known desirable properties by known transforms, and also learning (somewhat) arbitrary transforms to attain stationarity.Covariance functionshttps://danmackinlay.name/notebook/covariance_kernels.htmlTue, 05 Jan 2021 15:07:38 +1100https://danmackinlay.name/notebook/covariance_kernels.htmlCovariance kernels of some example processes A simple Markov chain The Hawkes process Gaussian processes General real covariance kernels Bonus: complex covariance kernels Kernel zoo Learning kernels Non-positive kernels References A realisation of a nonstationary rough covariance process (partially observed)
On the interpretation of kernels as the covariance functions of stochastic processes, which is one way to define stochastic processes.
Suppose we have a real-valued stochastic processEfficient factoring of GP likelihoodshttps://danmackinlay.name/notebook/gp_factoring.htmlMon, 26 Oct 2020 12:46:34 +1100https://danmackinlay.name/notebook/gp_factoring.htmlBasic sparsity via inducing variables SVI for Gaussian processes Latent Gaussian Process models References There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
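The covariance-function interpretation above has a handy toy case: a stationary AR(1) recursion is a simple Markov chain whose covariance function is an exponential kernel, \(k(h)=\rho^{|h|}/(1-\rho^2)\) for unit-variance innovations. A numpy check, with \(\rho\) and the sample size chosen by me for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)

# Stationary AR(1): x_k = rho * x_{k-1} + e_k, a simple Markov chain whose
# covariance function is k(h) = rho^|h| / (1 - rho^2).
rho, n = 0.8, 200_000
e = rng.normal(size=n)
x = np.zeros(n)
for k in range(1, n):
    x[k] = rho * x[k - 1] + e[k]

for h in (1, 2, 5):
    empirical = np.mean(x[:-h] * x[h:])
    theoretical = rho ** h / (1 - rho ** 2)
    assert abs(empirical - theoretical) < 0.15
```

Reading off the covariance function of a known process, as here, is the easy direction; the notebook's concern is the converse question of which kernels are admissible as covariances.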
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.Non-Gaussian Bayesian functional regressionhttps://danmackinlay.name/notebook/stochastic_process_regression.htmlWed, 16 Sep 2020 14:07:32 +1000https://danmackinlay.name/notebook/stochastic_process_regression.htmlReferences Regression using non-Gaussian random fields. Generalised Gaussian process regression.
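One such cunning factorisation, sketched here under assumptions of my own (kernel, inducing-point layout, jitter): a Nyström-style low-rank approximation of the Gram matrix from \(m \ll n\) inducing points, \(K_{nn} \approx K_{nm} K_{mm}^{-1} K_{mn}\), which is the linear-algebra core of most sparse GP schemes.

```python
import numpy as np

rng = np.random.default_rng(9)

# Nystrom-style factorisation: approximate the n x n Gram matrix from
# m << n inducing points.
n, m = 500, 30
t = np.sort(rng.uniform(0, 10, size=n))
z = np.linspace(0, 10, m)                    # inducing inputs

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
Knn, Knm, Kmm = k(t, t), k(t, z), k(z, z)

# K_nn ~ K_nm K_mm^{-1} K_mn, with jitter for numerical stability.
approx = Knm @ np.linalg.solve(Kmm + 1e-6 * np.eye(m), Knm.T)
rel_err = np.linalg.norm(Knn - approx) / np.linalg.norm(Knn)
assert rel_err < 0.01
```

With the low-rank structure in hand, Woodbury-type identities reduce the \(O(n^3)\) solves in the GP likelihood to \(O(nm^2)\), which is where the scalability comes from.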
Is there ever an actual need for this? Or can we just use a mostly-Gaussian process with some non-Gaussian marginal distribution and pretend, via GP quantile regression, or some variational GP approximation with a non-Gaussian likelihood over Gaussian latents? Presumably if we suspect higher moments than the second are important, or that there is some actual stochastic process that we know matches our phenomenon, we might bother with this, but oh my it can get complicated.Gaussian process quantile regressionhttps://danmackinlay.name/notebook/gp_quantile_regression.htmlWed, 16 Sep 2020 13:44:32 +1000https://danmackinlay.name/notebook/gp_quantile_regression.htmlReferences How to do quantile regression with GPs.
References Boukouvalas, Alexis, Remi Barillec, and Dan Cornford. 2012. “Gaussian Process Quantile Regression Using Expectation Propagation.” In ICML 2012. Reich, Brian J. 2012. “Spatiotemporal Quantile Regression for Detecting Distributional Changes in Environmental Processes.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 61 (4): 535–53. Reich, Brian J., Montserrat Fuentes, and David B. Dunson.Statistics of spatio-temporal processeshttps://danmackinlay.name/notebook/spatio_temporal.htmlFri, 11 Sep 2020 13:30:12 +1000https://danmackinlay.name/notebook/spatio_temporal.htmlIntros Laplace approximation Tools References The dynamics of spatial processes evolving in time.
Clearly there are many different problems one might wonder about here. I am thinking in particular of the kind of problem whose discretisation might look like this, as a graphical model.
This is highly stylized: I’ve imagined there is one spatial dimension, but usually there would be two or three.
Kernel in the sense of the “kernel trick”. Not to be confused with smoothing-type convolution kernels, nor the dozens of related-but-slightly-different clashing definitions of kernel; those can have their own respective pages.Defining dynamics via Gaussian processeshttps://danmackinlay.name/notebook/gp_dynamics.htmlWed, 18 Sep 2019 10:21:15 +1000https://danmackinlay.name/notebook/gp_dynamics.htmlReferences Two classic flavours together, Gaussian Processes and dynamical_systems, where the dynamics are modelled by a Gaussian process.
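The kernel trick in its plainest form, sketched with assumptions of my own (squared-exponential kernel, ridge penalty, toy target): kernel ridge regression fits in an implicit feature space touched only through inner products \(k(x,x')\), so the solution is a weighted sum of kernels at the data points.

```python
import numpy as np

rng = np.random.default_rng(10)

# Kernel ridge regression: the feature space appears only via the Gram matrix.
x = np.linspace(-3, 3, 80)
y = np.tanh(x) + rng.normal(scale=0.1, size=x.size)

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
lam = 0.1
# Dual weights: alpha = (K + lam I)^{-1} y.
alpha = np.linalg.solve(k(x, x) + lam * np.eye(x.size), y)

# Predictions are kernel evaluations against the training inputs.
x_test = np.linspace(-3, 3, 200)
pred = k(x_test, x) @ alpha
assert np.abs(pred - np.tanh(x_test)).mean() < 0.1
```

That the predictor takes this finite kernel-expansion form is exactly what the representer theorems further down guarantee.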
Here we use Gaussian processes to define the dynamics, in particular to learn nonparametric transition, observation or state densities. This is what Turner, Deisenroth, and Rasmussen (2010); Frigola, Chen, and Rasmussen (2014); Frigola et al. (2013); and Eleftheriadis et al. (2017) do.
This is distinct from calculating a Gaussian process posterior via a state filter, which is another way you can combine the concepts of dynamics and Gaussian process.Representer theoremshttps://danmackinlay.name/notebook/representer_theorems.htmlMon, 16 Sep 2019 12:27:34 +1000https://danmackinlay.name/notebook/representer_theorems.htmlReferences In spatial statistics, Gaussian processes, kernel machines and covariance functions, regularisation.
🏗
References Bohn, Bastian, Michael Griebel, and Christian Rieger. 2018. “A Representer Theorem for Deep Kernel Learning.” arXiv:1709.10441 [Cs, Math], June. Boyer, Claire, Antonin Chambolle, Yohann de Castro, Vincent Duval, Frédéric de Gournay, and Pierre Weiss. 2018. “Convex Regularization and Representer Theorems.” In arXiv:1812.04355 [Cs, Math]. Boyer, Claire, Antonin Chambolle, Yohann De Castro, Vincent Duval, Frédéric De Gournay, and Pierre Weiss.