generative on Dan MacKinlay
https://danmackinlay.name/tags/generative.html
Recent content in generative on Dan MacKinlay (Hugo, en-us). Last updated Sat, 30 Jul 2022 23:22:37 +1000.

Gaussian process inference by gradient descent
https://danmackinlay.name/notebook/gp_gd.html
Sat, 30 Jul 2022 23:22:37 +1000

References

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
Notoriously, GP regression scales badly with dataset size, requiring us to invert a matrix full of observation covariances. But inverting a matrix is just solving a least-squares optimisation, when you think about it. So can we solve it by gradient descent and have it somehow come out cheaper? Maybe.
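To make the least-squares framing concrete: the GP posterior mean at test points is \(K_{*n}\alpha\), where \(\alpha\) solves \((K_{nn}+\sigma^2 I)\alpha = y\), and conjugate gradients solves that system using only matrix-vector products. A minimal numpy sketch; the kernel, data and tolerances are invented for illustration and this is not the method of any particular paper cited here:

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel matrix between two sets of 1-d inputs.
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def cg_solve(A, b, tol=1e-10, max_iter=1000):
    # Conjugate gradients: solves A x = b touching A only via products A @ v.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=50)
y = np.sin(X) + 0.1 * rng.standard_normal(50)
noise = 0.1**2

K = rbf_kernel(X, X) + noise * np.eye(50)
alpha = cg_solve(K, y)                  # agrees with np.linalg.solve(K, y)
X_star = np.linspace(-3, 3, 5)
mean = rbf_kernel(X_star, X) @ alpha    # posterior mean at test points
```

For dense matrices this saves nothing, but when \(K\) admits fast matrix-vector products (structure, sparsity, GPU batching) the same loop runs without ever forming or factorising \(K\).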
References Chen, Hao, Lili Zheng, Raed Al Kontar, and Garvesh Raskutti.

Gaussian process regression
https://danmackinlay.name/notebook/gp_regression.html
Fri, 29 Jul 2022 08:23:25 +1000

Quick intro Incorporating a mean function Density estimation Kernels Using state filtering On lattice observations On manifolds By variational inference With inducing variables By variational inference with inducing variables With vector output Deep Approximation with dropout Inhomogeneous with covariates For dimension reduction Pathwise/Matheron updates Implementations References

Chi Feng’s GP regression demo.
Gaussian random processes/fields are stochastic processes/fields with jointly Gaussian distributions of observations. While “Gaussian process regression” is not wrong per se, there is a common convention in stochastic process theory (and also in pedagogy) to use process to talk about some notionally time-indexed process and field to talk about ones that have some space-like index without a presumption of an arrow of time.

Gaussian process regression software
https://danmackinlay.name/notebook/gp_implementation.html
Fri, 29 Jul 2022 07:50:54 +1000

GPy Stheno GPyTorch Plain pyro JuliaGaussianProcesses Ecosystem GPJax TinyGP BayesNewton George Geostat Framework Stan scikit-learn GPFlow Misc python AutoGP MATLAB References

Implementations of Gaussian process regression.
GPy was a common default choice in python, and GPFlow, for example, has attempted to follow its API. For another value of default, scikit-learn has a GP implementation. Moreover, many generic Bayesian inference toolkits support GP models. All things being equal, I want better-than-generic support for GP models.

Bayes linear regression and basis-functions in Gaussian process regression
https://danmackinlay.name/notebook/gp_basis.html
Wed, 27 Jul 2022 11:12:31 +1000

Fourier features Random Fourier features K-L basis Compactly-supported basis functions “Decoupled” bases References

No, these are officers. You want low rank Gaussian processes.
Another way of cunningly chopping up the work of fitting a Gaussian process is to represent the process as a random function comprising basis functions \(\phi=\left(\phi_{1}, \ldots, \phi_{\ell}\right)\) with the Gaussian random weight vector \(w\) so that \[ f^{(w)}(\cdot)=\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot) \quad \boldsymbol{w} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{w}}\right). \] \(f^{(w)}\) is a random function satisfying \(\boldsymbol{f}^{(\boldsymbol{w})} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Phi}_{n} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}^{\top}\right)\), where \(\boldsymbol{\Phi}_{n}=\boldsymbol{\phi}(\mathbf{X})\) is a \(|\mathbf{X}| \times \ell\) matrix of features.

Posterior Gaussian process samples by updating prior samples
https://danmackinlay.name/notebook/gp_pathwise.html
Wed, 27 Jul 2022 10:42:20 +1000

Matheron updates for Gaussian RVs Exact updates for Gaussian processes Sparse GP setting References

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
Can we find a transformation that will turn a Gaussian process prior sample into a Gaussian process posterior sample? A special trick where we do GP regression by GP simulation.
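The trick rests on Matheron's rule for Gaussians: if \(x\) and \(y\) are jointly Gaussian, then \(x + \cov(x,y)\var(y)^{-1}(\bar{y} - y)\) has the law of \(x \mid y = \bar{y}\). A scalar numpy sketch that checks the conditional moments by simulation; all the numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Joint Gaussian over (x, y): x is the target block, y the observed block.
Sxx, Sxy, Syy = 2.0, 0.8, 1.0
Sigma = np.array([[Sxx, Sxy], [Sxy, Syy]])
L = np.linalg.cholesky(Sigma)

y_obs = 0.7  # the value we condition on

# Matheron update: transform joint prior draws into conditional draws.
z = L @ rng.standard_normal((2, 200_000))  # prior samples of (x, y)
x_prior, y_prior = z
x_post = x_prior + (Sxy / Syy) * (y_obs - y_prior)

# Closed-form conditional moments, for comparison.
mean_exact = (Sxy / Syy) * y_obs
var_exact = Sxx - Sxy**2 / Syy
```

The same identity applied pathwise, with prior function draws in place of `x_prior`, is what turns GP simulation into GP regression.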
The main tool is an old insight made useful for modern problems in J.

Bayes for beginners
https://danmackinlay.name/notebook/bayes_howto.html
Sat, 23 Jul 2022 16:33:34 +1000

Prior choice Teaching Course material worked examples Linear regression Workflow Nonparametrics Tools Applied As a methodology of science Incoming References

Even for the most curmudgeonly frequentist it is sometimes refreshing to move your effort from deriving frequentist estimators for intractable models to using the damn Bayesian ones, which fail in different and interesting ways than you are used to. If it works and you are feeling fancy you might then justify your Bayesian method on frequentist grounds, which washes away the sin.

Markov Chain Monte Carlo methods
https://danmackinlay.name/notebook/mcmc.html
Wed, 08 Jun 2022 12:00:26 +1000

Hamiltonian Monte Carlo Connection to variational inference Adaptive MCMC Langevin Monte Carlo Tempering Mixing rates Debiasing via coupling Affine invariant Efficiency of References

This chain pump is not a good metaphor for how a Markov chain Monte Carlo sampler works, but it does correctly evoke the effort involved.
Despite studying in this area, I have nothing to say about MCMC broadly, but I do have some things I wish to keep notes on.

Generalized Bayesian Computation
https://danmackinlay.name/notebook/generalized_bayesian_computation.html
Thu, 28 Apr 2022 16:59:23 +1000

References

Placeholder.
Just saw a presentation of Dellaporta et al. (2022).
I am not sure how many of the results are specific to that very impressive paper, but she attributes prior work to Fong, Lyddon, and Holmes (2019); Lyddon, Walker, and Holmes (2018); Matsubara et al. (2021); Pacchiardi and Dutta (2022); Schmon, Cannon, and Knoblauch (2021). It combines the bootstrap, Bayesian nonparametrics, MMD, and simulation-based inference in an M-open setting.
Clearly there is some interesting stuff going on here.

SLAM
https://danmackinlay.name/notebook/slam.html
Thu, 28 Apr 2022 08:37:03 +0800

NICE-SLAM Tools jaxfg gradslam ceres solver incoming References

Estimate of unknown \(\mu\)
Classic robotics problem: reconstruct a scene by moving a camera about the room.
In practice, it often boils down to a least-squares inference problem or, more generally, a Gaussian belief propagation inference problem.
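As a cartoon of that least-squares core: estimating a single landmark position from noisy range measurements by Gauss-Newton, the same linearise-and-solve loop that full-scale solvers such as ceres run over thousands of poses and landmarks. The toy setup below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Known sensor positions and noisy range measurements to one unknown landmark.
poses = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
landmark_true = np.array([1.0, 2.5])
ranges = np.linalg.norm(poses - landmark_true, axis=1) + 0.01 * rng.standard_normal(4)

x = np.array([2.0, 2.0])  # initial guess for the landmark position
for _ in range(10):
    diff = x - poses                     # (4, 2) offsets to each sensor
    pred = np.linalg.norm(diff, axis=1)  # predicted ranges at the current guess
    r = pred - ranges                    # residuals
    J = diff / pred[:, None]             # Jacobian of the residuals w.r.t. x
    # Gauss-Newton step: solve the linearised normal equations.
    x = x - np.linalg.solve(J.T @ J, J.T @ r)
```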
NICE-SLAM I am interested in a recent cool trick that combines implicit representation with SLAM (Zhu et al. 2022).
NICE-SLAM cvg/nice-slam: [CVPR’22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM. There are a lot of cool tricks there — differentiable rendering.

Vecchia factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_vecchia.html
Wed, 27 Apr 2022 10:46:50 +0800

References

There are many ways to cleverly slice up GP likelihoods so that inference is cheap. One is the Vecchia approximation: approximate the precision matrix by one with a sparse Cholesky factorisation.
TBD.
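Until then, the idea in miniature: order the observations, then approximate each conditional density in the chain-rule factorisation by conditioning on only the \(m\) nearest previous points, so the likelihood splits into cheap low-dimensional terms. A numpy sketch on a toy 1-d problem (kernel, neighbourhood size and data invented for illustration); with \(m = n-1\) it recovers the exact likelihood:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two 1-d input sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def vecchia_loglik(x, y, m=3, noise=1e-4):
    # p(y) ~= prod_i p(y_i | y at up to m nearest previous points in the ordering)
    order = np.argsort(x)
    x, y = x[order], y[order]
    ll = 0.0
    for i in range(len(x)):
        nb = np.arange(max(0, i - m), i)  # previous-neighbour conditioning set
        xi = x[i:i + 1]
        if len(nb) == 0:
            mu, var = 0.0, rbf(xi, xi)[0, 0] + noise
        else:
            Knn = rbf(x[nb], x[nb]) + noise * np.eye(len(nb))
            kin = rbf(xi, x[nb]).ravel()
            w = np.linalg.solve(Knn, kin)
            mu = w @ y[nb]
            var = rbf(xi, xi)[0, 0] + noise - w @ kin
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

x = np.linspace(0.0, 2.0, 8)
y = np.sin(x)
ll_approx = vecchia_loglik(x, y, m=3)  # O(n m^3) instead of O(n^3)
```

Cost is \(O(nm^3)\) rather than \(O(n^3)\), and the induced precision matrix has a sparse Cholesky factor with at most \(m\) off-diagonal nonzeros per column.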
References Banerjee, Sudipto, Alan E. Gelfand, Andrew O. Finley, and Huiyan Sang. 2008. “Gaussian Predictive Process Models for Large Spatial Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70 (4): 825–48. Datta, Abhirup, Sudipto Banerjee, Andrew O.

Particle belief propagation
https://danmackinlay.name/notebook/particle_message_passing.html
Fri, 08 Apr 2022 11:33:12 +1000

References

Empirical CDFs as approximate belief propagation updates.
References Grooms, Ian, and Gregor Robinson. 2021. “A Hybrid Particle-Ensemble Kalman Filter for Problems with Medium Nonlinearity.” PLOS ONE 16 (3): e0248266. Naesseth, Christian Andersson, Fredrik Lindsten, and Thomas B Schön. 2014. “Sequential Monte Carlo for Graphical Models.” In Advances in Neural Information Processing Systems. Vol. 27. Curran Associates, Inc. Naesseth, Christian, Fredrik Lindsten, and Thomas Schon. 2015. “Nested Sequential Monte Carlo Methods.”

Particle Markov Chain Monte Carlo
https://danmackinlay.name/notebook/particle_mcmc.html
Fri, 08 Apr 2022 11:33:12 +1000

References

Particle filters inside general MCMC samplers. Darren Wilkinson wrote a series of blog posts introducing this idea:
MCMC, Monte Carlo likelihood estimation, and the bootstrap particle filter; The particle marginal Metropolis-Hastings (PMMH) particle MCMC algorithm; Introduction to the particle Gibbs Sampler. It turns out to be especially natural for, e.g., change-point problems.
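For orientation, the bootstrap particle filter that sits inside these samplers, in a few lines: propagate particles through the transition density, weight by the observation likelihood, resample, and accumulate an unbiased estimate of the marginal likelihood (the quantity PMMH plugs into a Metropolis-Hastings ratio). A numpy sketch on an invented linear-Gaussian toy model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-space model: x_t = 0.9 x_{t-1} + N(0, 0.5^2), y_t = x_t + N(0, 0.3^2).
T, N = 50, 2000
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + 0.5 * rng.standard_normal()
y = x_true + 0.3 * rng.standard_normal(T)

particles = rng.standard_normal(N)  # draws from the x_0 prior
loglik = 0.0
means = np.zeros(T)
for t in range(T):
    if t > 0:
        # propagate each particle through the transition density
        particles = 0.9 * particles + 0.5 * rng.standard_normal(N)
    # weight by the observation likelihood (log-space for stability)
    logw = -0.5 * ((y[t] - particles) / 0.3) ** 2
    w = np.exp(logw - logw.max())
    # increment of the unbiased marginal likelihood estimate
    loglik += logw.max() + np.log(w.mean()) - 0.5 * np.log(2 * np.pi * 0.3**2)
    w /= w.sum()
    means[t] = w @ particles
    # multinomial resampling
    particles = particles[rng.choice(N, size=N, p=w)]
```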
References Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (3): 269–342.

Bayesian nonparametric statistics
https://danmackinlay.name/notebook/bayes_nonparametric.html
Thu, 07 Apr 2022 16:26:47 +1000

Useful stochastic processes Posterior updates in infinite dimensions Bayesian consistency References

It is hard to explain what happens to the posterior in this case
Useful stochastic processes A map of popular processes used in Bayesian nonparametrics from Xuan, Lu, and Zhang (2020)
Dirichlet priors, other measure priors, Gaussian Process regression, reparameterisations etc. 🏗
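As one concrete entry on that map: a draw from a Dirichlet process can be built by the stick-breaking (GEM) construction, carving weights off a unit stick with Beta draws. A minimal numpy sketch; the truncation level and base measure are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def stick_breaking(alpha, n_atoms, rng):
    # GEM(alpha) weights: pi_k = beta_k * prod_{j<k} (1 - beta_j).
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    return betas * remaining

alpha = 2.0
weights = stick_breaking(alpha, 1000, rng)  # truncated to 1000 atoms
atoms = rng.standard_normal(1000)           # atom locations from a N(0,1) base measure
# (weights, atoms) is a (truncated) sample from DP(alpha, N(0,1)):
# a random discrete measure sum_k weights[k] * delta(atoms[k]).
```

Smaller `alpha` concentrates mass on fewer atoms; larger `alpha` makes the draw look more like the base measure.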
Posterior updates in infinite dimensions For now, this is just a bookmark to the general measure-theoretic notation that unifies, in principle, the various Bayesian nonparametric methods.

Belief propagation
https://danmackinlay.name/notebook/belief_propagation.html
Thu, 31 Mar 2022 17:06:59 +1100

History Let’s go Further reading References

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
Belief propagation works incredibly well if I have configured all my assumptions juuuuuuust so.
This is the basic inspiration for message-passing inference; it turns out not always to be implementable, but it points the way. A concrete, implementable version of use to me is Gaussian Belief Propagation. See also graph computations for a more general sense of Message Passing, and graph NNs for the same idea but with more neural dust sprinkled on it.

Gaussian belief propagation
https://danmackinlay.name/notebook/gaussian_belief_propagation.html
Mon, 28 Mar 2022 20:09:07 +1100

Parameterization Linearisation Parallelisation Use in PDEs Tools jaxfg gradslam ceres solver incoming References

A particularly tractable model assumption for message-passing inference which generalises classic Gaussian Kalman filters and Bayesian linear regression with a Gaussian prior, or, in a frequentist setting, [least squares regression](./least_squares.html). Essentially, we regard the various nodes in our system as jointly Gaussian RVs with given prior mean and covariance (i.e. we do not allow the variances themselves to be random; a Gaussian is not a valid prior for a variance).

Generative flow
https://danmackinlay.name/notebook/generative_flow.html
Mon, 07 Mar 2022 12:15:25 +1100

References

Placeholder. There are a lot of keywords in this new technique that sound intriguing, so here is a notebook to revisit if I ever have time.
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation GFlowNet Tutorial Bengio et al. (2021):
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.

Learning Gaussian processes which map functions to functions
https://danmackinlay.name/notebook/gp_regression_functional.html
Fri, 25 Feb 2022 14:00:59 +1100

Universal Kriging Hilbert-space valued GPs References

In which I discover how to learn operators via GPs. I suspect a lot of things break; what is a usable Gaussian distribution over a mapping between functions?
It might be handy here to revisit the notation for Bayesian nonparametrics, since we don’t get the same kind of setup as when the distributions in question are finitely parameterised. TBC
Universal Kriging Does universal kriging fit in this notebook?

Probabilistic programming
https://danmackinlay.name/notebook/probabilistic_programming.html
Fri, 11 Feb 2022 17:30:00 +1100

Tutorials and textbooks MCMC considerations Variational inference considerations Toolkits Pyro Stan Forneylab.jl Turing.jl probflow Gen Edward/Edward2 TensorFlow Probability pyprob PyMC3 Mamba.jl Greta Soss.jl Miscellaneous julia options Inferpy Zhusuan Church/Anglican WebPPL BAT Incoming References

Probabilistic programming languages (PPLs). A probabilistic programming system is a system for specifying stochastic generative models and reasoning about them. Or, as Fabiana Clemente puts it:
Probabilistic programming is about doing statistics using the tools of computer science.

Pyro
https://danmackinlay.name/notebook/pyro.html
Thu, 25 Nov 2021 11:27:11 +1100

Vanilla Pyro Distributed Numpyro Tutorials and textbooks Tips, gotchas Regression Complex numbers Algebraic effects Funsors References

A probabilistic programming language. pytorch + Bayesian inference = pyro (Pradhan et al. 2018).
Typical posterior density landscape
Vanilla Pyro For rationale, see the pyro launch announcement:
We believe the critical ideas to solve AI will come from a joint effort among a worldwide community of people pursuing diverse approaches. By open sourcing Pyro, we hope to encourage the scientific world to collaborate on making AI tools more flexible, open, and easy-to-use.

Gaussian Processes as stochastic differential equations
https://danmackinlay.name/notebook/gp_markov.html
Thu, 25 Nov 2021 09:22:29 +1100

GP regression via state filtering Spatio-temporal usage Latent force models Miscellaneous notes towards implementation References

🏗️🏗️🏗️ Under heavy construction 🏗️🏗️🏗️
Two classic flavours together: Gaussian processes and state filters/stochastic differential equations, i.e. representing random processes and fields as stochastic differential equations.
Not covered here: another concept which includes the same keywords but is distinct, namely using Gaussian processes to define state process dynamics or observation distributions.
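The payoff in miniature: the Matérn-1/2 (Ornstein–Uhlenbeck) kernel \(k(t,t') = \sigma^2 e^{-|t-t'|/\ell}\) corresponds to a scalar Markov state-space model, so its GP marginal likelihood drops out of a Kalman filter in \(O(n)\) rather than \(O(n^3)\). A numpy sketch with parameters invented for illustration:

```python
import numpy as np

# Matern-1/2 (OU) GP observed with iid Gaussian noise. The process is Markov:
# x(t_{i+1}) | x(t_i) ~ N(phi * x(t_i), s2 * (1 - phi^2)), phi = exp(-dt / ell),
# so the exact GP marginal likelihood comes from a scalar Kalman filter.
def ou_kalman_loglik(t, y, s2, ell, noise):
    m, P = 0.0, s2  # stationary prior on the first state
    ll = 0.0
    prev = None
    for ti, yi in zip(t, y):
        if prev is not None:
            phi = np.exp(-(ti - prev) / ell)
            m, P = phi * m, phi**2 * P + s2 * (1 - phi**2)  # predict step
        S = P + noise  # one-step predictive variance of y_i
        ll += -0.5 * (np.log(2 * np.pi * S) + (yi - m) ** 2 / S)
        K = P / S      # Kalman gain; condition the state on y_i
        m, P = m + K * (yi - m), (1 - K) * P
        prev = ti
    return ll
```

The filter's prediction-error decomposition reproduces the dense GP marginal likelihood exactly for sorted inputs, which makes this easy to check against the \(O(n^3)\) computation on a small problem.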
GP regression via state filtering I am interested in the trick which makes certain Gaussian process regression problems soluble by making them local, i.

Neural diffusion models
https://danmackinlay.name/notebook/nn_diffusion.html
Thu, 11 Nov 2021 10:45:50 +1100

References

Placeholder.
Google AI Blog: High Fidelity Image Generation Using Diffusion Models Denoising Diffusion-based Generative Modeling: Foundations and Applications What are Diffusion Models? Yang Song, Generative Modeling by Estimating Gradients of the Data Distribution Diffusion models are autoencoders – Sander Dieleman Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).
References Dhariwal, Prafulla, and Alex Nichol. 2021. “Diffusion Models Beat GANs on Image Synthesis.” arXiv:2105.05233 [Cs, Stat], June. Dutordoir, Vincent, Alan Saul, Zoubin Ghahramani, and Fergus Simpson.

Deep generative models
https://danmackinlay.name/notebook/nn_generative.html
Thu, 11 Nov 2021 09:52:21 +1100

Philosophical diversion: probability is a weird abstraction References

Generating a synthetic observation at great depth
Certain famous models in neural nets are generative: informally, they produce samples from some distribution, and in training that distribution is tweaked until it resembles, in some sense, the distribution of our observed data. There are many attempts now to unify fancy generative techniques such as GANs, VAEs and neural diffusions into a single unified method, or at least a cordial family of methods, so I had better devise a page for that.

Energy based models
https://danmackinlay.name/notebook/energy_based_models.html
Mon, 07 Jun 2021 18:17:53 +1000

References

I don’t actually know what fits under this heading, but it sounds like it is simply inference for undirected graphical models? Or is there something distinct going on?
Descending the local energy gradient to a more probable configuration
References Che, Tong, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, and Yoshua Bengio. 2020. “Your GAN Is Secretly an Energy-Based Model and You Should Use Discriminator Driven Latent Sampling.”

Deep Gaussian process regression
https://danmackinlay.name/notebook/gp_deep.html
Thu, 13 May 2021 08:21:29 +1000

Platonic ideal Approximation with dropout References

Gaussian process layer cake.
Platonic ideal TBD.
Approximation with dropout See NN ensembles.
References Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In PMLR. Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In Artificial Intelligence and Statistics, 207–15. Domingos, Pedro. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” arXiv:2012.

Generative adversarial networks
https://danmackinlay.name/notebook/nn_gans.html
Mon, 14 Dec 2020 16:32:29 +1100

Wasserstein GAN Conditional Invertible Spectral normalization GANs as SDEs GANs as VAEs GANs as energy-based models Incoming References

The critic providing a gradient update to the generator
Game theory meets learning. Hip, especially in combination with deep learning, because it provides an elegant means of likelihood-free inference.
I don’t know anything about it. Something about training two systems together to both generate and classify examples of a phenomenon of interest.

Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100

Inducing variables Spectral and rank sparsity SVI for Gaussian processes Low rank methods Vecchia factorisation Latent Gaussian Process models References

There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.

Variational autoencoders
https://danmackinlay.name/notebook/variational_autoencoders.html
Thu, 10 Sep 2020 13:17:16 +1000

incoming References

A variational autoencoder uses a limited latent distribution to approximate a complex posterior distribution
A method at the intersection of stochastic variational inference and probabilistic neural nets where we presume that the model is generated by a low-dimensional latent space, which is, if you squint at it, kind of the information bottleneck trick but in a probabilistic setting. To my mind it is a sorta-kinda nonparametric approximate Bayes method.

Defining dynamics via Gaussian processes
https://danmackinlay.name/notebook/gp_dynamics.html
Wed, 18 Sep 2019 10:21:15 +1000

References

Two classic flavours together, Gaussian Processes and dynamical systems, where the dynamics are modelled by a Gaussian process.
Here we use Gaussian processes to define the dynamics, in particular to learn nonparametric transition, observation or state densities. This is what Turner, Deisenroth, and Rasmussen (2010), Frigola, Chen, and Rasmussen (2014), Frigola et al. (2013), and Eleftheriadis et al. (2017) do.
This is distinct from calculating a Gaussian process posterior via a state filter, which is another way you can combine the concepts of dynamics and Gaussian processes.

Hamiltonian and Langevin Monte Carlo
https://danmackinlay.name/notebook/hamiltonian_monte_carlo.html
Thu, 12 Jul 2018 21:07:16 +1000

Langevin Monte Carlo To file References

Hamiltonians, energy conservation in sampling. Handy. A summary would be nice.
Michael Betancourt’s heuristic explanation of Hamiltonian Monte Carlo: sets of high density alone are no good; we need the “typical set”, a set whose product of differential volume and density is high. He motivates Markov chain Monte Carlo on this basis, as a way of exploring the typical set given points already in it, or of getting closer to the typical set if starting outside.
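A minimal sketch of the mechanism: resample a momentum, leapfrog-integrate the Hamiltonian dynamics, then Metropolis-correct for discretisation error. The target (a standard Gaussian), step size and trajectory length below are invented for illustration, not tuned recommendations:

```python
import numpy as np

rng = np.random.default_rng(5)

def U(q):       # potential energy: negative log density of N(0, I)
    return 0.5 * q @ q

def grad_U(q):  # its gradient
    return q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    p = rng.standard_normal(q.shape)  # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # leapfrog integration of Hamiltonian dynamics
    p_new -= 0.5 * eps * grad_U(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new
        p_new -= eps * grad_U(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad_U(q_new)
    # Metropolis correction for the integrator's energy error
    dH = (U(q) + 0.5 * p @ p) - (U(q_new) + 0.5 * p_new @ p_new)
    return q_new if np.log(rng.uniform()) < dH else q

q = np.zeros(5)
samples = []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q.copy())
samples = np.array(samples)
```

Because the dynamics approximately conserve the Hamiltonian, long trajectories are accepted with high probability while moving far along the typical set, which is the whole advantage over random-walk proposals.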