probabilistic_algorithms on Dan MacKinlay
https://danmackinlay.name/tags/probabilistic_algorithms.html
Recent content in probabilistic_algorithms on Dan MacKinlay. Last build: Mon, 29 Mar 2021 13:35:28 +1100.

Infinite width limits of neural networks
https://danmackinlay.name/notebook/nn_infinite_width.html
Mon, 29 Mar 2021 13:35:28 +1100
Large-width limits of neural nets.
Neural Network Gaussian Process: see Neural network Gaussian process on Wikipedia.
The field that sprang from the insight (Neal 1996a) that in the infinite-width limit, deep NNs asymptotically approach Gaussian processes, so there are theories we can draw upon. Far from the infinite limit, there are neural nets which exploit this insight.

Log concave distributions
https://danmackinlay.name/notebook/log_concave_dist.html
Thu, 11 Mar 2021 09:32:05 +1100
Langevin MCMC: “a Markov chain reminiscent of noisy gradient descent”. Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation.
\[ x_{t+\eta} = x_t - \eta \nabla f(x_t) + \sqrt{2\eta}\xi_t,\quad \xi_t\sim N(0,I). \]
Rob Salomone explains this well; see Hodgkinson, Salomone, and Roosta (2019).
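As a concrete sketch of this update (my own toy example: targeting a standard normal, where \(f(x)=x^2/2\) and so \(\nabla f(x)=x\); the step size and chain length are arbitrary illustrative choices):

```python
import math
import random

def ula_sample(grad_f, x0=0.0, eta=0.01, n_steps=50000, seed=1):
    """Unadjusted Langevin algorithm: x <- x - eta*grad_f(x) + sqrt(2*eta)*xi."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - eta * grad_f(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Target: standard normal, f(x) = x^2 / 2, so grad_f(x) = x.
samples = ula_sample(lambda x: x)
burned = samples[10000:]          # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
```

Without a Metropolis correction this chain has an \(O(\eta)\) bias in its stationary distribution; the Metropolis-adjusted variant (MALA) removes it.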
Andrej Risteski’s Beyond log-concave sampling series is also a good introduction to log-concave sampling.

Markov Chain Monte Carlo methods
https://danmackinlay.name/notebook/mcmc.html
Thu, 11 Mar 2021 09:32:05 +1100
This chain pump is not a good metaphor for how a Markov chain Monte Carlo sampler works, but it does correctly evoke the effort involved.
Despite studying in this area, I have nothing to say about MCMC broadly, but I do have some things I wish to keep notes on.

Reparameterization tricks in inference
https://danmackinlay.name/notebook/reparameterization_trick.html
Mon, 08 Mar 2021 18:07:53 +1100
Approximating the desired distribution by perturbation of the available distribution.
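A minimal sketch of the trick (my own toy example, not from the post): the gradient of \(\mathbb{E}[z^2]\) for \(z \sim N(\mu, \sigma^2)\) with respect to \(\mu\), computed pathwise by writing \(z = \mu + \sigma\varepsilon\) with \(\varepsilon \sim N(0,1)\), so the randomness no longer depends on the parameters:

```python
import random

def grad_mean_of_square(mu, sigma, n=100000, seed=0):
    """Pathwise gradient of E[z^2] w.r.t. mu, for z ~ N(mu, sigma^2).

    Reparameterize z = mu + sigma * eps with eps ~ N(0, 1); then
    d(z^2)/d(mu) = 2 * z * dz/dmu = 2 * z, and we average that.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        total += 2.0 * z
    return total / n

g = grad_mean_of_square(mu=1.5, sigma=0.7)  # true gradient is 2 * mu = 3.0
```

The same one-liner idea is what lets automatic differentiation flow through the sampling step of a variational autoencoder.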
A trick in e.g. variational inference, especially autoencoders, for density estimation in probabilistic deep learning, best summarised as “fancy change of variables so that I can differentiate through the parameters of a distribution”. Connections to optimal transport and likelihood-free inference, in that this trick can enable some clever approximate-likelihood approaches.

Feynman-Kac formulae
https://danmackinlay.name/notebook/feynman_kac.html
Wed, 27 Jan 2021 11:55:19 +1100
There is a mathematically rich theory about how particle filters work. The notoriously abstruse Del Moral (2004) and Doucet, Freitas, and Gordon (2001) are universally commended for unifying diffusion processes, Feynman-Kac formulae and “propagation of chaos” into a consistent framework. I will get around to them eventually, maybe?
Random embeddings and hashing
https://danmackinlay.name/notebook/random_embedding.html
Tue, 01 Dec 2020 14:01:36 +1100
Separation of inputs by random projection.
See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, randomised regressions etc.
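A toy sketch of the basic phenomenon (my own example, with made-up dimensions): a Gaussian random projection approximately preserves squared norms, which is the Johnson-Lindenstrauss heart of these methods:

```python
import math
import random

def make_projection(d, k, seed=0):
    """A k x d Gaussian projection matrix, scaled so E||Rx||^2 = ||x||^2."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]

def project(R, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def sq_norm(v):
    return sum(vi * vi for vi in v)

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
R = make_projection(d=1000, k=400)
ratio = sq_norm(project(R, x)) / sq_norm(x)  # concentrates near 1
```

The relative error concentrates at roughly \(\sqrt{2/k}\), independent of the ambient dimension \(d\), which is the whole point.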
Occasionally we might use non-linear projections to increase the dimensionality of our data in the hope of making a non-linear regression approximately linear, an idea which dates back to Cover (1965).
Cover’s Theorem (Cover 1965):
It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems.

Randomised regression
https://danmackinlay.name/notebook/randomised_regression.html
Tue, 01 Dec 2020 14:00:10 +1100
Tackling your regression by using random projections of the predictors.
Usually this means using those projections to reduce the dimensionality of a high-dimensional regression. In this case it is not far from compressed sensing, except in how we handle noise. In the linear model case, this is of course randomised linear algebra, and may amount to a randomised matrix factorisation.
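A sketch-and-solve toy of mine (made-up data; here the random projection compresses the observations rather than the predictors, which is one common variant):

```python
import random

def lstsq2(X, y):
    """Closed-form least squares for exactly two predictors (normal equations)."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * c - b * b
    return ((c * u - b * v) / det, (a * v - b * u) / det)

def gaussian_sketch(X, y, k, seed=0):
    """Compress n observations down to k random Gaussian mixtures of them."""
    rng = random.Random(seed)
    SX, Sy = [], []
    for _ in range(k):
        s = [rng.gauss(0.0, 1.0) for _ in range(len(X))]
        SX.append([sum(si * r[j] for si, r in zip(s, X)) for j in range(2)])
        Sy.append(sum(si * yi for si, yi in zip(s, y)))
    return SX, Sy

rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
y = [3.0 * r[0] - 1.0 * r[1] + 0.1 * rng.gauss(0.0, 1.0) for r in X]
SX, Sy = gaussian_sketch(X, y, k=200)
w = lstsq2(SX, Sy)  # close to the data-generating coefficients (3, -1)
```

Solving the 200-row sketched problem stands in for the 2000-row one, at the price of a small, quantifiable loss of accuracy.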
I am especially interested in seeing how this might be useful for dependent data, especially time series.

Recommender systems
https://danmackinlay.name/notebook/recommender_systems.html
Mon, 30 Nov 2020 14:55:18 +1100
Not my area, but I need a landing page to refer to for some non-specialist contacts of mine.
I am most familiar with the matrix factorization approaches (e.g. factorization machines, NNMF) but there are many, e.g. variational autoencoder approaches are en vogue.
An overview by Javier lists many approaches.
- Most Popular recommendations (the baseline)
- Item-User similarity based recommendations
- kNN Collaborative Filtering recommendations
- GBM based recommendations
- Non-Negative Matrix Factorization recommendations
- Factorization Machines (Steffen Rendle 2010)
- Field Aware Factorization Machines (Yuchin Juan et al. 2016)
- Deep Learning based recommendations (Wide and Deep, Heng-Tze Cheng et al. 2016)
- Neural Collaborative Filtering (Xiangnan He et al.

Importance sampling
https://danmackinlay.name/notebook/importance_sampling.html
Wed, 28 Oct 2020 13:16:47 +1100
TBD. Sampling from approximate distributions by reweighting.
Art Owen’s Importance sampling chapter
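A self-contained toy example of mine (not drawn from Owen's chapter): estimating the Gaussian tail probability \(P(X>3)\), \(X\sim N(0,1)\), by sampling from a proposal shifted into the tail and reweighting by the density ratio:

```python
import math
import random

def norm_pdf(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def importance_tail(n=100000, seed=0):
    """Estimate P(X > 3) for X ~ N(0,1) using the proposal N(3, 1).

    Each sample landing in the tail is weighted by the density ratio
    target/proposal, which keeps the estimator unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(3.0, 1.0)
        if x > 3.0:
            total += norm_pdf(x) / norm_pdf(x, mu=3.0)
    return total / n

p = importance_tail()  # true value is 1 - Phi(3), about 1.35e-3
```

The crude estimator would waste almost every sample here; the shifted proposal puts half of them in the region that matters.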
Bootstrap
https://danmackinlay.name/notebook/bootstrap.html
Fri, 16 Oct 2020 08:08:41 +1100
Resampling your own data to estimate how good your point-estimator is, and to reduce its bias. In general an intuitive technique; however, it gets tricky for e.g. dependent data. For a handy crib sheet of bootstrap failure modes, see Thomas Lumley, When the bootstrap doesn’t work.
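A minimal sketch (my own toy example): bootstrapping the standard error of a sample mean, where theory gives \(\sigma/\sqrt{n}\) to compare against:

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Standard error of a statistic, by resampling data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(resample))
    return statistics.stdev(reps)

rng = random.Random(1)
data = [rng.gauss(0.0, 2.0) for _ in range(200)]
se = bootstrap_se(data, statistics.mean)  # theory: 2 / sqrt(200), about 0.14
```

For dependent data, naive resampling like this is exactly where the technique breaks; block-bootstrap variants exist for that.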
In the classical mode, this is a frequentist technique without an immediate Bayesian interpretation.

Differentiating through the Gamma
https://danmackinlay.name/notebook/gamma_diff.html
Thu, 15 Oct 2020 10:50:59 +1100
Suppose I want to find a distributional gradient for a gamma process. Generically I would find this via Monte Carlo gradient estimation.
Here is a problem-specific method:
I allow the latent random state to have more dimensions than a univariate. Let’s get specific. An example arises if we raid the random-variate-generation literature for transform methods to generate random variates, and differentiate those transforms. A Gamma variate can be generated from a transformed normal and a uniform random variable, or from two uniforms, depending on the parameter range.

Monte Carlo gradient estimation
https://danmackinlay.name/notebook/mc_grad.html
Wed, 30 Sep 2020 10:59:22 +1000
Taking gradients through integrals.
See Mohamed et al. (2020) for a roundup.
https://github.com/deepmind/mc_gradients
A common activity for me at the moment is differentiating the integral - for example, through the inverse-CDF lookup.
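For the inverse-CDF case, a toy sketch (my example, assuming an exponential target): with \(X = -\log(U)/\lambda\) the sample is a differentiable function of \(\lambda\) for fixed \(U\), so the gradient of \(\mathbb{E}[X]\) can be estimated pathwise:

```python
import math
import random

def grad_mean_exponential(lam, n=200000, seed=0):
    """d/d(lam) of E[X] for X ~ Exp(lam), via the inverse CDF.

    X = -log(U) / lam is differentiable in lam for fixed U, with
    dX/d(lam) = log(U) / lam^2; the true answer is -1 / lam^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.random()
        total += math.log(u) / lam ** 2
    return total / n

g = grad_mean_exponential(lam=2.0)  # true value -0.25
```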
You see, what I would really like is the derivative of the mass-preserving continuous map \(\phi_{\theta, \tau}\) such that
\[\mathsf{z}\sim F(\cdot;\theta) \Rightarrow \phi_{\theta, \tau}(\mathsf{z})\sim F(\cdot;\tau). \] Now suppose I wish to optimise or otherwise perturb \(\theta\).

Monte Carlo optimisation
https://danmackinlay.name/notebook/mc_opt.html
Wed, 30 Sep 2020 10:59:22 +1000
Optimisation via Monte Carlo simulation. Annealing and all that. TBD.
Splitting simulation
https://danmackinlay.name/notebook/splitting_simulation.html
Mon, 28 Sep 2020 10:38:21 +1000
Splitting is a method for zooming in on the important region of an intractable probability distribution.
I have just spent so much time writing about this that I had better pause for a while and leave this as a placeholder.
Data summarization
https://danmackinlay.name/notebook/data_summarization.html
Fri, 18 Sep 2020 06:21:41 +1000
Summary statistics which don’t require you to keep all the data but which allow you to do inference nearly as well. E.g. sufficient statistics in exponential families allow you to do certain kinds of inference perfectly without anything except summaries. Methods such as variational Bayes summarize data by maintaining a posterior density (usually a mixture model) as a summary of all the data, at some cost in accuracy.

Variational autoencoders
https://danmackinlay.name/notebook/variational_autoencoders.html
Thu, 10 Sep 2020 13:17:16 +1000
A variational autoencoder uses a limited latent distribution to approximate a complex posterior distribution.
A trick in e.g. variational inference / probabilistic neural nets where we presume that the model is generated from a low-dimensional latent space, which is, if you squint at it, kind of the information bottleneck trick but in a probabilistic setting. To my mind it is a sorta-kinda nonparametric approximate Bayes method.

Combinatorics of note
https://danmackinlay.name/notebook/combinatorics.html
Sat, 18 Jul 2020 12:25:14 +1000
Algorithmic complexity and quasi Monte Carlo both consider combinatorial matters too.
Jörg Arndt’s Matters Computational.

Variational inference
https://danmackinlay.name/notebook/variational_inference.html
Sun, 24 May 2020 12:04:18 +1000
Approximating the intractable measure (right) with a transformation of a tractable one (left).
Inference where we approximate the density of the posterior variationally. That is, we use cunning tricks to turn an inference problem into an optimisation over some parameter set, usually one that allows us to trade off difficulty for fidelity in some useful way.

Adaptive Markov Chain Monte Carlo samplers
https://danmackinlay.name/notebook/mcmc_adaptive.html
Thu, 30 Apr 2020 17:59:26 +1000
In adaptive MCMC, the trajectories of the simulator are perturbed by external forces (bottom right, centre) to change how they approach the target (top right).
Designing MCMC transition density by online optimisation for optimal mixing. Also called controlled MCMC.
Here we are no longer truly using a Markov chain, because the transition parameters depend upon the entire history of the chain (for example, because you are dynamically updating the transition parameters to improve mixing).

Tuning an MCMC sampler
https://danmackinlay.name/notebook/mcmc_tuning.html
Thu, 30 Apr 2020 15:22:07 +1000
The process of adapting to the target optimally.
Designing MCMC transition density, possibly via the proposal density in rejection sampling, by optimisation for optimal mixing.
The simplest way to do this is a “pilot” run to estimate optimal mixing kernels, then use the adapted kernels thereafter, discarding the samples from the pilot run as suspect.

The cross entropy method
https://danmackinlay.name/notebook/cross_entropy_method.html
Fri, 24 Apr 2020 17:05:31 +1000
A trick in Monte Carlo simulation, particularly rejection sampling, to optimise a proposal distribution. This notebook exists because I need to quickly audition this method for solving a problem and the Wikipedia explanation is incomprehensible. I will create an explanatory example here and put it in Wikipedia. Maybe.
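Pending that explanatory example, a minimal sketch of mine (the CE method applied to continuous optimisation with a Gaussian sampling distribution; every tuning constant here is an arbitrary choice): sample a population, keep the elite fraction, refit the sampler to the elites, and repeat:

```python
import random
import statistics

def cross_entropy_max(f, n=100, n_elite=10, iters=30, seed=0):
    """Maximise f by the cross-entropy method with a Gaussian sampler.

    Each round: sample a population, keep the n_elite best under f,
    refit the Gaussian to those elites, and repeat as it concentrates.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 5.0
    for _ in range(iters):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xs.sort(key=f, reverse=True)
        elites = xs[:n_elite]
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-6  # floor to avoid collapse
    return mu

best = cross_entropy_max(lambda x: -(x - 2.0) ** 2)  # optimum at x = 2
```

The rare-event version is the same loop with "elite" replaced by "exceeds the current level", which is how the method connects back to importance sampling.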
Particle filters
https://danmackinlay.name/notebook/particle_filters.html
Wed, 08 Apr 2020 10:50:05 +1000
A field of study concerning certain kinds of stochastic processes. The easiest entry point is IMO to think of randomised generalisations of state filter models. This has nothing to do with filters for particulate matter as seen in respirators.
There is too much confusing and unhelpful terminology here, and I am only at the fringe of this field, so I will not attempt to typologize.

Bias reduction
https://danmackinlay.name/notebook/bias_reduction.html
Wed, 26 Feb 2020 10:54:41 +1100
Trying to reduce bias in point estimators by, e.g., the bootstrap. In, e.g., AIC we try to compensate for bias in model selection; in bias reduction we try to eliminate it from our estimates.
This looks interesting: Kosmidis and Lunardon (2020)
The current work develops a novel method for the reduction of the asymptotic bias of M-estimators from general, unbiased estimating functions. We call the new estimation method reduced-bias M-estimation, or RBM-estimation in short.

Random matrix theory
https://danmackinlay.name/notebook/random_matrix.html
Thu, 10 Oct 2019 13:49:25 +1100
Matrices with distributions for the elements give rise to random matrix theory.
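A cheap sanity check of mine on the flavour of such results, via the moment method rather than computing eigenvalues: for a symmetric Gaussian matrix with entry variance \(1/n\), the normalised trace \(\frac{1}{n}\operatorname{tr}(A^2)\), i.e. the average squared eigenvalue, should approach the semicircle law's second moment, which is 1:

```python
import random

def wigner_second_moment(n=300, seed=0):
    """(1/n) tr(A^2) for a symmetric matrix with N(0, 1/n) entries.

    For symmetric A, tr(A^2) = sum_ij A_ij^2, and the average squared
    eigenvalue converges to the semicircle law's second moment, 1.
    """
    rng = random.Random(seed)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            g = rng.gauss(0.0, 1.0) / n ** 0.5
            A[i][j] = A[j][i] = g
    return sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n

m = wigner_second_moment()  # close to 1
```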
Where you consider a “matrix-valued random variable” because you are discussing a classic distribution, the Wishart distribution or whatever, those are also random matrices, obviously, but not the ones that would usually occur to one when thinking of random matrix theory. The archetypal result in capitalised Random Matrix Theory is Wigner’s semicircle law, which gives the limiting distribution of eigenvalues of growing symmetric square matrices with a certain element-wise real distribution.

Nearly sufficient statistics
https://danmackinlay.name/notebook/nearly_sufficient_statistics.html
Mon, 14 Jan 2019 15:10:53 +1100
🏗
I’m working through a small realisation, for my own interest, which has been helpful in my understanding of variational Bayes; specifically, relating it to non-Bayes variational inference. Also sequential Monte Carlo.
By starting from the idea of sufficient statistics, we come to the idea of variational inference in a natural way, via some other interesting stopovers.
Consider the Bayes filtering setup.

Rare-event-conditional estimation
https://danmackinlay.name/notebook/rare_event_simulation.html
Fri, 10 Nov 2017 15:31:43 +1100
As seen in tail risk estimation.
At the moment I mostly care about splitting simulation, but the set-up for that problem is here.
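Before the clever machinery, a crude Monte Carlo toy of mine showing the problem: the relative error of the naive estimator blows up like \(1/\sqrt{np}\) as the event gets rarer:

```python
import random

def naive_tail(threshold, n=100000, seed=0):
    """Crude Monte Carlo estimate of P(X > threshold) for X ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

p3 = naive_tail(3.0)  # true value ~ 1.35e-3: a few hundred hits, usable
p5 = naive_tail(5.0)  # true value ~ 2.9e-7: almost certainly zero hits
```

For the level-5 event, essentially no sample ever lands in the region of interest, which is exactly what importance sampling and splitting are built to fix.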
I consider the problem of simulating some quantity of interest conditional on a tail event defined on a \(d\)-dimensional continuous random variable \(X\) and importance function \(S: \mathbb{R}^d\rightarrow \mathbb{R}\). We write the conditional density \(f^*\) in terms of the density of \(X\) as

Compressed sensing / compressed sampling
https://danmackinlay.name/notebook/compressed_sensing.html
Wed, 14 Jun 2017 13:29:44 +0800
Higgledy-piggledy notes on the theme of exploiting sparsity to recover signals from few non-local measurements, given that we know they are nearly sparse, in a sense that will be made clear soon.
See also matrix factorisations, restricted isometry properties, Riesz bases…
Basic Compressed Sensing: I’ll follow the intro of (E.

Random neural networks
https://danmackinlay.name/notebook/nn_random.html
Sun, 19 Feb 2017 13:19:38 +1100
If you do not bother to train your neural net, what happens? In the infinite-width limit you get a Gaussian process. There are a number of net architectures which do not make use of that argument and which are still random, though.
Recurrent: Echo State Machines / Random reservoir networks. This sounds deliciously lazy; at a glance it sounds like the process is to construct a random recurrent network, i.

Quasi Monte Carlo
https://danmackinlay.name/notebook/qmc.html
Tue, 14 Feb 2017 11:48:36 +1100
Simplistically put: using a random, Monte Carlo-style algorithm, but deterministically, by sampling at well-chosen points.
Key words: discrepancy.
Some of the point sequences used are nice for parallelised algorithms, by the way, in the same way that randomised algorithms are.
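A minimal sketch of the idea (my toy example using the base-2 van der Corput sequence, the simplest one-dimensional low-discrepancy sequence): estimate \(\int_0^1 x^2\,dx = 1/3\) from its points:

```python
def van_der_corput(i, base=2):
    """The i-th point of the base-b van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

# Estimate the integral of x^2 over [0, 1] (exactly 1/3) at 1024 points.
n = 1024
qmc_estimate = sum(van_der_corput(i) ** 2 for i in range(1, n + 1)) / n
```

The error decays roughly like \(\log n / n\) rather than the \(1/\sqrt{n}\) of plain Monte Carlo, which is the whole sales pitch.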
Low-discrepancy sequences such as Sobol nets; others? Do Gray codes fit in here? If you aren’t doing this incrementally you can pre-generate a point set rather than a sequence.

Randomised linear algebra
https://danmackinlay.name/notebook/random_linear_algebra.html
Tue, 16 Aug 2016 16:57:35 +1000
The twin to random matrices, and elder sibling of vector random projections. Notes on doing linear algebra operations using randomised matrix projections. Useful for, e.g., randomised regression.
Quite an old family of methods, but extra hot recently.
Obligatory Igor Carron mention: Random matrices are too damn large.
IBM had a research group in this, although they seem to have gone silent since a good year in 2014.

Expectation maximisation
https://danmackinlay.name/notebook/expectation_maximisation.html
Sun, 17 Apr 2016 19:06:55 +0800
A particular optimisation method for statistics that gets you a maximum likelihood estimate despite various annoyances such as missing data.
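A minimal worked sketch (my own toy example: a two-component Gaussian mixture with known unit variances, so the "missing data" are the component labels):

```python
import math
import random

def em_two_gaussians(xs, iters=50):
    """EM for a two-component Gaussian mixture with known unit variances.

    E-step: compute each point's responsibility under component 1.
    M-step: re-estimate the two means and the mixing weight.
    """
    mu1, mu2, w = min(xs), max(xs), 0.5
    for _ in range(iters):
        r = []
        for x in xs:
            p1 = w * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1.0 - w) * math.exp(-0.5 * (x - mu2) ** 2)
            r.append(p1 / (p1 + p2))
        s1 = sum(r)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / s1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - s1)
        w = s1 / len(xs)
    return mu1, mu2, w

rng = random.Random(0)
xs = ([rng.gauss(-2.0, 1.0) for _ in range(500)]
      + [rng.gauss(2.0, 1.0) for _ in range(500)])
mu1, mu2, w = em_two_gaussians(xs)  # means near -2 and 2, weight near 0.5
```

Each iteration provably does not decrease the incomplete-data likelihood, which is the charm of the method.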
Vague description of the algorithm:
We have an experimental process that generates a random vector \(B\cup Y\) according to parameter \(\theta\). We wish to estimate the parameter of interest \(\theta\) by maximum likelihood. However, we only observe i.i.d. samples \(b_i\) drawn from \(B\). The likelihood function of the incomplete data \(L(\theta, b)\) is tedious or intractable to maximise.

Monte Carlo methods
https://danmackinlay.name/notebook/monte_carlo.html
Wed, 30 Dec 2015 21:46:45 +1100
Finding functionals (traditionally integrals) approximately by guessing cleverly. Often, but not always, used for approximate statistical inference, especially certain Bayesian techniques. Probably the most prominent use case is Bayesian statistics, where various Monte Carlo methods turn out to be effective for various inference problems. This is far from the only use, however.
https://danmackinlay.name/notebook/biomimetic_algorithms.html
Tue, 22 Dec 2015 07:34:43 +0800
Nature-inspired algorithms for computers, for problems without obvious “normal” solutions. (If you want to use computer-inspired algorithms for nature, that is the dual to this: bio-computing.)
The problem to be solved is usually a search/optimisation one. Normally evolutionary algorithms are in here too (ant colonies, particle swarms, that one based on choirs… harmony search?). Typically these are attractive because they are simple to explain, although often less simple to analyse.

Expectation propagation
https://danmackinlay.name/notebook/expectation_propagation.html
Mon, 26 Oct 2015 19:39:49 +0700
A classic message-passing inference method.
Expectation Propagation is exact in the large data limit
Guillaume Dehaene, Neurosciences, UNIGE
Expectation Propagation (EP, Minka 2001) is a popular algorithm for approximating posterior distributions. While it is known empirically to give good approximations at a low computational cost (Nickisch and Rasmussen, 2008), it is also very poorly understood. In this talk, I will present some new results on EP in the large-data limit.

Random number generation
https://danmackinlay.name/notebook/prng.html
Tue, 13 Oct 2015 12:36:18 +0800
Practical pseudo-RNG implementation. See also pseudorandomness for theories, Monte Carlo for some applications, and algorithmic statistics for some background theory.
Uniform PRNGs Generating uniformly distributed numbers on some interval, such as [0,1].
I constantly have to do this in languages that do not conveniently support local, seedable, convenient PRNGs.
Javascript doesn’t support seeding. Supercollider does but insists on a per-thread RNG.
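In such languages a tiny local, seedable generator can be rolled by hand. A sketch (using the Numerical Recipes LCG constants; a real application should prefer something statistically stronger, such as PCG or xoshiro):

```python
class LCG:
    """A minimal local, seedable PRNG (Numerical Recipes LCG constants).

    Only a sketch for platforms whose built-in RNG cannot be seeded
    locally; use a real generator (PCG, xoshiro, ...) for serious work.
    """

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_u32(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def uniform(self):
        """A float in [0, 1)."""
        return self.next_u32() / 2 ** 32

# Two generators with the same seed reproduce the same stream.
a, b = LCG(42), LCG(42)
stream_a = [a.uniform() for _ in range(5)]
stream_b = [b.uniform() for _ in range(5)]
```

Keeping the state in an object rather than a global is what buys the "local" property: independent streams can coexist without trampling each other.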