Langevin dynamics MCMC
August 16, 2020 — November 15, 2024
Sampling using a particular SDE, the Langevin equation. It can come rather close to optimisation, via SDE representations of SGD.
1 Langevin dynamics
The continuous-time Langevin equation is
\[ d\mathbf{x}_t = -\nabla U(\mathbf{x}_t) \, dt + \sqrt{2} \, d\mathbf{W}_t \]
where
- \(U(\mathbf{x})\) is the potential function related to the target distribution \(p(\mathbf{x})\) via \(p(\mathbf{x}) \propto e^{-U(\mathbf{x})}\).
- \(d\mathbf{W}_t\) represents the increment of a Wiener process (standard Brownian motion).
We care about this because it suggests a sampler that is
- very simple,
- needs access only to the score function \(\nabla \log p(\mathbf{x}) = -\nabla U(\mathbf{x})\), which is computationally convenient when we can get away with it.
We typically discretize the Langevin equation using the Euler-Maruyama method with time step \(\epsilon\)
\[ \mathbf{x}_{k+1} = \mathbf{x}_k - \epsilon \nabla U(\mathbf{x}_k) + \sqrt{2\epsilon} \, \boldsymbol{\eta}_k \]
where \(\boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_D)\). There are more advanced discretisations; this is the entry-level one.
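As a concrete sketch (my own minimal implementation, not taken from any of the references here), the Euler-Maruyama recursion above, run on a standard Gaussian target:

```python
import numpy as np

def ula(grad_U, x0, eps=1e-2, n_steps=5000, rng=None):
    """Unadjusted Langevin algorithm: iterate
    x_{k+1} = x_k - eps * grad_U(x_k) + sqrt(2 * eps) * eta_k."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        x = x - eps * grad_U(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Target: standard 2-D Gaussian, U(x) = ||x||^2 / 2, so grad_U(x) = x.
samples = ula(lambda x: x, x0=np.zeros(2), eps=0.05, n_steps=20000, rng=0)
```

Note that the discretised chain is biased at \(O(\epsilon)\): for this Gaussian target its stationary variance works out to \(1/(1 - \epsilon/2)\) rather than \(1\), which is one motivation for the Metropolis adjustment in the next section.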
See also Bayesian inference for a logistic regression model Part 4: Gradients and the Langevin algorithm
2 Metropolis-adjusted Langevin algorithm (MALA)
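MALA uses the Euler-Maruyama step above as a Metropolis-Hastings proposal, so the chain targets \(p(\mathbf{x}) \propto e^{-U(\mathbf{x})}\) exactly despite discretisation error. A minimal sketch (my own, assuming the standard asymmetric-proposal acceptance ratio):

```python
import numpy as np

def mala_step(x, U, grad_U, eps, rng):
    """One Metropolis-adjusted Langevin step: propose with a Langevin
    (Euler-Maruyama) move, then Metropolis accept/reject."""
    prop = x - eps * grad_U(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

    def log_q(to, frm):
        # log density (up to a constant) of the Gaussian Langevin proposal
        mu = frm - eps * grad_U(frm)
        return -np.sum((to - mu) ** 2) / (4 * eps)

    # Asymmetric proposal, so the reverse move enters the acceptance ratio
    log_alpha = (U(x) - U(prop)) + log_q(x, prop) - log_q(prop, x)
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False

# Usage: sample a 1-D standard Gaussian, U(x) = x^2 / 2.
rng = np.random.default_rng(0)
x = np.zeros(1)
chain = np.empty(20000)
for k in range(20000):
    x, _ = mala_step(x, lambda z: 0.5 * np.sum(z ** 2), lambda z: z, 0.5, rng)
    chain[k] = x[0]
```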
3 log-concave
For log-concave target distributions it turns out to work well to solve the SDE implicitly (Hodgkinson, Salomone, and Roosta 2019), which is powerful when you can do it.
Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation.
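To get a feel for the implicit approach: for a quadratic potential the backward-Euler step has a closed form. This is my own toy sketch, not the cited paper's algorithm; the point it illustrates is that the implicit update is stable for any step size \(\epsilon > 0\) (though still biased).

```python
import numpy as np

def implicit_langevin_quadratic(x0, eps=0.5, n_steps=20000, rng=None):
    """Implicit (backward-Euler) Langevin for U(x) = x^2 / 2.
    The implicit equation
        x_{k+1} = x_k - eps * grad_U(x_{k+1}) + sqrt(2*eps) * eta_k
    solves in closed form to
        x_{k+1} = (x_k + sqrt(2*eps) * eta_k) / (1 + eps)."""
    rng = np.random.default_rng(rng)
    x = float(x0)
    out = np.empty(n_steps)
    for k in range(n_steps):
        x = (x + np.sqrt(2 * eps) * rng.standard_normal()) / (1 + eps)
        out[k] = x
    return out

chain = implicit_langevin_quadratic(0.0, eps=0.5, rng=0)
```

For a general log-concave \(U\) the implicit step is a proximal-operator evaluation, i.e. a convex optimisation problem, which is where the connection to convex optimisation bites.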
4 Via bridges
Left-field, Max Raginsky, Sampling Using Diffusion Processes, from Langevin to Schrödinger:
the Langevin process gives only approximate samples from \(\mu\). I would like to discuss an alternative approach that uses diffusion processes to obtain exact samples in finite time. This approach is based on ideas that appeared in two papers from the 1930s by Erwin Schrödinger in the context of physics, and is now referred to as the Schrödinger bridge problem.
5 Annealed
TBC
See Jolicoeur-Martineau et al. (2022), Song and Ermon (2020a) and Song and Ermon (2020b)
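A rough sketch of the annealed idea in the style of Song and Ermon: run Langevin updates against a noise-conditional score at each level of a decreasing noise schedule, shrinking the step size with the noise level. Schedule and step-size choices here are my own illustrative assumptions, not the papers' tuned settings.

```python
import numpy as np

def annealed_langevin(score, x0, sigmas, eps0=2e-3, steps_per_level=100, rng=None):
    """Annealed Langevin dynamics: at each noise level sigma_i, take
    steps_per_level Langevin steps using the score of the sigma-smoothed
    target, with step size scaled as (sigma_i / sigma_min)^2."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for sigma in sigmas:
        eps = eps0 * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            x = x + 0.5 * eps * score(x, sigma) + np.sqrt(eps) * rng.standard_normal(x.shape)
    return x

# Toy check: target N(0, 1); the sigma-smoothed target is N(0, 1 + sigma^2),
# with score -x / (1 + sigma^2). Run 2000 independent 1-D chains.
sigmas = np.geomspace(10.0, 0.1, 10)
x_final = annealed_langevin(lambda x, s: -x / (1 + s ** 2),
                            x0=np.zeros(2000), sigmas=sigmas, rng=0)
```

In the generative-model setting the closed-form score above is replaced by a learned noise-conditional score network.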
6 Score diffusions
See score diffusions for a useful generalization/alternate use of Langevin-like dynamics.
7 Non-Gaussian innovation process
What work is the Gaussian distribution doing here? Can we solve an SDE with a different innovation process that still targets some interesting distribution? I don’t have results about this, but see Ikeda and Watanabe (2014) for some quick-and-dirty results which look like they might help.
8 Implementations
Below are some implementations I have explored. I do not necessarily recommend them; a simple unadjusted Langevin sampler is easy enough to implement from scratch, which is kind of the point, relative to Hamiltonian MCMC. Langevin samplers are also, crucially, simple enough that code for them fits easily in the context window of an LLM.
- Stochastic gradient Markov chain Monte Carlo - lyndonduong.com
- alisiahkoohi/Langevin-dynamics: Sampling with gradient-based Markov Chain Monte Carlo approaches
- langevin-monte-carlo/ula.py at master · abdulfatir/langevin-monte-carlo
- PYSGMCMC – Stochastic Gradient Markov Chain Monte Carlo Sampling — pysgmcmc documentation