Langevin dynamics MCMC

August 16, 2020 — November 15, 2024

Bayes
estimator distribution
functional analysis
Markov processes
Monte Carlo
neural nets
optimization
probabilistic algorithms
probability
SDEs
stochastic processes
Figure 1: Randomly exploring the posterior space.

Sampling by simulating the particular SDE that is the Langevin equation. This can end up rather close to optimisation, via SDE representations of SGD.

1 Langevin dynamics

The continuous-time Langevin equation is

\[ d\mathbf{x}_t = -\nabla U(\mathbf{x}_t) \, dt + \sqrt{2} \, d\mathbf{W}_t \]

where

  • \(U(\mathbf{x})\) is the potential function related to the target distribution \(p(\mathbf{x})\) via \(p(\mathbf{x}) \propto e^{-U(\mathbf{x})}\).
  • \(d\mathbf{W}_t\) represents the increment of a Wiener process (standard Brownian motion).

We care about this because it suggests a sampler that

  1. is very simple, and
  2. only needs access to the score function \(\nabla \log p(\mathbf{x}) = -\nabla U(\mathbf{x})\), which is computationally convenient when we can get away with it.

We typically discretise the Langevin equation using the Euler-Maruyama method with time step \(\epsilon\):

\[ \mathbf{x}_{k+1} = \mathbf{x}_k - \epsilon \nabla U(\mathbf{x}_k) + \sqrt{2\epsilon} \, \boldsymbol{\eta}_k \]

where \(\boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_D)\). There are more advanced discretisations; this is the entry-level one.
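As a concrete sketch of how little this needs, here is a minimal unadjusted Langevin sampler in NumPy. I assume a standard Gaussian target, so \(U(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|^2\) and \(\nabla U(\mathbf{x}) = \mathbf{x}\); the step size and chain length are placeholder choices.

```python
import numpy as np

def ula(grad_U, x0, eps=1e-2, n_steps=10_000, rng=None):
    """Unadjusted Langevin algorithm: iterate the Euler-Maruyama update above."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x - eps * grad_U(x) + np.sqrt(2 * eps) * noise
        samples[k] = x
    return samples

# Placeholder target: standard Gaussian, U(x) = ||x||^2 / 2, so grad_U(x) = x.
samples = ula(grad_U=lambda x: x, x0=np.zeros(2))
print(samples.mean(axis=0), samples.var(axis=0))  # roughly 0 and roughly 1
```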

See also Bayesian inference for a logistic regression model, Part 4: Gradients and the Langevin algorithm.

2 Metropolis-adjusted (MALA)
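MALA takes the Euler-Maruyama update above as a proposal and corrects its discretisation bias with a Metropolis-Hastings accept/reject step (Roberts and Tweedie 1996). A minimal sketch, again on the placeholder Gaussian target and assuming we can evaluate \(U\) as well as \(\nabla U\):

```python
import numpy as np

def mala(U, grad_U, x0, eps=1e-2, n_steps=10_000, rng=None):
    """Metropolis-adjusted Langevin: Euler-Maruyama proposal plus MH correction."""
    rng = np.random.default_rng() if rng is None else rng

    def log_q(x_to, x_from):
        # Log density (up to constants) of the proposal N(x_from - eps * grad_U(x_from), 2 * eps * I).
        diff = x_to - (x_from - eps * grad_U(x_from))
        return -np.sum(diff ** 2) / (4 * eps)

    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        prop = x - eps * grad_U(x) + np.sqrt(2 * eps) * rng.standard_normal(x.size)
        # log p(prop) - log p(x) = U(x) - U(prop), plus the proposal-asymmetry correction.
        log_alpha = U(x) - U(prop) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        samples[k] = x
    return samples

# Same placeholder Gaussian target as above.
samples = mala(U=lambda x: 0.5 * np.sum(x ** 2), grad_U=lambda x: x, x0=np.zeros(2))
```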

3 log-concave

For log-concave distributions it ends up working well to solve the SDE implicitly (Hodgkinson, Salomone, and Roosta 2019), which is powerful when you can do it.
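For intuition, the implicit (backward-Euler) update \(\mathbf{x}_{k+1} = \mathbf{x}_k - \epsilon \nabla U(\mathbf{x}_{k+1}) + \sqrt{2\epsilon}\, \boldsymbol{\eta}_k\) is a proximal step: given the noised point \(\mathbf{y} = \mathbf{x}_k + \sqrt{2\epsilon}\, \boldsymbol{\eta}_k\), the new iterate minimises \(U(\mathbf{z}) + \|\mathbf{z} - \mathbf{y}\|^2/(2\epsilon)\), which is strongly convex whenever \(U\) is convex. A rough sketch of one such step with an off-the-shelf solver (a generic reading of the implicit scheme, not the specific algorithm of the paper):

```python
import numpy as np
from scipy.optimize import minimize

def implicit_langevin_step(U, grad_U, x, eps, rng):
    """One backward-Euler Langevin step: find x_new satisfying
    x_new + eps * grad_U(x_new) = x + sqrt(2 * eps) * noise,
    i.e. x_new = prox_{eps U}(x + noise); convex U makes this strongly convex."""
    y = x + np.sqrt(2 * eps) * rng.standard_normal(x.size)
    obj = lambda z: U(z) + np.sum((z - y) ** 2) / (2 * eps)
    jac = lambda z: grad_U(z) + (z - y) / eps
    return minimize(obj, x0=y, jac=jac, method="L-BFGS-B").x

# Placeholder log-concave target: standard Gaussian again.
rng = np.random.default_rng()
x = np.zeros(2)
for _ in range(1_000):
    x = implicit_langevin_step(lambda z: 0.5 * np.sum(z ** 2), lambda z: z, x, eps=0.1, rng=rng)
```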

Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation.

4 Via bridges

Left-field, Max Raginsky, Sampling Using Diffusion Processes, from Langevin to Schrödinger:

the Langevin process gives only approximate samples from \(\mu\). I would like to discuss an alternative approach that uses diffusion processes to obtain exact samples in finite time. This approach is based on ideas that appeared in two papers from the 1930s by Erwin Schrödinger in the context of physics, and is now referred to as the Schrödinger bridge problem.

5 Annealed

TBC

See Jolicoeur-Martineau et al. (2022), Song and Ermon (2020a), and Song and Ermon (2020b).
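Roughly, the scheme of Song and Ermon (2020a) runs a short Langevin chain at each of a decreasing sequence of noise levels, scaling the step size with the squared noise level. A minimal sketch, where score(x, sigma) stands in for the (learned or otherwise known) score of the noise-perturbed target and the constants are placeholders:

```python
import numpy as np

def annealed_langevin(score, x0, sigmas, eps=1e-3, n_steps_each=100, rng=None):
    """Annealed Langevin dynamics in the style of Song and Ermon (2020a):
    a short Langevin chain at each noise level, step size scaled by (sigma_i / sigma_L)^2."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for sigma in sigmas:  # assumed sorted from largest to smallest noise level
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps_each):
            z = rng.standard_normal(x.size)
            # Note the convention: + (alpha/2) * score = - (alpha/2) * grad U,
            # i.e. the ULA step above with eps = alpha / 2.
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    return x
```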

6 Score diffusions

See score diffusions for a useful generalization/alternate use of Langevin-like dynamics.

7 Non-Gaussian innovation process

What work does the Gaussian distribution do here? Can we solve an SDE with a different innovation process that still targets some interesting distribution? I don't have results about this, but see Ikeda and Watanabe (2014) for some quick-and-dirty results which look like they might help.

8 Implementations

Below are some implementations I explored. I do not necessarily recommend them; for a simple unadjusted Langevin sampler, the algorithm is easy enough to implement from scratch, which is kind of the point, relative to Hamiltonian MCMC. Langevin samplers are also, crucially, simple enough that the code fits easily in the context window of an LLM.

9 References

Ahn, Korattikara, and Welling. 2012. “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12.
Alexos, Boyd, and Mandt. 2022. “Structured Stochastic Gradient MCMC.” In Proceedings of the 39th International Conference on Machine Learning.
Bakker, Post, Langevin, et al. 2016. “Scripting MODFLOW Model Development Using Python and FloPy.” Groundwater.
Beskos, Roberts, and Stuart. 2009. “Optimal Scalings for Local Metropolis–Hastings Chains on Nonproduct Targets in High Dimensions.” The Annals of Applied Probability.
Brosse, Moulines, and Durmus. 2018. “The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Chen, Fox, and Guestrin. 2014. “Stochastic Gradient Hamiltonian Monte Carlo.” In Proceedings of the 31st International Conference on Machine Learning.
Dalalyan. 2017. “Further and Stronger Analogy Between Sampling and Optimization: Langevin Monte Carlo and Gradient Descent.” arXiv:1704.04752 [Math, Stat].
Ding, Fang, Babbush, et al. 2014. “Bayesian Sampling Using Stochastic Gradient Thermostats.” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS’14.
Dockhorn, Vahdat, and Kreis. 2022. “Score-Based Generative Modeling with Critically-Damped Langevin Diffusion.”
Domke. 2017. “A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI.” In PMLR.
Durmus, and Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat].
Filippone, and Engler. 2015. “Enabling Scalable Stochastic Gradient-Based Inference for Gaussian Processes by Employing the Unbiased LInear System SolvEr (ULISSE).” In Proceedings of the 32nd International Conference on Machine Learning.
Garbuno-Inigo, Hoffmann, Li, et al. 2020. “Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.” SIAM Journal on Applied Dynamical Systems.
Ge, Lee, and Risteski. 2020. “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” arXiv:1812.00793 [Cs, Math, Stat].
Girolami, and Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Grenander, and Miller. 1994. “Representations of Knowledge in Complex Systems.” Journal of the Royal Statistical Society: Series B (Methodological).
Guo, Tao, and Chen. 2024. “Provable Benefit of Annealed Langevin Monte Carlo for Non-Log-Concave Sampling.”
Gürbüzbalaban, Gao, Hu, et al. 2021. “Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo.” Journal of Machine Learning Research.
Hairer, Stuart, Voss, et al. 2005. “Analysis of SPDEs Arising in Path Sampling. Part I: The Gaussian Case.” Communications in Mathematical Sciences.
Hairer, Stuart, and Voss. 2007. “Analysis of SPDEs Arising in Path Sampling Part II: The Nonlinear Case.” The Annals of Applied Probability.
Hodgkinson, Salomone, and Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” arXiv:1903.12322 [Cs, Stat].
Ho, Jain, and Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.
Holzschuh, Vegetti, and Thuerey. 2022. “Score Matching via Differentiable Physics.”
Ikeda, and Watanabe. 2014. Stochastic Differential Equations and Diffusion Processes. North-Holland Mathematical Library, v.v.
Jolicoeur-Martineau, Piché-Taillefer, Mitliagkas, et al. 2022. “Adversarial Score Matching and Improved Sampling for Image Generation.” In.
Lim, Kovachki, Baptista, et al. 2023. “Score-Based Diffusion Models in Function Space.”
Liu, Zhuo, Cheng, et al. 2019. “Understanding and Accelerating Particle-Based Variational Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Mandt, Hoffman, and Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.
Ma, Wang, Kim, et al. 2021. “Transfer Learning of Memory Kernels for Transferable Coarse-Graining of Polymer Dynamics.” Soft Matter.
Murray, and Ghahramani. 2004. “Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04.
Norton, and Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat].
Parisi. 1981. “Correlation Functions and Computer Simulations.” Nuclear Physics B.
Pavliotis. 2014. Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations. Texts in Applied Mathematics.
Rásonyi, and Tikosi. 2022. “On the Stability of the Stochastic Gradient Langevin Algorithm with Dependent Data Stream.” Statistics & Probability Letters.
Roberts, and Rosenthal. 1998. “Optimal Scaling of Discrete Approximations to Langevin Diffusions.” Journal of the Royal Statistical Society. Series B (Statistical Methodology).
Roberts, and Tweedie. 1996. “Exponential Convergence of Langevin Distributions and Their Discrete Approximations.” Bernoulli.
Seifert. 2012. “Stochastic Thermodynamics, Fluctuation Theorems and Molecular Machines.” Reports on Progress in Physics.
Shang, Zhu, Leimkuhler, et al. 2015. “Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling.” In Advances in Neural Information Processing Systems. NIPS’15.
Song, and Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In Advances In Neural Information Processing Systems.
———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In Advances In Neural Information Processing Systems.
Song, Sohl-Dickstein, Kingma, et al. 2022. “Score-Based Generative Modeling Through Stochastic Differential Equations.” In.
Welling, and Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on International Conference on Machine Learning. ICML’11.
Xifara, Sherlock, Livingstone, et al. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters.
Zhang, Ruqi, Liu, and Liu. 2022. “A Langevin-Like Sampler for Discrete Distributions.” In Proceedings of the 39th International Conference on Machine Learning.
Zhang, Jianyi, Zhang, Carin, et al. 2020. “Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory.” In International Conference on Artificial Intelligence and Statistics.