Neural denoising diffusion models
Denoising diffusion probabilistic models (DDPMs), score-based generative models, generative diffusion processes, neural energy models…
November 10, 2021 — April 22, 2024
Placeholder.
AFAICS, these generative models use score matching to learn and score diffusion to sample. Various tricks are needed to make this work via successive denoising steps, and there is an interpretation in terms of diffusion SDEs. I am vaguely aware that this oversimplifies a rich and interesting history in which many useful techniques converged, but I have not invested enough time to claim actual expertise.
1 Sampling: Langevin dynamics
Diffusion models add a few extra tweaks on top of classic Langevin samplers, which seem to form a slightly specialist subfield. What do we call those? “Score diffusions”?
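To make that concrete, here is a minimal sketch of annealed Langevin dynamics in plain numpy. The score function is a toy analytic one (a standard Gaussian smoothed at noise level \(\sigma\)) standing in for a learned network, and the noise schedule and step sizes are illustrative choices, not tuned values.

```python
import numpy as np

def toy_score(x, sigma):
    # Score of a standard Gaussian convolved with N(0, sigma^2 I):
    # grad_x log N(x | 0, (1 + sigma^2) I). Stands in for a learned network.
    return -x / (1.0 + sigma ** 2)

def annealed_langevin(score, sigmas, n_steps=100, base_step=2e-3, dim=2, rng=None):
    """Annealed Langevin dynamics: Langevin iterations at a decreasing
    sequence of noise levels, warm-starting each level from the last."""
    rng = np.random.default_rng() if rng is None else rng
    x = sigmas[0] * rng.normal(size=dim)
    for sigma in sigmas:
        # Scale the step size with the noise level so the update's
        # signal-to-noise ratio stays roughly constant across levels.
        eps = base_step * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps):
            z = rng.normal(size=dim)
            x = x + eps * score(x, sigma) + np.sqrt(2 * eps) * z
    return x

sigmas = np.geomspace(10.0, 0.1, 10)
samples = np.stack([annealed_langevin(toy_score, sigmas) for _ in range(500)])
print(samples.mean(axis=0), samples.std(axis=0))  # roughly zero mean, unit std
```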
2 Training: score matching
Modern score matching seems to originate in Hyvärinen (2005). See score matching or McAllester (2023) for an introduction to the general idea.
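For concreteness, here is a minimal sketch of the denoising variant of score matching in PyTorch: perturb the data with Gaussian noise at a random level \(\sigma\) and regress the score of the perturbation kernel. The network architecture, noise schedule, and toy data are all hypothetical placeholders for whatever a real model would use.

```python
import torch
from torch import nn

class ScoreNet(nn.Module):
    """Tiny MLP score network s_phi(x, sigma) for low-dimensional toy data;
    a stand-in for the U-Nets used on images."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, sigma):
        # Condition on the noise level by concatenating log(sigma).
        return self.net(torch.cat([x, sigma.log().unsqueeze(-1)], dim=-1))

def dsm_loss(score_net, x, sigma_min=0.01, sigma_max=10.0):
    """Denoising score matching: perturb the data with Gaussian noise and
    regress the score of the perturbation kernel, -(x_tilde - x) / sigma^2,
    weighting the squared error by sigma^2 to balance noise levels."""
    u = torch.rand(x.shape[0], device=x.device)
    sigma = sigma_min * (sigma_max / sigma_min) ** u   # log-uniform noise levels
    noise = torch.randn_like(x)
    x_tilde = x + sigma[:, None] * noise
    target = -noise / sigma[:, None]                   # score of N(x_tilde | x, sigma^2 I)
    pred = score_net(x_tilde, sigma)
    return ((sigma[:, None] * (pred - target)) ** 2).sum(-1).mean()

# Toy usage: fit the score of a two-component 2-D Gaussian mixture.
score_net = ScoreNet()
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
centres = torch.tensor([[-3.0, -3.0], [3.0, 3.0]])
for step in range(2000):
    x = centres[torch.randint(2, (256,))] + 0.5 * torch.randn(256, 2)
    loss = dsm_loss(score_net, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```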
3 Image generation in particular
See image generation with diffusion.
4 Generalised diffusion
5 Conditioning
There are lots of ways we might try to condition diffusions, differing sometimes only in emphasis.
5.1 Generic conditioning
Rozet and Louppe (2023b) summarises:
With score-based generative models, we can generate samples from the unconditional distribution \(p(x(0)) \approx p(x)\). To solve inverse problems, however, we need to sample from the posterior distribution \(p(x \mid y)\). This could be accomplished by training a conditional score network \(s_\phi(x(t), t \mid y)\) to approximate the posterior score \(\nabla_{x(t)} \log p(x(t) \mid y)\) and plugging it into the reverse SDE (4). However, this would require data pairs \((x, y)\) during training and one would need to retrain a new score network each time the observation process \(p(y \mid x)\) changes. Instead, many have observed (Y. Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Y. Song, Shen, et al. 2022) that the posterior score can be decomposed into two terms thanks to Bayes’ rule \[ \nabla_{x(t)} \log p(x(t) \mid y)=\nabla_{x(t)} \log p(x(t))+\nabla_{x(t)} \log p(y \mid x(t)) . \]
Since the prior score \(\nabla_{x(t)} \log p(x(t))\) can be approximated with a single score network, the remaining task is to estimate the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\). Assuming a differentiable measurement function \(\mathcal{A}\) and a Gaussian observation process \(p(y \mid x)=\mathcal{N}\left(y \mid \mathcal{A}(x), \Sigma_y\right)\), Chung et al. (2023) propose the approximation \[ p(y \mid x(t))=\int p(y \mid x) p(x \mid x(t)) \mathrm{d} x \approx \mathcal{N}\left(y \mid \mathcal{A}(\hat{x}(x(t))), \Sigma_y\right) \] where the mean \(\hat{x}(x(t))=\mathbb{E}_{p(x \mid x(t))}[x]\) is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021) \[ \begin{aligned} \mathbb{E}_{p(x \mid x(t))}[x] & =\frac{x(t)+\sigma(t)^2 \nabla_{x(t)} \log p(x(t))}{\mu(t)} \\ & \approx \frac{x(t)+\sigma(t)^2 s_\phi(x(t), t)}{\mu(t)} . \end{aligned} \]
As the log-likelihood of a multivariate Gaussian is known analytically and \(s_\phi(x(t), t)\) is differentiable, we can compute the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\) with this approximation in a zero-shot fashion, that is, without training any network other than \(s_\phi(x(t), t)\).
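A sketch of what that looks like in code (PyTorch, with hypothetical callables for the score network, the measurement operator \(\mathcal{A}\), and the schedules \(\mu(t)\), \(\sigma(t)\)): push Tweedie’s denoised estimate through \(\mathcal{A}\) and differentiate the Gaussian log-likelihood with autograd. This is a minimal reading of the approximation above, not the exact algorithm of Chung et al. (2023).

```python
import torch

def posterior_score(x_t, t, y, score_net, A, mu, sigma, sigma_y):
    """Approximate posterior score grad log p(x(t) | y) as
    prior score + likelihood score, with the likelihood score obtained by
    pushing Tweedie's denoised estimate x_hat through the measurement
    operator A and differentiating a Gaussian log-likelihood.
    `score_net`, `A`, `mu`, `sigma` are assumed callables (trained score
    network, differentiable measurement function, perturbation-kernel
    mean/std schedules), matching the notation of the passage above."""
    x_t = x_t.detach().requires_grad_(True)
    s = score_net(x_t, t)                              # prior score s_phi(x(t), t)
    x_hat = (x_t + sigma(t) ** 2 * s) / mu(t)          # Tweedie's formula
    # log N(y | A(x_hat), sigma_y^2 I), up to an additive constant.
    log_lik = -0.5 * ((y - A(x_hat)) ** 2).sum() / sigma_y ** 2
    lik_score, = torch.autograd.grad(log_lik, x_t)     # likelihood score via autograd
    return (s + lik_score).detach()
```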
5.2 Inpainting
If we want coherence with some chunk of an existing image, we call that inpainting (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; A. Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2023; Zhang et al. 2023). Clearly this is a certain type of conditioning.
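One simple way to impose that conditioning, in the spirit of replacement sampling / RePaint (Lugmayr et al. 2022), is to overwrite the observed region at each reverse step with a forward-noised copy of the known pixels. A minimal sketch, assuming callables for one unconditional reverse update and for the forward perturbation kernel:

```python
import torch

def inpaint_step(x_t, t, x_known, mask, reverse_step, forward_noise):
    """One reverse-diffusion step with hard data consistency, in the spirit
    of replacement sampling / RePaint. `reverse_step` (one unconditional
    reverse update) and `forward_noise` (the forward perturbation kernel
    x -> x(t)) are assumed callables; `mask` is 1 on observed pixels and 0
    on the region to be filled in."""
    x_t = reverse_step(x_t, t)              # unconditional reverse update
    x_known_t = forward_noise(x_known, t)   # observed pixels, noised to level t
    # Trust the sampler only where nothing was observed.
    return mask * x_known_t + (1 - mask) * x_t
```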
5.3 Super-resolution
Coherence again, but with observations on a sparse, regular subset of pixels (Zamir et al. 2021; Choi et al. 2021).
5.4 Reconstruction/inversion
Perturbed and partial observations (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Y. Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).
6 Latent
6.1 Generic
6.2 CLIP
Radford et al. (2021)
7 Diffusion on weird spaces
Generic: Okhotin et al. (2023).
7.1 PD manifolds
Li et al. (2024)
7.2 Proteins
Baker Lab (Torres et al. 2022; Watson et al. 2022)
8 Shapes
9 Incoming
- Das, Building Diffusion Model’s theory from ground up for ICLR Blogposts 2024
- Lilian Weng, What are Diffusion Models?
- Yang Song, Generative Modelling by Estimating Gradients of the Data Distribution
- Sander Dieleman, Diffusion models are autoencoders
- CVPR tutorial, Denoising Diffusion-based Generative Modelling: Foundations and Applications (accompanying video)
- What’s the score? (Review of latest Score Based Generative Modelling papers.)
- Anil Ananthaswamy, The Physics Principle That Inspired Modern AI Art
- The geometry of data: the missing metric tensor and the Stein score [Part II] | Terra Incognita
- Perspectives on diffusion – Sander Dieleman
- Thoughts on Riemannian metrics and its connection with diffusion/score matching [Part I] | Terra Incognita
Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).