Neural denoising diffusion models
Denoising diffusion probabilistic models (DDPMs), score-based generative models, generative diffusion processes, neural energy models…
November 11, 2021 — April 22, 2024
Placeholder.
AFAICS, these are generative models that use score matching to learn and Langevin MCMC to sample. Various tricks are needed to make this work with successive denoising steps, and the whole procedure can be interpreted in terms of diffusion SDEs. I am vaguely aware that this oversimplifies a rich and interesting history of convergence of many useful techniques, but I have not invested enough time to claim actual expertise.
1 Training: score matching
Modern score matching seems to originate in Hyvärinen (2005). See score matching or McAllester (2023) for an introduction to the general idea.
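To make the idea concrete, here is a minimal sketch of the denoising variant of score matching on a toy problem. This is a later reformulation, not Hyvärinen's original objective, and everything in it is an assumption for illustration: the 1-D Gaussian data, the noise level `sigma`, the linear score model `a * x + b`, and the hypothetical helper name `fit_linear_score_dsm`. The model is regressed onto the score of the Gaussian noising kernel, which should recover the score of the noised data distribution.

```python
import numpy as np

def fit_linear_score_dsm(x, sigma, rng):
    """Fit s(x) = a * x + b by least squares on the denoising score matching target.

    Hypothetical helper for illustration; a real model would be a neural
    network trained by stochastic gradient descent on the same loss.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    # Score of the noising kernel N(x_noisy | x, sigma^2) with respect to x_noisy:
    # -(x_noisy - x) / sigma^2 = -eps / sigma.
    target = -eps / sigma
    design = np.stack([x_noisy, np.ones_like(x_noisy)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b

rng = np.random.default_rng(0)
mean, var, sigma = 2.0, 0.25, 0.5  # assumed toy data distribution and noise level
x = mean + np.sqrt(var) * rng.standard_normal(200_000)
a, b = fit_linear_score_dsm(x, sigma, rng)

# For Gaussian data the noised marginal is N(mean, var + sigma^2), whose score is
# -(x - mean) / (var + sigma^2), so the fit can be compared with these values.
a_true = -1.0 / (var + sigma**2)
b_true = mean / (var + sigma**2)
print(a, a_true, b, b_true)
```

In this toy case the regression never sees the true score, only noise realisations, yet the fitted coefficients should approach the analytic ones as the sample size grows.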
2 Sampling: Langevin dynamics
See Langevin samplers.
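As a rough illustration of how a score function drives sampling, here is a sketch of the unadjusted Langevin algorithm. The assumptions are mine: a 1-D Gaussian target with a known analytic score (in practice a learned score network would stand in for `score`), a fixed step size, and the hypothetical function name `langevin_sample`. Real diffusion samplers anneal the noise level over the run, which this sketch does not do.

```python
import numpy as np

def langevin_sample(score, x0, step, n_steps, rng):
    """Run unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * z."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

# Assumed toy target N(2.0, 0.5); its score is -(x - mean) / var.
target_mean, target_var = 2.0, 0.5

def score(x):
    return -(x - target_mean) / target_var

rng = np.random.default_rng(0)
# 5000 independent chains, all started at zero.
samples = langevin_sample(score, x0=np.zeros(5000), step=0.01, n_steps=1000, rng=rng)
print(samples.mean(), samples.var())
```

The empirical mean and variance should land near the target values, up to Monte Carlo error and the small discretisation bias that comes from skipping the Metropolis correction.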
3 Image generation in particular
See image generation with diffusion.
4 Generalised diffusion
5 Conditioning
There are lots of ways we might try to condition diffusions, differing sometimes only in emphasis.
5.1 Generic conditioning
Rozet and Louppe (2023a) summarise:
With score-based generative models, we can generate samples from the unconditional distribution \(p(x(0)) \approx p(x)\). To solve inverse problems, however, we need to sample from the posterior distribution \(p(x \mid y)\). This could be accomplished by training a conditional score network \(s_\phi(x(t), t \mid y)\) to approximate the posterior score \(\nabla_{x(t)} \log p(x(t) \mid y)\) and plugging it into the reverse SDE (4). However, this would require data pairs \((x, y)\) during training and one would need to retrain a new score network each time the observation process \(p(y \mid x)\) changes. Instead, many have observed (Y. Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Y. Song, Shen, et al. 2022) that the posterior score can be decomposed into two terms thanks to Bayes’ rule \[ \nabla_{x(t)} \log p(x(t) \mid y)=\nabla_{x(t)} \log p(x(t))+\nabla_{x(t)} \log p(y \mid x(t)) . \]
Since the prior score \(\nabla_{x(t)} \log p(x(t))\) can be approximated with a single score network, the remaining task is to estimate the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\). Assuming a differentiable measurement function \(\mathcal{A}\) and a Gaussian observation process \(p(y \mid x)=\mathcal{N}\left(y \mid \mathcal{A}(x), \Sigma_y\right)\), Chung et al. (2023) propose the approximation \[ p(y \mid x(t))=\int p(y \mid x) p(x \mid x(t)) \mathrm{d} x \approx \mathcal{N}\left(y \mid \mathcal{A}(\hat{x}(x(t))), \Sigma_y\right) \] where the mean \(\hat{x}(x(t))=\mathbb{E}_{p(x \mid x(t))}[x]\) is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021) \[ \begin{aligned} \mathbb{E}_{p(x \mid x(t))}[x] & =\frac{x(t)+\sigma(t)^2 \nabla_{x(t)} \log p(x(t))}{\mu(t)} \\ & \approx \frac{x(t)+\sigma(t)^2 s_\phi(x(t), t)}{\mu(t)} . \end{aligned} \]
As the log-likelihood of a multivariate Gaussian is known analytically and \(s_\phi(x(t), t)\) is differentiable, we can compute the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\) with this approximation in zero-shot, that is, without training any other network than \(s_\phi(x(t), t)\).
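The recipe in that passage can be sketched on a toy problem where everything is analytic. The assumptions here are mine, not the quoted authors': a Gaussian prior \(x \sim \mathcal{N}(m, vI)\), a perturbation \(x(t) = \mu x + \sigma \varepsilon\), a linear measurement operator `A`, and all function names. With a Gaussian prior the score is linear, so its Jacobian is known in closed form; with a real score network we would differentiate through \(s_\phi\) by automatic differentiation instead.

```python
import numpy as np

# Assumed toy setup: prior x ~ N(m, v I), perturbation x(t) = mu * x + sigma * eps.
m = np.array([1.0, -1.0, 0.5])
v, mu, sigma = 2.0, 0.8, 0.6
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])       # linear stand-in for the measurement function
Sigma_y = 0.1 * np.eye(2)             # observation noise covariance
marg_var = mu**2 * v + sigma**2       # variance of the marginal p(x(t))

def prior_score(xt):
    """Exact score of p(x(t)); stands in for the score network."""
    return -(xt - mu * m) / marg_var

def tweedie_mean(xt):
    """Posterior mean E[x | x(t)] via Tweedie's formula."""
    return (xt + sigma**2 * prior_score(xt)) / mu

def approx_log_lik(xt, y):
    """log N(y | A x_hat(x(t)), Sigma_y), up to an additive constant."""
    r = y - A @ tweedie_mean(xt)
    return -0.5 * r @ np.linalg.solve(Sigma_y, r)

def likelihood_score(xt, y):
    """Gradient of approx_log_lik in x(t), by the chain rule through tweedie_mean."""
    # Jacobian of tweedie_mean is a scalar multiple of the identity in this toy case.
    jac = (1.0 - sigma**2 / marg_var) / mu
    r = y - A @ tweedie_mean(xt)
    return jac * (A.T @ np.linalg.solve(Sigma_y, r))

def posterior_score(xt, y):
    """Bayes decomposition: prior score plus approximate likelihood score."""
    return prior_score(xt) + likelihood_score(xt, y)

xt = np.array([0.3, 0.2, -0.4])
y = np.array([0.5, 1.5])
print(posterior_score(xt, y))
```

Nothing here is retrained when `A` or `Sigma_y` changes; only `likelihood_score` needs re-evaluating, which is the zero-shot property the quotation describes.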
5.2 Inpainting
If we want coherence with some chunk of an existing image, we call that inpainting (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; A. Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2023; Zhang et al. 2023).
5.3 Super-resolution
As with inpainting, we want coherence with existing pixels, but here they form a sparse regular subset of the image (Zamir et al. 2021; Choi et al. 2021).
5.4 Reconstruction/inversion
Perturbed and partial observations (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Y. Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).
6 Latent
6.1 Generic
6.2 CLIP
Radford et al. (2021)
7 Diffusion on weird spaces
Generic: Okhotin et al. (2023).
7.1 PD manifolds
Li et al. (2024)
7.2 Proteins
Baker Lab (Torres et al. 2022; Watson et al. 2022)
8 Shapes
9 Incoming
- Lilian Weng, What are Diffusion Models?
- Yang Song, Generative Modeling by Estimating Gradients of the Data Distribution
- Sander Dieleman, Diffusion models are autoencoders
- CVPR tutorial, Denoising Diffusion-based Generative Modeling: Foundations and Applications (accompanying video)
- What’s the score? (Review of latest Score Based Generative Modeling papers.)
- Anil Ananthaswamy, The Physics Principle That Inspired Modern AI Art
Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).