# Neural denoising diffusion models

Denoising diffusion probabilistic models (DDPMs), score-based generative models, generative diffusion processes, neural energy models…

November 11, 2021 — April 22, 2024

Placeholder.

AFAICS, these are generative models that use score matching to learn and Langevin MCMC to sample. Doing this well takes various tricks: successive denoising steps, and an interpretation in terms of diffusion SDEs. I am vaguely aware that this oversimplifies a rich and interesting history of convergence of many useful techniques, but have not invested enough time to claim actual expertise.
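The following sketch illustrates the "successive denoising steps" picture, run forwards as noising. It assumes the discrete variance-preserving chain of DDPMs, \(x_k = \sqrt{1-\beta_k}\,x_{k-1} + \sqrt{\beta_k}\,\epsilon_k\). Under that assumption, many small Gaussian steps collapse into a single perturbation \(x(t) = \mu(t)x + \sigma(t)\epsilon\), with \(\mu(t)^2 = \prod_k (1-\beta_k)\) and \(\sigma(t)^2 = 1-\mu(t)^2\). This matches the \(\mu(t)\), \(\sigma(t)\) notation in the Tweedie formula further down the page. The linear \(\beta\) schedule, the toy 1D data, and the helper `vp_coefficients` are illustrative assumptions.

```python
import numpy as np

def vp_coefficients(betas):
    """Hypothetical helper: mu(t), sigma(t) after len(betas) variance-preserving steps."""
    alpha_bar = np.cumprod(1.0 - betas)
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

rng = np.random.default_rng(3)
betas = np.linspace(1e-4, 0.02, 1000)          # assumed linear noise schedule
x0 = 2.0 + 0.5 * rng.standard_normal(50_000)   # toy 1D "data" ~ N(2, 0.5^2)

# Run the noising chain step by step for the first 200 steps
x = x0.copy()
for beta in betas[:200]:
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# The one-shot form predicts x(t) ~ N(mu * 2, mu^2 * 0.25 + sigma^2) for this data
mu, sigma = vp_coefficients(betas[:200])
print(x.mean(), mu[-1] * 2.0)
print(x.var(), mu[-1] ** 2 * 0.25 + sigma[-1] ** 2)
```

Because the closed form exists, training can draw \(x(t)\) at any noise level directly instead of simulating the chain.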

## 1 Training: score matching

Modern score matching seems to originate in Hyvärinen (2005). See score matching or McAllester (2023) for an introduction to the general idea.
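Here is a minimal sketch of the *denoising* variant of score matching, which differs from Hyvärinen's original objective. It assumes 1D Gaussian toy data and a linear score model \(s(\tilde{x}) = a\tilde{x} + b\), both chosen so the right answer is known analytically. The procedure perturbs the data with noise of scale \(\sigma\), then regresses the score of the Gaussian perturbation kernel, \(-\epsilon/\sigma\), onto the noisy sample. The regression recovers the score of the noised marginal \(\mathcal{N}(m, s^2+\sigma^2)\). The name `fit_linear_dsm` is hypothetical.

```python
import numpy as np

def fit_linear_dsm(x, sigma, rng):
    """Fit s(x_tilde) = a * x_tilde + b by least squares on the denoising score-matching loss."""
    eps = rng.standard_normal(x.shape)
    x_tilde = x + sigma * eps                  # perturb the data
    target = -eps / sigma                      # grad of log N(x_tilde | x, sigma^2) w.r.t. x_tilde
    design = np.stack([x_tilde, np.ones_like(x_tilde)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b

rng = np.random.default_rng(0)
m, s, sigma = 2.0, 1.0, 0.5
x = m + s * rng.standard_normal(200_000)       # toy data ~ N(m, s^2)
a, b = fit_linear_dsm(x, sigma, rng)

# Analytic score of N(m, s^2 + sigma^2) is -(x - m) / (s^2 + sigma^2)
var = s**2 + sigma**2
print(a, -1.0 / var)
print(b, m / var)
```

For a linear model, least squares is the exact minimiser of the squared-error objective, so no optimiser loop is needed. A neural score model would be trained by stochastic gradient descent on the same loss.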

## 2 Sampling: Langevin dynamics

See Langevin samplers.
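Below is a minimal sketch of the unadjusted Langevin algorithm. It assumes a score that is known in closed form, for a 1D Gaussian target; in a diffusion model a learned score network would take its place, usually annealed over noise levels. The function name `langevin_sample` is hypothetical.

```python
import numpy as np

def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

m, s = 2.0, 1.0

def gaussian_score(x):
    # grad_x log N(x | m, s^2), standing in for a learned score network
    return -(x - m) / s**2

rng = np.random.default_rng(1)
samples = langevin_sample(gaussian_score, np.zeros(20_000), step=0.01, n_steps=1_000, rng=rng)
print(samples.mean(), samples.std())  # close to m and s, up to an O(step) bias in the spread
```

Without a Metropolis correction the stationary distribution is slightly off; for this Gaussian target the variance comes out as \(s^2/(1 - \text{step}/(2s^2))\). That discrepancy is why the step size matters.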

## 3 Image generation in particular

See image generation with diffusion.

## 4 Conditioning

There are lots of ways we might try to condition diffusions, differing sometimes only in emphasis.

### 4.1 Generic conditioning

Rozet and Louppe (2023a) summarises:

> With score-based generative models, we can generate samples from the unconditional distribution \(p(x(0)) \approx p(x)\). To solve inverse problems, however, we need to sample from the posterior distribution \(p(x \mid y)\). This could be accomplished by training a conditional score network \(s_\phi(x(t), t \mid y)\) to approximate the posterior score \(\nabla_{x(t)} \log p(x(t) \mid y)\) and plugging it into the reverse SDE (4). However, this would require data pairs \((x, y)\) during training and one would need to retrain a new score network each time the observation process \(p(y \mid x)\) changes. Instead, many have observed (Y. Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Y. Song, Shen, et al. 2022) that the posterior score can be decomposed into two terms thanks to Bayes’ rule
>
> \[ \nabla_{x(t)} \log p(x(t) \mid y)=\nabla_{x(t)} \log p(x(t))+\nabla_{x(t)} \log p(y \mid x(t)) . \]
>
> Since the prior score \(\nabla_{x(t)} \log p(x(t))\) can be approximated with a single score network, the remaining task is to estimate the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\). Assuming a differentiable measurement function \(\mathcal{A}\) and a Gaussian observation process \(p(y \mid x)=\mathcal{N}\left(y \mid \mathcal{A}(x), \Sigma_y\right)\), Chung et al. (2023) propose the approximation
>
> \[ p(y \mid x(t))=\int p(y \mid x) p(x \mid x(t)) \mathrm{d} x \approx \mathcal{N}\left(y \mid \mathcal{A}(\hat{x}(x(t))), \Sigma_y\right) \]
>
> where the mean \(\hat{x}(x(t))=\mathbb{E}_{p(x \mid x(t))}[x]\) is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021)
>
> \[ \begin{aligned} \mathbb{E}_{p(x \mid x(t))}[x] & =\frac{x(t)+\sigma(t)^2 \nabla_{x(t)} \log p(x(t))}{\mu(t)} \\ & \approx \frac{x(t)+\sigma(t)^2 s_\phi(x(t), t)}{\mu(t)} . \end{aligned} \]
>
> As the log-likelihood of a multivariate Gaussian is known analytically and \(s_\phi(x(t), t)\) is differentiable, we can compute the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\) with this approximation in zero-shot, that is, without training any other network than \(s_\phi(x(t), t)\).
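To make the recipe quoted above concrete, here is a minimal NumPy sketch built on several simplifying assumptions:

- The prior is Gaussian, \(x \sim \mathcal{N}(0, C)\), so the exact marginal score of \(x(t) = \mu(t)x + \sigma(t)\epsilon\) stands in for \(s_\phi\).
- \(\mathcal{A}\) is linear.
- The Jacobian of Tweedie's mean is derived by hand. In practice one would backpropagate through the score network with autodiff.

All names (`prior_score`, `tweedie_mean`, `approx_lik_score`, …) and numbers are hypothetical. A finite-difference check confirms that the likelihood score is the gradient of the approximate Gaussian log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 3, 2
C = np.diag([1.0, 2.0, 0.5])          # hypothetical Gaussian prior covariance of x
A = rng.standard_normal((k, d))        # linear stand-in for the measurement function
Sigma_y = 0.1 * np.eye(k)              # observation noise covariance
mu_t, sigma_t = 0.8, 0.6               # x(t) = mu(t) x + sigma(t) eps at one fixed t

# Precision of the Gaussian marginal p(x(t)) = N(0, mu^2 C + sigma^2 I)
P = np.linalg.inv(mu_t**2 * C + sigma_t**2 * np.eye(d))

def prior_score(xt):
    # Exact marginal score; plays the role of s_phi(x(t), t)
    return -P @ xt

def tweedie_mean(xt):
    # E[x | x(t)] = (x(t) + sigma(t)^2 * score) / mu(t)
    return (xt + sigma_t**2 * prior_score(xt)) / mu_t

def approx_log_lik(xt, y):
    # log N(y | A x_hat(x(t)), Sigma_y), dropping the normalising constant
    r = y - A @ tweedie_mean(xt)
    return -0.5 * r @ np.linalg.solve(Sigma_y, r)

def approx_lik_score(xt, y):
    # Chain rule through Tweedie: d x_hat / d x(t) = (I - sigma^2 P) / mu
    J = (np.eye(d) - sigma_t**2 * P) / mu_t
    r = y - A @ tweedie_mean(xt)
    return J.T @ A.T @ np.linalg.solve(Sigma_y, r)

def posterior_score(xt, y):
    # Bayes' rule decomposition: prior score + (approximate) likelihood score
    return prior_score(xt) + approx_lik_score(xt, y)

xt, y = rng.standard_normal(d), rng.standard_normal(k)
h = 1e-5
fd = np.array([(approx_log_lik(xt + h * e, y) - approx_log_lik(xt - h * e, y)) / (2 * h)
               for e in np.eye(d)])
print(np.max(np.abs(fd - approx_lik_score(xt, y))))   # finite-difference check, ~0
print(posterior_score(xt, y))
```

In this Gaussian case Tweedie's mean reduces to \(\mu(t)\,C\,(\mu(t)^2C+\sigma(t)^2I)^{-1}x(t)\), the usual Gaussian conditional mean. The approximation ignores the spread of \(p(x \mid x(t))\), though, so the likelihood score is still only approximate.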

### 4.2 Inpainting

If we want the sample to be coherent with some existing chunk of an image, we call that *inpainting* (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; A. Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2023; Zhang et al. 2023).
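One way to phrase inpainting within the generic conditioning recipe quoted above is as a linear Gaussian observation in which \(\mathcal{A}\) just selects the known pixels. This framing is not necessarily the one the cited papers use. The minimal sketch below computes the gradient of \(\log \mathcal{N}(y \mid \mathcal{A}\hat{x}, \sigma_y^2 I)\) with respect to the denoised estimate \(\hat{x}\). The helper `mask_operator`, the pixel indices, and the values are hypothetical.

```python
import numpy as np

def mask_operator(observed, n):
    """Hypothetical helper: row-selection matrix A with A @ x == x[observed]."""
    return np.eye(n)[observed]

n = 6
observed = np.array([0, 1, 4])          # indices of the known pixels (illustrative)
A = mask_operator(observed, n)
x_hat = np.linspace(0.0, 1.0, n)        # stand-in for Tweedie's denoised estimate
y = np.array([0.5, 0.5, 0.5])           # known pixel values
sigma_y2 = 0.01                         # observation noise variance

# Gradient of the Gaussian log-likelihood with respect to x_hat: A^T (y - A x_hat) / sigma_y^2
grad_x_hat = A.T @ (y - A @ x_hat) / sigma_y2
print(grad_x_hat)
```

With respect to \(\hat{x}\), this gradient is zero on the unobserved pixels. Any pull on the missing region comes from chaining it through the Tweedie Jacobian \(\partial\hat{x}/\partial x(t)\), which couples pixels via the score network.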

### 4.3 Super-resolution

Coherence again, but with a sparse, regularly spaced subset of the pixels (Zamir et al. 2021; Choi et al. 2021).
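Taking "sparse regular subset" literally, the following minimal sketch assumes \(\mathcal{A}\) is plain strided subsampling of a 2D image; practical super-resolution operators often blur or average before downsampling. The likelihood gradient needs both \(\mathcal{A}\) and its adjoint \(\mathcal{A}^\top\), which scatters low-resolution residuals back onto the high-resolution grid. The function names and toy arrays are hypothetical.

```python
import numpy as np

def subsample(x, f):
    """A: keep every f-th pixel along both image axes."""
    return x[::f, ::f]

def subsample_adjoint(r, shape, f):
    """A^T: put low-res values back on the high-res grid, zeros elsewhere."""
    out = np.zeros(shape)
    out[::f, ::f] = r
    return out

x_hat = np.arange(16.0).reshape(4, 4)  # stand-in for a denoised estimate x_hat(x(t))
y = np.ones((2, 2))                     # hypothetical low-resolution observation
sigma_y2 = 1.0                          # observation noise variance (Sigma_y = sigma_y2 * I)

# Gradient of log N(y | A x_hat, Sigma_y) with respect to x_hat
g = subsample_adjoint(y - subsample(x_hat, 2), x_hat.shape, 2) / sigma_y2
print(g)
```

Writing the adjoint as an explicit function avoids ever forming \(\mathcal{A}\) as a matrix, which matters at realistic image sizes.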

### 4.4 Reconstruction/inversion

Recovering a signal from perturbed and partial observations (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Y. Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).
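This toy sketch combines the Bayes' rule score decomposition quoted above with Langevin sampling. It assumes a Gaussian prior and a linear, partial, noisy observation, evaluated at \(t = 0\), so the decomposition is exact and no Tweedie approximation is needed; the exact posterior mean is then available for comparison. A real method would use a learned score and anneal over \(t\). All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
C = np.diag([1.0, 2.0, 0.5])         # hypothetical Gaussian prior covariance
A = np.array([[1.0, 1.0, 0.0]])      # partial observation: one linear combination of x
Sigma_y = np.array([[0.1]])          # observation noise covariance
y = np.array([1.5])                  # observed value

C_inv = np.linalg.inv(C)
S_inv = np.linalg.inv(Sigma_y)

def posterior_score(x):
    # x has shape (n_chains, d); prior score + likelihood score, row by row
    prior = -x @ C_inv                          # C_inv is symmetric
    lik = (y - x @ A.T) @ S_inv @ A             # rows: A^T Sigma_y^{-1} (y - A x)
    return prior + lik

x = np.zeros((10_000, d))
step = 0.005
for _ in range(3_000):
    x = x + step * posterior_score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

# Closed-form Gaussian posterior mean for comparison
exact_mean = C @ A.T @ np.linalg.solve(A @ C @ A.T + Sigma_y, y)
print(x.mean(axis=0), exact_mean)
```

The third coordinate is untouched by the observation, so its posterior simply stays at the prior. The first two coordinates are pulled towards values consistent with \(x_1 + x_2 \approx 1.5\).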

## 5 Latent

### 5.1 Generic

### 5.2 CLIP

Radford et al. (2021)

## 6 Diffusion on weird spaces

Generic: Okhotin et al. (2023).

### 6.1 PD manifolds

Li et al. (2024)

### 6.2 Proteins

Baker Lab (Torres et al. 2022; Watson et al. 2022)

## 7 Shapes

## 8 Incoming

- Lilian Weng, What are Diffusion Models?
- Yang Song, Generative Modeling by Estimating Gradients of the Data Distribution
- Sander Dieleman, Diffusion models are autoencoders
- CVPR tutorial, Denoising Diffusion-based Generative Modeling: Foundations and Applications, with accompanying video
- What’s the score? (Review of latest Score Based Generative Modeling papers.)
- Anil Ananthaswamy, The Physics Principle That Inspired Modern AI Art

Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).

## 9 References

*Stochastic Processes and Their Applications*.

*arXiv:2105.05233 [Cs, Stat]*.

*Journal of the American Statistical Association*.

*Advances in Neural Information Processing Systems*.

*Computer Vision and Image Understanding*.

*Nature Reviews Bioengineering*.

*arXiv:2006.11239 [Cs, Stat]*.

*arXiv:2110.02037 [Cs, Stat]*.

*The Journal of Machine Learning Research*.

*Advances in Neural Information Processing Systems*.

*Proceedings of the 39th International Conference on Machine Learning*.

*Advances in Neural Information Processing Systems*.

*Proceedings of the 36th International Conference on Machine Learning*.

*Proceedings of the AAAI Conference on Artificial Intelligence*.

*2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*.

*Proceedings of the 38th International Conference on Machine Learning*.

*Advances in Neural Information Processing Systems*.

*IEEE Access*.

*Medical Physics*.

*2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*.

*arXiv:1503.03585 [Cond-Mat, q-Bio, Stat]*.

*Advances in Neural Information Processing Systems*.

*Advances In Neural Information Processing Systems*.

*Advances In Neural Information Processing Systems*.

*arXiv:2010.02502 [Cs]*.

*Proceedings of the 28th International Conference on Machine Learning (ICML-11)*.

*Proceedings of the Thirty-Second Conference on Learning Theory*.

*Neural Computation*.

*Medical Image Computing and Computer Assisted Intervention – MICCAI 2022*.

*Advances in Neural Information Processing Systems*.

*IEEE Geoscience and Remote Sensing Letters*.

*ACM Computing Surveys*.

*Proceedings of the 40th International Conference on Machine Learning*. ICML’23.