Neural denoising diffusion models

Denoising diffusion probabilistic models (DDPMs), score-based generative models, generative diffusion processes, neural energy models…

November 11, 2021 — April 22, 2024

Monte Carlo
neural nets
probabilistic algorithms
Figure 1


AFAICS, these are generative models that use score matching to learn and Langevin MCMC to sample. Various tricks are needed to do this with successive denoising steps and to interpret it in terms of diffusion SDEs. I am vaguely aware that this oversimplifies a rich and interesting history of convergence of many useful techniques, but have not invested enough time to claim actual expertise.
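To make the "successive denoising steps" concrete, here is a minimal sketch of the forward noising process that those steps are meant to undo. It assumes the linear variance schedule usually attributed to Ho, Jain, and Abbeel (2020); the function names `ddpm_schedule` and `perturb` are made up for this illustration. The perturbation has the form \(x(t)=\mu(t)x+\sigma(t)\varepsilon\), which is the same \(\mu(t)\) and \(\sigma(t)\) notation used in the conditioning section.

```python
import numpy as np

def ddpm_schedule(n_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Return (mu, sigma) such that x(t) = mu[t] * x + sigma[t] * eps.

    Assumption: a linear beta schedule; other schedules are also in use.
    """
    betas = np.linspace(beta_min, beta_max, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

def perturb(x, t, mu, sigma, rng):
    """Sample the noised x(t) from clean data x in a single jump."""
    return mu[t] * x + sigma[t] * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
mu, sigma = ddpm_schedule()

# Toy "data": a tight cluster near 5.
x = 5.0 + 0.1 * rng.standard_normal(10_000)
x_mid = perturb(x, 250, mu, sigma, rng)  # partially noised
x_end = perturb(x, 999, mu, sigma, rng)  # close to standard Gaussian noise
```

By the final step almost all of the signal has been scaled away, so sampling can start from plain Gaussian noise and run the learned denoiser backwards.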

1 Training: score matching

Modern score matching seems to originate in Hyvärinen (2005). See score matching or McAllester (2023) for an introduction to the general idea.
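As a small illustration of the idea, here is a sketch of denoising score matching (the variant connected to denoising autoencoders in Vincent (2011)) on a one-dimensional Gaussian toy problem. Everything specific here is an assumption for the sake of the example: the score model is a single scalar, \(s_\theta(\tilde{x})=\theta\tilde{x}\), so the least-squares fit has a closed form, and the helper name `dsm_linear_fit` is hypothetical. A real model would use a neural network and stochastic gradient descent.

```python
import numpy as np

def dsm_linear_fit(x, sigma, rng):
    """Fit s_theta(x_noisy) = theta * x_noisy by denoising score matching.

    The regression target is the score of the noising kernel
    N(x_noisy | x, sigma^2) with respect to x_noisy, i.e. -eps / sigma.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -eps / sigma
    # Closed-form least squares for a one-parameter linear model.
    return np.sum(x_noisy * target) / np.sum(x_noisy**2)

rng = np.random.default_rng(0)
data_std, sigma = 2.0, 0.5
x = data_std * rng.standard_normal(200_000)

theta_hat = dsm_linear_fit(x, sigma, rng)

# The noised data are N(0, data_std^2 + sigma^2), whose score is
# -x / (data_std^2 + sigma^2), so this is the value theta should approach.
theta_true = -1.0 / (data_std**2 + sigma**2)
```

The point of the toy is that the regression never touches the true density, only clean and noised samples, yet it recovers the score of the noised marginal.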

2 Sampling: Langevin dynamics

See Langevin samplers.
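For orientation, a minimal sketch of the unadjusted Langevin algorithm, which only needs the score of the target. The target here is a Gaussian with a known analytic score, standing in for a learned score network; the step size, chain length, and the name `langevin_sample` are arbitrary choices for this illustration.

```python
import numpy as np

def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin: x <- x + (step / 2) * score(x) + sqrt(step) * noise."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step * score(x) + np.sqrt(step) * noise
    return x

# Toy target N(target_mean, target_std^2), whose score is known exactly.
target_mean, target_std = 3.0, 1.0
score = lambda x: -(x - target_mean) / target_std**2

rng = np.random.default_rng(0)
# 5000 independent chains, all started at zero.
samples = langevin_sample(score, np.zeros(5000), step=0.01, n_steps=2000, rng=rng)
```

Without a Metropolis correction the chain has a small discretisation bias that shrinks with the step size, which is one reason practical samplers anneal the noise level and step size together.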

3 Image generation in particular

See image generation with diffusion.

Figure 2

4 Conditioning

There are lots of ways we might try to condition diffusions, differing sometimes only in emphasis.

4.1 Generic conditioning

Rozet and Louppe (2023a) summarise:

With score-based generative models, we can generate samples from the unconditional distribution \(p(x(0)) \approx p(x)\). To solve inverse problems, however, we need to sample from the posterior distribution \(p(x \mid y)\). This could be accomplished by training a conditional score network \(s_\phi(x(t), t \mid y)\) to approximate the posterior score \(\nabla_{x(t)} \log p(x(t) \mid y)\) and plugging it into the reverse SDE (4). However, this would require data pairs \((x, y)\) during training and one would need to retrain a new score network each time the observation process \(p(y \mid x)\) changes. Instead, many have observed (Y. Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Y. Song, Shen, et al. 2022) that the posterior score can be decomposed into two terms thanks to Bayes’ rule \[ \nabla_{x(t)} \log p(x(t) \mid y)=\nabla_{x(t)} \log p(x(t))+\nabla_{x(t)} \log p(y \mid x(t)) . \]

Since the prior score \(\nabla_{x(t)} \log p(x(t))\) can be approximated with a single score network, the remaining task is to estimate the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\). Assuming a differentiable measurement function \(\mathcal{A}\) and a Gaussian observation process \(p(y \mid x)=\mathcal{N}\left(y \mid \mathcal{A}(x), \Sigma_y\right)\), Chung et al. (2023) propose the approximation \[ p(y \mid x(t))=\int p(y \mid x) p(x \mid x(t)) \mathrm{d} x \approx \mathcal{N}\left(y \mid \mathcal{A}(\hat{x}(x(t))), \Sigma_y\right) \] where the mean \(\hat{x}(x(t))=\mathbb{E}_{p(x \mid x(t))}[x]\) is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021) \[ \begin{aligned} \mathbb{E}_{p(x \mid x(t))}[x] & =\frac{x(t)+\sigma(t)^2 \nabla_{x(t)} \log p(x(t))}{\mu(t)} \\ & \approx \frac{x(t)+\sigma(t)^2 s_\phi(x(t), t)}{\mu(t)} . \end{aligned} \]

As the log-likelihood of a multivariate Gaussian is known analytically and \(s_\phi(x(t), t)\) is differentiable, we can compute the likelihood score \(\nabla_{x(t)} \log p(y \mid x(t))\) with this approximation in zero-shot, that is, without training any other network than \(s_\phi(x(t), t)\).
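A toy sketch of the recipe in the passage above, under loud assumptions: the prior is an isotropic Gaussian so that the score of \(p(x(t))\) is available in closed form and stands in for \(s_\phi\); the measurement function \(\mathcal{A}\) is a fixed matrix; and the gradient is taken by finite differences, where a real implementation would use automatic differentiation through the score network. All function names are hypothetical.

```python
import numpy as np

def tweedie_mean(x_t, score, mu, sigma):
    """Estimate E[x | x(t)] with Tweedie's formula."""
    return (x_t + sigma**2 * score(x_t)) / mu

def approx_log_lik(x_t, y, A, Sigma_y_inv, score, mu, sigma):
    """log N(y | A x_hat(x(t)), Sigma_y), up to an additive constant."""
    r = y - A @ tweedie_mean(x_t, score, mu, sigma)
    return -0.5 * r @ Sigma_y_inv @ r

def numerical_grad(f, x, h=1e-5):
    """Central finite differences; a stand-in for autodiff in this toy."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Assumed prior N(0, s0^2 I); then x(t) = mu * x + sigma * eps has
# marginal N(0, (mu^2 s0^2 + sigma^2) I) and the score below.
s0, mu, sigma = 2.0, 0.8, 0.6
prior_score = lambda x_t: -x_t / (mu**2 * s0**2 + sigma**2)

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])        # hypothetical linear measurement
Sigma_y_inv = np.eye(2) / 0.1**2       # observation noise std 0.1
y = np.array([0.5, -1.0])
x_t = np.array([0.3, -0.2, 0.7])

lik_score = numerical_grad(
    lambda z: approx_log_lik(z, y, A, Sigma_y_inv, prior_score, mu, sigma), x_t
)
posterior_score = prior_score(x_t) + lik_score  # the Bayes decomposition
```

The sum `posterior_score` is what would be plugged into the reverse-time sampler in place of the unconditional score. Note that the approximation replaces \(p(x \mid x(t))\) by a point mass at its mean, so even in this Gaussian toy it ignores the posterior covariance.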

4.2 Inpainting

If we want coherence with some chunk of an existing image, we call that inpainting (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; A. Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2023; Zhang et al. 2023).

4.3 Super-resolution

Here we want coherence with a sparse, regular subset of the pixels (Zamir et al. 2021; Choi et al. 2021).

4.4 Reconstruction/inversion

Here we condition on perturbed and partial observations (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Y. Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).
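One way to read the three cases above is as different choices of the measurement function \(\mathcal{A}\) from the generic conditioning setup. The sketch below is an illustration of that reading, not a description of any cited method: the operator names are hypothetical, and treating super-resolution as plain strided subsampling is a simplification (practical degradation models usually blur before downsampling).

```python
import numpy as np

def inpainting_operator(known_mask):
    """A(x) keeps only the pixels where known_mask is True."""
    return lambda x: x[known_mask]

def subsampling_operator(stride):
    """A(x) keeps every `stride`-th pixel on each axis (a sparse regular subset)."""
    return lambda x: x[::stride, ::stride]

def noisy_observation(A, x, noise_std, rng):
    """y = A(x) + Gaussian noise: a perturbed, partial observation."""
    clean = A(x)
    return clean + noise_std * rng.standard_normal(clean.shape)

rng = np.random.default_rng(0)
image = rng.random((8, 8))             # stand-in for a real image

known_mask = np.ones((8, 8), dtype=bool)
known_mask[2:6, 2:6] = False           # a 4x4 hole to be inpainted

y_inpaint = inpainting_operator(known_mask)(image)   # 48 known pixels
y_lowres = subsampling_operator(2)(image)            # 4x4 low-resolution view
y_noisy = noisy_observation(subsampling_operator(2), image, 0.05, rng)
```

Seen this way, the sections differ mainly in which \(\mathcal{A}\) and which noise level are assumed, which matches the remark that the conditioning methods sometimes differ only in emphasis.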

5 Latent

5.1 Generic

5.2 CLIP

Radford et al. (2021)

6 Diffusion on weird spaces

Generic: Okhotin et al. (2023).

6.1 PD manifolds

Li et al. (2024)

6.2 Proteins

Baker Lab (Torres et al. 2022; Watson et al. 2022)

7 Shapes

Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions – Princeton Computing Imaging Lab

(Chou, Bahat, and Heide 2023; Shim, Kang, and Joo 2023).

8 Incoming

Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).

Figure 3

9 References

Adam, Coogan, Malkin, et al. 2022. “Posterior Samples of Source Galaxies in Strong Gravitational Lenses with Score-Based Priors.”
Ajay, Du, Gupta, et al. 2023. “Is Conditional Generative Modeling All You Need for Decision-Making?” In.
Albergo, Boffi, and Vanden-Eijnden. 2023. “Stochastic Interpolants: A Unifying Framework for Flows and Diffusions.”
Albergo, Goldstein, Boffi, et al. 2023. “Stochastic Interpolants with Data-Dependent Couplings.”
Albergo, and Vanden-Eijnden. 2023. “Building Normalizing Flows with Stochastic Interpolants.” In.
Anderson. 1982. “Reverse-Time Diffusion Equation Models.” Stochastic Processes and Their Applications.
Bastek, Sun, and Kochmann. 2024. “Physics-Informed Diffusion Models.”
Choi, Kim, Jeong, et al. 2021. “ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models.” In.
Chou, Bahat, and Heide. 2023. “Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions.”
Chung, Kim, Mccann, et al. 2023. “Diffusion Posterior Sampling for General Noisy Inverse Problems.” In.
Dhariwal, and Nichol. 2021. “Diffusion Models Beat GANs on Image Synthesis.” arXiv:2105.05233 [Cs, Stat].
Dockhorn, Vahdat, and Kreis. 2022. “GENIE: Higher-Order Denoising Diffusion Solvers.” In.
Dutordoir, Saul, Ghahramani, et al. 2022. “Neural Diffusion Processes.”
Efron. 2011. “Tweedie’s Formula and Selection Bias.” Journal of the American Statistical Association.
Graikos, Malkin, Jojic, et al. 2022. “Diffusion Models as Plug-and-Play Priors.” Advances in Neural Information Processing Systems.
Grechka, Couairon, and Cord. 2024. “GradPaint: Gradient-Guided Inpainting with Diffusion Models.” Computer Vision and Image Understanding.
Guo, Liu, Wang, et al. 2024. “Diffusion Models in Bioinformatics and Computational Biology.” Nature Reviews Bioengineering.
Haitsiukevich, Poyraz, Marttinen, et al. 2024. “Diffusion Models as Probabilistic Neural Operators for Recovering Unobserved States of Dynamical Systems.”
Han, Zheng, and Zhou. 2022. “CARD: Classification and Regression Diffusion Models.”
Ho, Jain, and Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” arXiv:2006.11239 [Cs, Stat].
Hoogeboom, Gritsenko, Bastings, et al. 2021. “Autoregressive Diffusion Models.” arXiv:2110.02037 [Cs, Stat].
Hyvärinen. 2005. “Estimation of Non-Normalized Statistical Models by Score Matching.” The Journal of Machine Learning Research.
Jalal, Arvinte, Daras, et al. 2021. “Robust Compressed Sensing MRI with Deep Generative Priors.” In Advances in Neural Information Processing Systems.
Jo, Lee, and Hwang. 2022. “Score-Based Generative Modeling of Graphs via the System of Stochastic Differential Equations.” In Proceedings of the 39th International Conference on Machine Learning.
Jolicoeur-Martineau, Piché-Taillefer, Mitliagkas, et al. 2022. “Adversarial Score Matching and Improved Sampling for Image Generation.” In.
Kawar, Elad, Ermon, et al. 2022. “Denoising Diffusion Restoration Models.” Advances in Neural Information Processing Systems.
Kawar, Vaksman, and Elad. 2021. “SNIPS: Solving Noisy Inverse Problems Stochastically.” In.
Kim, and Ye. 2021. “Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising Without Clean Images.” In.
Lipman, Chen, Ben-Hamu, et al. 2023. “Flow Matching for Generative Modeling.”
Liu, Ziming, Luo, Xu, et al. 2023. “GenPhys: From Physical Processes to Generative Models.”
Liu, Anji, Niepert, and Broeck. 2023. “Image Inpainting via Tractable Steering of Diffusion Models.”
Liu, Chang, Zhuo, Cheng, et al. 2019. “Understanding and Accelerating Particle-Based Variational Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Li, Yu, He, et al. 2024. “SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space.” Proceedings of the AAAI Conference on Artificial Intelligence.
Lugmayr, Danelljan, Romero, et al. 2022. “RePaint: Inpainting Using Denoising Diffusion Probabilistic Models.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
McAllester. 2023. “On the Mathematics of Diffusion Models.”
Nair, Mei, and Patel. 2023. “AT-DDPM: Restoring Faces Degraded by Atmospheric Turbulence Using Denoising Diffusion Probabilistic Models.” In.
Nichol, and Dhariwal. 2021. “Improved Denoising Diffusion Probabilistic Models.” In Proceedings of the 38th International Conference on Machine Learning.
Okhotin, Molchanov, Vladimir, et al. 2023. “Star-Shaped Denoising Diffusion Probabilistic Models.” Advances in Neural Information Processing Systems.
Pang, Mao, He, et al. 2024. “An Improved Face Image Restoration Method Based on Denoising Diffusion Probabilistic Models.” IEEE Access.
Pascual, Bhattacharya, Yeh, et al. 2022. “Full-Band General Audio Synthesis with Score-Based Diffusion.”
Peng, Qiu, Wynne, et al. 2024. “CBCT-Based Synthetic CT Image Generation Using Conditional Denoising Diffusion Probabilistic Model.” Medical Physics.
Preechakul, Chatthee, Wizadwongsa, et al. 2022. “Diffusion Autoencoders: Toward a Meaningful and Decodable Representation.” In.
Radford, Kim, Hallacy, et al. 2021. “Learning Transferable Visual Models From Natural Language Supervision.”
Rozet, and Louppe. 2023a. “Score-Based Data Assimilation.”
———. 2023b. “Score-Based Data Assimilation for a Two-Layer Quasi-Geostrophic Model.”
Sharrock, Simons, Liu, et al. 2022. “Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models.”
Shim, Kang, and Joo. 2023. “Diffusion-Based Signed Distance Fields for 3D Shape Generation.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Sohl-Dickstein, Weiss, Maheswaranathan, et al. 2015. “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.” arXiv:1503.03585 [Cond-Mat, q-Bio, Stat].
Song, Yang, Durkan, Murray, et al. 2021. “Maximum Likelihood Training of Score-Based Diffusion Models.” In Advances in Neural Information Processing Systems.
Song, Yang, and Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In Advances In Neural Information Processing Systems.
———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In Advances In Neural Information Processing Systems.
Song, Yang, Garg, Shi, et al. 2019. “Sliced Score Matching: A Scalable Approach to Density and Score Estimation.”
Song, Jiaming, Meng, and Ermon. 2021. “Denoising Diffusion Implicit Models.” arXiv:2010.02502 [Cs].
Song, Yang, Shen, Xing, et al. 2022. “Solving Inverse Problems in Medical Imaging with Score-Based Generative Models.” In.
Song, Yang, Sohl-Dickstein, Kingma, et al. 2022. “Score-Based Generative Modeling Through Stochastic Differential Equations.” In.
Sui, Ma, Zhang, et al. 2024. “Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution.”
Swersky, Ranzato, Buchman, et al. 2011. “On Autoencoders and Score Matching for Energy Based Models.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11).
Torres, Leung, Lutz, et al. 2022. “De Novo Design of High-Affinity Protein Binders to Bioactive Helical Peptides.”
Tzen, and Raginsky. 2019a. “Theoretical Guarantees for Sampling and Inference in Generative Models with Latent Diffusions.” In Proceedings of the Thirty-Second Conference on Learning Theory.
———. 2019b. “Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit.”
Vincent. 2011. “A Connection Between Score Matching and Denoising Autoencoders.” Neural Computation.
Watson, Juergens, Bennett, et al. 2022. “Broadly Applicable and Accurate Protein Design by Integrating Structure Prediction Networks and Diffusion Generative Models.”
Wu, Trippe, Naesseth, et al. 2023. “Practical and Asymptotically Exact Conditional Sampling in Diffusion Models.” In.
Xie, and Li. 2022. “Measurement-Conditioned Denoising Diffusion Probabilistic Model for Under-Sampled Medical Image Reconstruction.” In Medical Image Computing and Computer Assisted Intervention – MICCAI 2022.
Xu, Yilun, Liu, Tegmark, et al. 2022. “Poisson Flow Generative Models.” In Advances in Neural Information Processing Systems.
Xu, Yilun, Liu, Tian, et al. 2023. “PFGM++: Unlocking the Potential of Physics-Inspired Generative Models.” In.
Xu, Mengze, Ma, and Zhu. 2023. “Dual-Diffusion: Dual Conditional Denoising Diffusion Probabilistic Models for Blind Super-Resolution Reconstruction in RSIs.” IEEE Geoscience and Remote Sensing Letters.
Yang, Zhang, Hong, et al. 2022. “Diffusion Models: A Comprehensive Survey of Methods and Applications.”
Yang, Zhang, Song, et al. 2023. “Diffusion Models: A Comprehensive Survey of Methods and Applications.” ACM Computing Surveys.
Zamir, Arora, Khan, et al. 2021. “Multi-Stage Progressive Image Restoration.”
Zhang, Ji, Zhang, et al. 2023. “Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models.” In Proceedings of the 40th International Conference on Machine Learning. ICML’23.
Zhao, Bai, Zhu, et al. 2023. “DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion.” In.
Zhuang, Abnar, Gu, et al. 2022. “Diffusion Probabilistic Fields.” In.