With neural diffusion models, we can generate samples from the unconditional distribution $p(x_0)$. Here I collect methods for instead sampling from a conditional distribution $p(x_0 \mid y)$, given some observation $y$.
There are lots of ways we might try to condition, differing sometimes only in emphasis.
1 Notation
First, let us fix notation. I’ll use a slight variant of the notation from the denoising diffusion SDE notebook, because I need to reserve some symbols for the conditioning machinery below.
For simplicity, we’ll assume a variance-preserving (VP) diffusion SDE. We corrupt data $x_0 \sim p_{\mathrm{data}}$ with the forward SDE
$$\mathrm{d}x_t = -\tfrac{1}{2}\beta(t)\,x_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w_t, \qquad t \in [0, T],$$
or in discrete form for each step
$$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, I).$$
We also define the convenience terms $\alpha_t := 1-\beta_t$ and $\bar\alpha_t := \prod_{s \le t}\alpha_s$, so that
$$x_t \mid x_0 \sim \mathcal{N}\!\left(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\right).$$
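For concreteness, here is a minimal NumPy sketch of the discrete corruption above. The linear $\beta$ schedule is an arbitrary illustrative choice, not a recommendation.

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)        # illustrative linear noise schedule
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)            # \bar{alpha}_t = prod_{s <= t} alpha_s

def corrupt(x0, t, rng):
    """Sample x_t | x_0 ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)             # toy "data" vector
xt, eps = corrupt(x0, t=500, rng=rng)
```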
1.1 Score Network & Training
We train a score network $s_\theta(x_t, t) \approx \nabla_{x_t}\log p_t(x_t)$ by denoising score matching,
$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,\varepsilon}\Big[\lambda(t)\,\big\|s_\theta(x_t, t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\big\|^2\Big],$$
where $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, $\varepsilon \sim \mathcal{N}(0, I)$, and $\lambda(t)$ is a weighting function.
Equivalently, we can parametrize a noise-prediction network $\varepsilon_\theta(x_t, t)$ and recover the score as $s_\theta(x_t, t) = -\varepsilon_\theta(x_t, t)/\sqrt{1-\bar\alpha_t}$.
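A minimal PyTorch sketch of the equivalent noise-prediction objective. The toy MLP, the crude time embedding, the linear schedule, and uniform sampling of $t$ are all placeholder assumptions.

```python
import torch
import torch.nn as nn

T = 1000
beta = torch.linspace(1e-4, 0.02, T)                 # illustrative linear schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

class EpsNet(nn.Module):
    """Toy noise-prediction network eps_theta(x_t, t) for vector-valued data."""
    def __init__(self, dim, width=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, width), nn.SiLU(), nn.Linear(width, dim))
    def forward(self, x, t):
        # crude time embedding: append t / T as an extra feature
        return self.net(torch.cat([x, t.float().unsqueeze(-1) / T], dim=-1))

def dsm_loss(model, x0):
    """E || eps_theta(x_t, t) - eps ||^2 with x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
    The score is recovered as s_theta(x_t, t) = -eps_theta(x_t, t) / sqrt(1 - abar_t)."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    abar = alpha_bar[t].unsqueeze(-1)
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    return ((model(xt, t) - eps) ** 2).mean()
```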
1.2 Reverse-Time Sampling
The reverse SDE is
$$\mathrm{d}x_t = \Big[-\tfrac{1}{2}\beta(t)\,x_t - \beta(t)\,\nabla_{x_t}\log p_t(x_t)\Big]\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\bar{w}_t.$$
We integrate the reverse SDE from $t = T$ down to $t = 0$, replacing the true score with $s_\theta$.
Alternatively, we can use the deterministic / probability-flow ODE:
$$\mathrm{d}x_t = \Big[-\tfrac{1}{2}\beta(t)\,x_t - \tfrac{1}{2}\beta(t)\,\nabla_{x_t}\log p_t(x_t)\Big]\mathrm{d}t,$$
with initial draw $x_T \sim \mathcal{N}(0, I)$. This yields the same marginals $p_t$ as the SDE.
On our discrete grid, the reverse SDE corresponds to ancestral (DDPM) sampling. The DDIM variant removes the noise injected at each step, giving a deterministic sampler that discretizes the probability-flow ODE.
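Concretely, a NumPy sketch of the discrete reverse-time sampler in the DDIM parameterization, assuming some trained noise-prediction function `eps_model` (an assumption here): `eta = 0.0` gives the deterministic DDIM sampler, `eta = 1.0` roughly recovers ancestral DDPM sampling.

```python
import numpy as np

def ddim_sample(eps_model, shape, alpha_bar, eta=0.0, rng=None):
    """Integrate the reverse chain from t = T-1 down to t = 0.
    eta = 0.0: deterministic DDIM; eta = 1.0: roughly ancestral DDPM."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(shape)                                   # x_T ~ N(0, I)
    for t in range(len(alpha_bar) - 1, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        eps_hat = eps_model(x, t)
        x0_hat = (x - np.sqrt(1 - ab_t) * eps_hat) / np.sqrt(ab_t)   # denoised mean (Tweedie)
        sigma = eta * np.sqrt((1 - ab_prev) / (1 - ab_t)) * np.sqrt(1 - ab_t / ab_prev)
        x = (np.sqrt(ab_prev) * x0_hat
             + np.sqrt(1 - ab_prev - sigma**2) * eps_hat
             + sigma * rng.standard_normal(shape))
    return x
```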
2 Generic conditioning
Here is a quick rewrite of Rozet and Louppe (2023b). Note I have updated the notation to match the rest of this notebook.
We could train a conditional score network $s_\theta(x_t, t, y) \approx \nabla_{x_t}\log p_t(x_t \mid y)$ to approximate the posterior score and plug it into the reverse SDE. But this requires pairs $(x_0, y)$ during training, and re-training whenever the observation model changes. Instead, many have observed (Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Song, Shen, et al. 2022) that by Bayes’ rule the posterior score decomposes as
$$\nabla_{x_t}\log p_t(x_t \mid y) = \nabla_{x_t}\log p_t(x_t) + \nabla_{x_t}\log p_t(y \mid x_t).$$
Since the prior score is well-approximated by a single score network $s_\theta(x_t, t)$, the remaining task is to estimate the likelihood score $\nabla_{x_t}\log p_t(y \mid x_t)$. Assuming a differentiable measurement operator $\mathcal{A}$ and Gaussian observations $y \mid x_0 \sim \mathcal{N}(\mathcal{A}(x_0), \sigma_y^2 I)$, Chung et al. (2023) propose approximating
$$p_t(y \mid x_t) \approx p\big(y \mid \hat{x}_0(x_t)\big),$$
where the denoised mean is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021):
$$\hat{x}_0(x_t) = \frac{1}{\sqrt{\bar\alpha_t}}\Big(x_t + (1-\bar\alpha_t)\,s_\theta(x_t, t)\Big).$$
Because the log-likelihood of a multivariate Gaussian is analytic and $\mathcal{A}$ is differentiable, we can compute $\nabla_{x_t}\log p_t(y \mid x_t)$ in a zero-shot fashion, without training any network beyond the unconditional score model $s_\theta$.
Note that this last assumption is strong; probably too strong for the models I would bother using diffusions on. Don’t worry, we can get fancier and more effective.
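Still, to make the recipe concrete, here is a rough sketch of one guided reverse step in that zero-shot spirit (in the style of Chung et al.’s diffusion posterior sampling), written in PyTorch so the likelihood gradient can be taken through the denoised mean. The measurement operator `A`, the noise level `sigma_y`, and the guidance scale `zeta` are assumptions for illustration; real implementations tune these carefully.

```python
import torch

def dps_step(x, t, y, eps_model, A, alpha_bar, beta, sigma_y=0.05, zeta=1.0):
    """One reverse step with a zero-shot likelihood-gradient correction.
    A: differentiable measurement operator; y: observation;
    sigma_y: observation noise std; zeta: heuristic guidance scale."""
    x = x.detach().requires_grad_(True)
    ab_t, b_t = alpha_bar[t], beta[t]
    eps_hat = eps_model(x, t)
    x0_hat = (x - torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(ab_t)    # Tweedie estimate
    # Gaussian observation log-likelihood at the denoised mean,
    # differentiated back through the score network w.r.t. x_t
    loglik = -((y - A(x0_hat)) ** 2).sum() / (2 * sigma_y ** 2)
    grad = torch.autograd.grad(loglik, x)[0]
    # Unconditional ancestral (DDPM) mean, nudged along the likelihood gradient
    mean = (x - b_t / torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(1 - b_t)
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    return (mean + zeta * grad + torch.sqrt(b_t) * noise).detach()
```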
3 Ensemble Score Conditioning
A simple trick that sometimes works (F. Bao, Zhang, and Zhang 2024a) but is biased. TBC.
4 Sequential Monte Carlo
This seems to be SOTA?
LLM-aided summary of Wu et al. (2024):
We recall standard SMC / particle filtering:

- Goal: sample from a sequence of distributions $\pi_T, \pi_{T-1}, \dots, \pi_0$, ending in some target $\pi_0$.
- Particles: maintain $K$ samples (particles) $\{x^{(k)}\}_{k=1}^{K}$ with weights $\{w^{(k)}\}_{k=1}^{K}$.
- Iterate for $t = T, T-1, \dots, 1$:
  - Resample particles according to their weights, to focus on high-probability regions.
  - Propose $x_{t-1}^{(k)} \sim q_t(x_{t-1} \mid x_t^{(k)})$.
  - Weight each by the ratio of the new target to the proposal, $w_{t-1}^{(k)} \propto \pi_{t-1}(x_{t-1}^{(k)}) \big/ \big[\pi_t(x_t^{(k)})\, q_t(x_{t-1}^{(k)} \mid x_t^{(k)})\big]$ (up to the usual backward-kernel bookkeeping).
- Convergence: as $K \to \infty$, the weighted ensemble approximates the true target exactly.
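As a reference point, here is a skeletal particle filter in NumPy; the callables `init`, `propose`, and `log_weight` are problem-specific and assumed given.

```python
import numpy as np

def smc(init, propose, log_weight, K, T, rng=None):
    """Generic SMC: resample -> propose -> reweight, stepping t = T, ..., 1.
    init(K, rng) -> (K, dim) particles; propose(x, t, rng) -> new particles;
    log_weight(x_new, x_old, t) -> per-particle incremental log-weights."""
    rng = rng or np.random.default_rng()
    x = init(K, rng)
    logw = np.zeros(K)
    for t in range(T, 0, -1):
        # Resample according to normalised weights (plain multinomial resampling)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(K, size=K, p=w)
        x = x[idx]
        # Propose new particles and compute incremental weights
        x_new = propose(x, t, rng)
        logw = log_weight(x_new, x, t)
        x = x_new
    w = np.exp(logw - logw.max())
    return x, w / w.sum()
```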
In a diffusion model, we can view the reverse noising chain
$$p_\theta(x_{T:0}) = p(x_T)\prod_{t=T}^{1} p_\theta(x_{t-1} \mid x_t)$$
as exactly such a sequential model over $x_T, x_{T-1}, \dots, x_0$. To sample conditionally on $y$, the naive option is to run the unconditional chain as the proposal and only tack on a final weight $p(y \mid x_0)$ at the last step; this is importance sampling in a very high-dimensional space, and nearly all particles are wasted.

Twisting is a classic SMC technique which solves this problem, introducing a sequence of auxiliary functions that pull information about $y$ back to the earlier steps. The optimal twisting is
$$p_t(y \mid x_t) = \int p(y \mid x_0)\, p(x_0 \mid x_t)\,\mathrm{d}x_0,$$
which, if we could sample the correspondingly twisted transitions, would make SMC exact with a single particle. However, the optimal twisting is intractable.

TDS replaces the optimal twisting with the tractable surrogate
$$\tilde{p}_t(y \mid x_t) := p\big(y \mid \hat{x}_0(x_t)\big),$$
i.e. we evaluate the observation likelihood at the diffusion denoiser’s one-step posterior mean estimate $\hat{x}_0(x_t)$ (Tweedie again). We then use:

- Twisted proposal from $x_t$ to $x_{t-1}$:
$$q(x_{t-1} \mid x_t, y) \propto p_\theta(x_{t-1} \mid x_t)\,\tilde{p}_{t-1}(y \mid x_{t-1}),$$
where in practice the proposal is a Gaussian whose mean is shifted along $\nabla_{x_t}\log\tilde{p}_t(y \mid x_t)$.
- Twisted weight for each particle:
$$w_{t-1} = \frac{p_\theta(x_{t-1} \mid x_t)\,\tilde{p}_{t-1}(y \mid x_{t-1})}{q(x_{t-1} \mid x_t, y)\,\tilde{p}_t(y \mid x_t)}.$$

[Figure: Twisted Sister]

This corrects for using the surrogate twisting and ensures asymptotic exactness as $K \to \infty$. In early steps ($t$ near $T$) the surrogate twisting is crude, but it only needs to steer particles roughly toward the right region; as $t \to 0$ it becomes increasingly accurate.
In practice, we often need a surprisingly small number of particles; even 2–8 particles often suffice to outperform heuristic conditional samplers (like plain classifier guidance or “replacement” inpainting).
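Here is a sketch of one twisted step in that spirit, for $K$ particles stored in the rows of `x`. It is a simplified reading of the TDS recipe: the measurement operator `A`, noise level `sigma_y`, and guidance scale `scale` are assumptions, and the reverse-transition variance is crudely set to $\beta_t$; see Wu et al. (2024) for the careful version.

```python
import torch

def log_twist(x, t, y, eps_model, A, alpha_bar, sigma_y):
    """Surrogate twisting: log p~_t(y | x_t) = log N(y; A(x0_hat(x_t)), sigma_y^2 I), per particle."""
    ab = alpha_bar[t]
    x0_hat = (x - torch.sqrt(1 - ab) * eps_model(x, t)) / torch.sqrt(ab)
    return -((y - A(x0_hat)) ** 2).flatten(1).sum(-1) / (2 * sigma_y ** 2)

def tds_step(x, t, y, eps_model, A, alpha_bar, beta, sigma_y, scale=1.0):
    """One twisted proposal + incremental weight for K particles (rows of x)."""
    x = x.detach().requires_grad_(True)
    ab_t, var = alpha_bar[t], beta[t]             # crude choice: transition/proposal var = beta_t
    lt = log_twist(x, t, y, eps_model, A, alpha_bar, sigma_y)
    grad = torch.autograd.grad(lt.sum(), x)[0]    # gradient of the twisting w.r.t. x_t
    # Unconditional reverse (DDPM) mean, shifted along the twisting gradient
    mean = (x - var / torch.sqrt(1 - ab_t) * eps_model(x, t)) / torch.sqrt(1 - var)
    prop_mean = mean + scale * var * grad
    x_new = (prop_mean + torch.sqrt(var) * torch.randn_like(x)).detach()
    # log w = log p_theta(x_{t-1} | x_t) + log p~_{t-1}(y | x_{t-1})
    #       - log q(x_{t-1} | x_t, y)   - log p~_t(y | x_t)
    # (Gaussian normalising constants cancel because both kernels share the same variance.)
    def log_gauss(z, m):
        return -((z - m) ** 2).flatten(1).sum(-1) / (2 * var)
    with torch.no_grad():
        logw = (log_gauss(x_new, mean)
                + log_twist(x_new, t - 1, y, eps_model, A, alpha_bar, sigma_y)
                - log_gauss(x_new, prop_mean)
                - lt)
    return x_new, logw
```

These per-step outputs slot directly into the `propose` / `log_weight` roles of the generic SMC skeleton above.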
5 (Conditional) Schrödinger Bridge
Shi et al. (2022) introduced the Conditional Schrödinger Bridge (CSB), which is a natural extension of the Schrödinger Bridge.
We seek a path-measure $\mathbb{P}$ over trajectories $(x_t)_{t \in [0, T]}$, close in KL to a reference diffusion $\mathbb{Q}$ (e.g. the VP diffusion above), subject to the endpoint constraints:

- Start at $t = 0$: $x_0 \sim p(x_0 \mid y)$, the conditional target;
- End at $t = T$: $x_T \sim \mathcal{N}(0, I)$, the Gaussian prior.

Here $y$ enters only through the endpoint constraint at $t = 0$; the dynamics in between stay as close as possible to the reference.
5.1 Amortized IPF Algorithm
We parameterize two families of drift networks that take the observation $y$ as an extra input: a forward drift $f_\phi(x_t, t, y)$ and a backward drift $b_\psi(x_t, t, y)$. We alternate two KL-projection steps:

- Backward half-step (fix $f_\phi$, enforce the prior at $t = T$): fit $b_\psi$ by matching the backward SDE induced by the current forward drift $f_\phi$ (a schematic sketch of this half-step appears at the end of this subsection).
- Forward half-step (fix $b_\psi$, enforce the posterior at $t = 0$): fit $f_\phi$ by matching the forward SDE induced by the current backward drift $b_\psi$.
After enough IPF iterations, the pair of drifts converges to the (conditional) Schrödinger bridge between the two endpoint constraints. We can imagine that the backward step “pins” the Gaussian prior; the forward step “pins” the conditional $p(x_0 \mid y)$.
The learned bridge defines a joint path-measure via:

- Initial draw at $t = T$: $x_T \sim \mathcal{N}(0, I)$.
- Backward transitions for $t = T, \dots, 1$: $x_{t-1} \sim p_\psi(x_{t-1} \mid x_t, y)$, induced by the backward drift $b_\psi$.
- Forward transitions for $t = 1, \dots, T$: $x_t \sim p_\phi(x_t \mid x_{t-1}, y)$, induced by the forward drift $f_\phi$.

Hence the full path-measure is the Gaussian initial draw composed with the learned transitions, which by construction satisfies both endpoint constraints for any $y$.
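To make the alternation concrete, here is a schematic of one backward half-step in PyTorch, using the simplest mean-matching regression: predict the previous state along trajectories simulated with the current forward drift. Published DSB/CSB implementations use a variance-reduced regression target with the same conditional expectation; the names `f_drift`, `b_net`, and `x0_sampler` are assumptions here, and the forward half-step is the mirror image.

```python
import torch

def ipf_backward_half_step(f_drift, b_net, x0_sampler, y, n_steps, gamma, opt, iters=1000):
    """Schematic IPF backward half-step: fit b_net(x_{k+1}, k+1, y) to predict x_k
    along trajectories simulated with the current forward drift f_drift."""
    for _ in range(iters):
        x = x0_sampler()                                   # x_0 ~ p(x_0 | y), shape (B, d)
        traj = [x]
        with torch.no_grad():                              # simulate the forward chain
            for k in range(n_steps):
                x = x + gamma * f_drift(x, k, y) + (2 * gamma) ** 0.5 * torch.randn_like(x)
                traj.append(x)
        loss = 0.0
        for k in range(n_steps):                           # regress the backward mean onto x_k
            loss = loss + ((b_net(traj[k + 1], k + 1, y) - traj[k]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```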
5.2 Sampling with the Learned Conditional Bridge
To draw $x \sim p(x \mid y)$:

- Initialize $x_T \sim \mathcal{N}(0, I)$.
- Backward-integrate the learned SDE (or its probability-flow ODE) with the backward drift $b_\psi(\cdot, \cdot, y)$ from $t = T$ down to $t = 0$ (sketched below).
- (Optionally) forward-integrate with $f_\phi$ to refine the sample or to compute likelihoods.
Because the observation enters the drifts as an ordinary network input, there is no likelihood gradient and no particle ensemble at sampling time; a single backward pass produces a conditional sample. Amortized CSB encodes the observation model into the learned drifts themselves, so the cost is paid up front: it needs paired training data and must be re-trained if the observation model changes.
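A sketch of that sampling loop, assuming a backward transition mean network `b_net` trained as in the schematic above (so each backward transition is $\mathcal{N}(b_\psi(x_k, k, y),\, 2\gamma I)$):

```python
import torch

@torch.no_grad()
def csb_sample(b_net, y, shape, n_steps, gamma):
    """Draw x ~ p(x | y): start at the Gaussian prior and apply the learned
    backward transitions N(b_net(x_k, k, y), 2 * gamma * I) down to k = 0."""
    x = torch.randn(shape)                                 # x_T ~ N(0, I)
    for k in range(n_steps, 0, -1):
        x = b_net(x, k, y) + (2 * gamma) ** 0.5 * torch.randn(shape)
    return x
```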
6 Computational Trade-offs of Those Last Two
| Aspect | Twisted SMC (TDS) | Amortized CSB |
|---|---|---|
| Training | Only train the unconditional denoiser $s_\theta$; no pairs $(x_0, y)$ needed | Train forward/backward drift networks taking $y$ as input, via IPF iterations |
| Inference cost | Single reverse trajectory over $T$ steps per particle, times $K$ particles, plus twisting gradients and resampling | Single reverse trajectory over $T$ steps |
| Exactness | Asymptotically exact as $K \to \infty$ | Exact only insofar as IPF is perfectly trained |
7 Consistency models
Song et al. (2023)
8 Inpainting
If we want coherence with part of an existing image, we call that inpainting and there are specialized methods for it (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2024; Zhang et al. 2023).
9 Reconstruction/inversion
Perturbed and partial observations; misc methods therefor (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).