With neural diffusion models, we can generate samples from the unconditional distribution $p(x_0)$. Here I collect methods for instead sampling from a conditional distribution $p(x_0 \mid y)$, given some observation $y$.
There are lots of ways we might try to condition, differing sometimes only in emphasis.
1 Notation
First, let us fix notation. I’ll use a slight variant of the notation from the denoising diffusion SDE notebook, because I need to reserve some symbols for the conditioning machinery below.
For simplicity, we’ll assume a variance-preserving (VP) diffusion SDE. We corrupt data $x_0 \sim p_{\mathrm{data}}$ with the forward SDE
$$\mathrm{d}x_t = -\tfrac{1}{2}\beta(t)\,x_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w_t, \qquad t \in [0, T],$$
or in discrete form for each step
$$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, I).$$
We also define the convenience terms $\alpha_t := 1-\beta_t$ and $\bar\alpha_t := \prod_{s \le t}\alpha_s$, so that
$$x_t \mid x_0 \sim \mathcal{N}\!\left(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\right).$$
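For concreteness, here is a minimal NumPy sketch of the discrete corruption above. The linear $\beta$ schedule is an arbitrary illustrative choice, not a recommendation.

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)        # illustrative linear noise schedule
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)            # \bar{alpha}_t = prod_{s <= t} alpha_s

def corrupt(x0, t, rng):
    """Sample x_t | x_0 ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)             # toy "data" vector
xt, eps = corrupt(x0, t=500, rng=rng)
```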
1.1 Score Network & Training
We train a score network $s_\theta(x_t, t) \approx \nabla_{x_t}\log p_t(x_t)$ by denoising score matching,
$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,\varepsilon}\Big[\lambda(t)\,\big\|s_\theta(x_t, t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\big\|^2\Big],$$
where $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, $\varepsilon \sim \mathcal{N}(0, I)$, and $\lambda(t)$ is a weighting function.
Equivalently, we can parametrize a noise-prediction network $\varepsilon_\theta(x_t, t)$ and recover the score as $s_\theta(x_t, t) = -\varepsilon_\theta(x_t, t)/\sqrt{1-\bar\alpha_t}$.
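A minimal PyTorch sketch of the equivalent noise-prediction objective. The toy MLP, the crude time embedding, the linear schedule, and uniform sampling of $t$ are all placeholder assumptions.

```python
import torch
import torch.nn as nn

T = 1000
beta = torch.linspace(1e-4, 0.02, T)                 # illustrative linear schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

class EpsNet(nn.Module):
    """Toy noise-prediction network eps_theta(x_t, t) for vector-valued data."""
    def __init__(self, dim, width=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, width), nn.SiLU(), nn.Linear(width, dim))
    def forward(self, x, t):
        # crude time embedding: append t / T as an extra feature
        return self.net(torch.cat([x, t.float().unsqueeze(-1) / T], dim=-1))

def dsm_loss(model, x0):
    """E || eps_theta(x_t, t) - eps ||^2 with x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
    The score is recovered as s_theta(x_t, t) = -eps_theta(x_t, t) / sqrt(1 - abar_t)."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    abar = alpha_bar[t].unsqueeze(-1)
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    return ((model(xt, t) - eps) ** 2).mean()
```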
1.2 Reverse-Time Sampling
The reverse SDE is
$$\mathrm{d}x_t = \Big[-\tfrac{1}{2}\beta(t)\,x_t - \beta(t)\,\nabla_{x_t}\log p_t(x_t)\Big]\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\bar{w}_t.$$
We integrate the reverse SDE from $t = T$ down to $t = 0$, replacing the true score with $s_\theta$.
Alternatively, we can use the deterministic / probability-flow ODE:
$$\mathrm{d}x_t = \Big[-\tfrac{1}{2}\beta(t)\,x_t - \tfrac{1}{2}\beta(t)\,\nabla_{x_t}\log p_t(x_t)\Big]\mathrm{d}t,$$
with initial draw $x_T \sim \mathcal{N}(0, I)$. This yields the same marginals $p_t$ as the SDE.
On our discrete grid, the reverse SDE corresponds to ancestral (DDPM) sampling. The DDIM variant removes the noise injected at each step, giving a deterministic sampler that discretizes the probability-flow ODE.
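Concretely, a NumPy sketch of the discrete reverse-time sampler in the DDIM parameterization, assuming some trained noise-prediction function `eps_model` (an assumption here): `eta = 0.0` gives the deterministic DDIM sampler, `eta = 1.0` roughly recovers ancestral DDPM sampling.

```python
import numpy as np

def ddim_sample(eps_model, shape, alpha_bar, eta=0.0, rng=None):
    """Integrate the reverse chain from t = T-1 down to t = 0.
    eta = 0.0: deterministic DDIM; eta = 1.0: roughly ancestral DDPM."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(shape)                                   # x_T ~ N(0, I)
    for t in range(len(alpha_bar) - 1, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        eps_hat = eps_model(x, t)
        x0_hat = (x - np.sqrt(1 - ab_t) * eps_hat) / np.sqrt(ab_t)   # denoised mean (Tweedie)
        sigma = eta * np.sqrt((1 - ab_prev) / (1 - ab_t)) * np.sqrt(1 - ab_t / ab_prev)
        x = (np.sqrt(ab_prev) * x0_hat
             + np.sqrt(1 - ab_prev - sigma**2) * eps_hat
             + sigma * rng.standard_normal(shape))
    return x
```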
2 Generic conditioning
Here is a quick rewrite of Rozet and Louppe (2023b). Note I have updated the notation to match the rest of this notebook.
We could train a conditional score network $s_\theta(x_t, t, y) \approx \nabla_{x_t}\log p_t(x_t \mid y)$ to approximate the posterior score and plug it into the reverse SDE. But this requires pairs $(x_0, y)$ during training, and re-training whenever the observation model changes. Instead, many have observed (Song, Sohl-Dickstein, et al. 2022; Adam et al. 2022; Chung et al. 2023; Kawar, Vaksman, and Elad 2021; Song, Shen, et al. 2022) that by Bayes’ rule the posterior score decomposes as
$$\nabla_{x_t}\log p_t(x_t \mid y) = \nabla_{x_t}\log p_t(x_t) + \nabla_{x_t}\log p_t(y \mid x_t).$$
Since the prior score is well-approximated by a single score network $s_\theta(x_t, t)$, the remaining task is to estimate the likelihood score $\nabla_{x_t}\log p_t(y \mid x_t)$. Assuming a differentiable measurement operator $\mathcal{A}$ and Gaussian observations $y \mid x_0 \sim \mathcal{N}(\mathcal{A}(x_0), \sigma_y^2 I)$, Chung et al. (2023) propose approximating
$$p_t(y \mid x_t) \approx p\big(y \mid \hat{x}_0(x_t)\big),$$
where the denoised mean is given by Tweedie’s formula (Efron 2011; Kim and Ye 2021):
$$\hat{x}_0(x_t) = \frac{1}{\sqrt{\bar\alpha_t}}\Big(x_t + (1-\bar\alpha_t)\,s_\theta(x_t, t)\Big).$$
Because the log-likelihood of a multivariate Gaussian is analytic and $\mathcal{A}$ is differentiable, we can compute $\nabla_{x_t}\log p_t(y \mid x_t)$ in a zero-shot fashion, without training any network beyond the unconditional score model $s_\theta$.
Note that this last assumption is strong; probably too strong for the models I would bother using diffusions on. Don’t worry, we can get fancier and more effective.
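Still, to make the recipe concrete, here is a rough sketch of one guided reverse step in that zero-shot spirit (in the style of Chung et al.’s diffusion posterior sampling), written in PyTorch so the likelihood gradient can be taken through the denoised mean. The measurement operator `A`, the noise level `sigma_y`, and the guidance scale `zeta` are assumptions for illustration; real implementations tune these carefully.

```python
import torch

def dps_step(x, t, y, eps_model, A, alpha_bar, beta, sigma_y=0.05, zeta=1.0):
    """One reverse step with a zero-shot likelihood-gradient correction.
    A: differentiable measurement operator; y: observation;
    sigma_y: observation noise std; zeta: heuristic guidance scale."""
    x = x.detach().requires_grad_(True)
    ab_t, b_t = alpha_bar[t], beta[t]
    eps_hat = eps_model(x, t)
    x0_hat = (x - torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(ab_t)    # Tweedie estimate
    # Gaussian observation log-likelihood at the denoised mean,
    # differentiated back through the score network w.r.t. x_t
    loglik = -((y - A(x0_hat)) ** 2).sum() / (2 * sigma_y ** 2)
    grad = torch.autograd.grad(loglik, x)[0]
    # Unconditional ancestral (DDPM) mean, nudged along the likelihood gradient
    mean = (x - b_t / torch.sqrt(1 - ab_t) * eps_hat) / torch.sqrt(1 - b_t)
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    return (mean + zeta * grad + torch.sqrt(b_t) * noise).detach()
```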
3 Ensemble Score Conditioning
A simple trick that sometimes works (F. Bao, Zhang, and Zhang 2024a) but is biased. TBC.
4 Sequential Monte Carlo
This seems to be SOTA?
LLM-aided summary of Wu et al. (2024):
We recall standard SMC / particle filtering:

- Goal: sample from a sequence of distributions $\pi_T, \pi_{T-1}, \dots, \pi_0$, ending in some target $\pi_0$.
- Particles: maintain $K$ samples (particles) $\{x^{(k)}\}_{k=1}^{K}$ with weights $\{w^{(k)}\}_{k=1}^{K}$.
- Iterate for $t = T, T-1, \dots, 1$:
  - Resample particles according to their weights, to focus on high-probability regions.
  - Propose $x_{t-1}^{(k)} \sim q_t(x_{t-1} \mid x_t^{(k)})$.
  - Weight each by the ratio of the new target to the proposal, $w_{t-1}^{(k)} \propto \pi_{t-1}(x_{t-1}^{(k)}) \big/ \big[\pi_t(x_t^{(k)})\, q_t(x_{t-1}^{(k)} \mid x_t^{(k)})\big]$ (up to the usual backward-kernel bookkeeping).
- Convergence: as $K \to \infty$, the weighted ensemble approximates the true target exactly.
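As a reference point, here is a skeletal particle filter in NumPy; the callables `init`, `propose`, and `log_weight` are problem-specific and assumed given.

```python
import numpy as np

def smc(init, propose, log_weight, K, T, rng=None):
    """Generic SMC: resample -> propose -> reweight, stepping t = T, ..., 1.
    init(K, rng) -> (K, dim) particles; propose(x, t, rng) -> new particles;
    log_weight(x_new, x_old, t) -> per-particle incremental log-weights."""
    rng = rng or np.random.default_rng()
    x = init(K, rng)
    logw = np.zeros(K)
    for t in range(T, 0, -1):
        # Resample according to normalised weights (plain multinomial resampling)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(K, size=K, p=w)
        x = x[idx]
        # Propose new particles and compute incremental weights
        x_new = propose(x, t, rng)
        logw = log_weight(x_new, x, t)
        x = x_new
    w = np.exp(logw - logw.max())
    return x, w / w.sum()
```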
In a diffusion model, we can view the reverse noising chain
$$p_\theta(x_{T:0}) = p(x_T)\prod_{t=T}^{1} p_\theta(x_{t-1} \mid x_t)$$
as exactly such a sequential model over $x_T, x_{T-1}, \dots, x_0$. To sample conditionally on $y$, the naive option is to run the unconditional chain as the proposal and only tack on a final weight $p(y \mid x_0)$ at the last step; this is importance sampling in a very high-dimensional space, and nearly all particles are wasted.

Twisting is a classic SMC technique which solves this problem, introducing a sequence of auxiliary functions that pull information about $y$ back to the earlier steps. The optimal twisting is
$$p_t(y \mid x_t) = \int p(y \mid x_0)\, p(x_0 \mid x_t)\,\mathrm{d}x_0,$$
which, if we could sample the correspondingly twisted transitions, would make SMC exact with a single particle. However, the optimal twisting is intractable.

TDS replaces the optimal twisting with the tractable surrogate
$$\tilde{p}_t(y \mid x_t) := p\big(y \mid \hat{x}_0(x_t)\big),$$
i.e. we evaluate the observation likelihood at the diffusion denoiser’s one-step posterior mean estimate $\hat{x}_0(x_t)$ (Tweedie again). We then use:

- Twisted proposal from $x_t$ to $x_{t-1}$:
$$q(x_{t-1} \mid x_t, y) \propto p_\theta(x_{t-1} \mid x_t)\,\tilde{p}_{t-1}(y \mid x_{t-1}),$$
where in practice the proposal is a Gaussian whose mean is shifted along $\nabla_{x_t}\log\tilde{p}_t(y \mid x_t)$.
- Twisted weight for each particle:
$$w_{t-1} = \frac{p_\theta(x_{t-1} \mid x_t)\,\tilde{p}_{t-1}(y \mid x_{t-1})}{q(x_{t-1} \mid x_t, y)\,\tilde{p}_t(y \mid x_t)}.$$

[Figure: Twisted Sister]

This corrects for using the surrogate twisting and ensures asymptotic exactness as $K \to \infty$. In early steps ($t$ near $T$) the surrogate twisting is crude, but it only needs to steer particles roughly toward the right region; as $t \to 0$ it becomes increasingly accurate.
In practice, we often need a surprisingly small number of particles; even 2–8 particles often suffice to outperform heuristic conditional samplers (like plain classifier guidance or “replacement” inpainting).
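Here is a sketch of one twisted step in that spirit, for $K$ particles stored in the rows of `x`. It is a simplified reading of the TDS recipe: the measurement operator `A`, noise level `sigma_y`, and guidance scale `scale` are assumptions, and the reverse-transition variance is crudely set to $\beta_t$; see Wu et al. (2024) for the careful version.

```python
import torch

def log_twist(x, t, y, eps_model, A, alpha_bar, sigma_y):
    """Surrogate twisting: log p~_t(y | x_t) = log N(y; A(x0_hat(x_t)), sigma_y^2 I), per particle."""
    ab = alpha_bar[t]
    x0_hat = (x - torch.sqrt(1 - ab) * eps_model(x, t)) / torch.sqrt(ab)
    return -((y - A(x0_hat)) ** 2).flatten(1).sum(-1) / (2 * sigma_y ** 2)

def tds_step(x, t, y, eps_model, A, alpha_bar, beta, sigma_y, scale=1.0):
    """One twisted proposal + incremental weight for K particles (rows of x)."""
    x = x.detach().requires_grad_(True)
    ab_t, var = alpha_bar[t], beta[t]             # crude choice: transition/proposal var = beta_t
    lt = log_twist(x, t, y, eps_model, A, alpha_bar, sigma_y)
    grad = torch.autograd.grad(lt.sum(), x)[0]    # gradient of the twisting w.r.t. x_t
    # Unconditional reverse (DDPM) mean, shifted along the twisting gradient
    mean = (x - var / torch.sqrt(1 - ab_t) * eps_model(x, t)) / torch.sqrt(1 - var)
    prop_mean = mean + scale * var * grad
    x_new = (prop_mean + torch.sqrt(var) * torch.randn_like(x)).detach()
    # log w = log p_theta(x_{t-1} | x_t) + log p~_{t-1}(y | x_{t-1})
    #       - log q(x_{t-1} | x_t, y)   - log p~_t(y | x_t)
    # (Gaussian normalising constants cancel because both kernels share the same variance.)
    def log_gauss(z, m):
        return -((z - m) ** 2).flatten(1).sum(-1) / (2 * var)
    with torch.no_grad():
        logw = (log_gauss(x_new, mean)
                + log_twist(x_new, t - 1, y, eps_model, A, alpha_bar, sigma_y)
                - log_gauss(x_new, prop_mean)
                - lt)
    return x_new, logw
```

These per-step outputs slot directly into the `propose` / `log_weight` roles of the generic SMC skeleton above.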
5 (Conditional) Schrödinger Bridge
Shi et al. (2022) introduced the Conditional Schrödinger Bridge (CSB), which is a natural extension of the Schrödinger Bridge.
We seek a path-measure $\mathbb{P}$ over trajectories $(x_t)_{t \in [0, T]}$, close in KL to a reference diffusion $\mathbb{Q}$ (e.g. the VP diffusion above), subject to the endpoint constraints:

- Start at $t = 0$: $x_0 \sim p(x_0 \mid y)$, the conditional target;
- End at $t = T$: $x_T \sim \mathcal{N}(0, I)$, the Gaussian prior.

Here $y$ enters only through the endpoint constraint at $t = 0$; the dynamics in between stay as close as possible to the reference.
5.1 Amortized IPF Algorithm
We parameterize two families of drift networks that take the observation $y$ as an extra input: a forward drift $f_\phi(x_t, t, y)$ and a backward drift $b_\psi(x_t, t, y)$. We alternate two KL-projection steps:

- Backward half-step (fix $f_\phi$, enforce the prior at $t = T$): fit $b_\psi$ by matching the backward SDE induced by the current forward drift $f_\phi$ (a schematic sketch of this half-step appears at the end of this subsection).
- Forward half-step (fix $b_\psi$, enforce the posterior at $t = 0$): fit $f_\phi$ by matching the forward SDE induced by the current backward drift $b_\psi$.
After enough IPF iterations, the pair of drifts converges to the (conditional) Schrödinger bridge between the two endpoint constraints. We can imagine that the backward step “pins” the Gaussian prior; the forward step “pins” the conditional $p(x_0 \mid y)$.
The learned bridge defines a joint path-measure via:

- Initial draw at $t = T$: $x_T \sim \mathcal{N}(0, I)$.
- Backward transitions for $t = T, \dots, 1$: $x_{t-1} \sim p_\psi(x_{t-1} \mid x_t, y)$, induced by the backward drift $b_\psi$.
- Forward transitions for $t = 1, \dots, T$: $x_t \sim p_\phi(x_t \mid x_{t-1}, y)$, induced by the forward drift $f_\phi$.

Hence the full path-measure is the Gaussian initial draw composed with the learned transitions, which by construction satisfies both endpoint constraints for any $y$.
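To make the alternation concrete, here is a schematic of one backward half-step in PyTorch, using the simplest mean-matching regression: predict the previous state along trajectories simulated with the current forward drift. Published DSB/CSB implementations use a variance-reduced regression target with the same conditional expectation; the names `f_drift`, `b_net`, and `x0_sampler` are assumptions here, and the forward half-step is the mirror image.

```python
import torch

def ipf_backward_half_step(f_drift, b_net, x0_sampler, y, n_steps, gamma, opt, iters=1000):
    """Schematic IPF backward half-step: fit b_net(x_{k+1}, k+1, y) to predict x_k
    along trajectories simulated with the current forward drift f_drift."""
    for _ in range(iters):
        x = x0_sampler()                                   # x_0 ~ p(x_0 | y), shape (B, d)
        traj = [x]
        with torch.no_grad():                              # simulate the forward chain
            for k in range(n_steps):
                x = x + gamma * f_drift(x, k, y) + (2 * gamma) ** 0.5 * torch.randn_like(x)
                traj.append(x)
        loss = 0.0
        for k in range(n_steps):                           # regress the backward mean onto x_k
            loss = loss + ((b_net(traj[k + 1], k + 1, y) - traj[k]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```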
5.2 Sampling with the Learned Conditional Bridge
To draw $x \sim p(x \mid y)$:

- Initialize $x_T \sim \mathcal{N}(0, I)$.
- Backward-integrate the learned SDE (or its probability-flow ODE) with the backward drift $b_\psi(\cdot, \cdot, y)$ from $t = T$ down to $t = 0$ (sketched below).
- (Optionally) forward-integrate with $f_\phi$ to refine the sample or to compute likelihoods.
Because the observation enters the drifts as an ordinary network input, there is no likelihood gradient and no particle ensemble at sampling time; a single backward pass produces a conditional sample. Amortized CSB encodes the observation model into the learned drifts themselves, so the cost is paid up front: it needs paired training data and must be re-trained if the observation model changes.
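A sketch of that sampling loop, assuming a backward transition mean network `b_net` trained as in the schematic above (so each backward transition is $\mathcal{N}(b_\psi(x_k, k, y),\, 2\gamma I)$):

```python
import torch

@torch.no_grad()
def csb_sample(b_net, y, shape, n_steps, gamma):
    """Draw x ~ p(x | y): start at the Gaussian prior and apply the learned
    backward transitions N(b_net(x_k, k, y), 2 * gamma * I) down to k = 0."""
    x = torch.randn(shape)                                 # x_T ~ N(0, I)
    for k in range(n_steps, 0, -1):
        x = b_net(x, k, y) + (2 * gamma) ** 0.5 * torch.randn(shape)
    return x
```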
6 Computational Trade-offs of Those Last Two
| Aspect | Twisted SMC (TDS) | Amortized CSB |
|---|---|---|
| Training | Only train the unconditional denoiser $s_\theta$; no pairs $(x_0, y)$ needed | Train forward/backward drift networks taking $y$ as input, via IPF iterations |
| Inference cost | Single reverse trajectory over $T$ steps per particle, times $K$ particles, plus twisting gradients and resampling | Single reverse trajectory over $T$ steps |
| Exactness | Asymptotically exact as $K \to \infty$ | Exact only insofar as IPF is perfectly trained |
7 Consistency models
Song et al. (2023)
8 Inpainting
If we want coherence with part of an existing image, we call that inpainting and there are specialized methods for it (Ajay et al. 2023; Grechka, Couairon, and Cord 2024; Liu, Niepert, and Broeck 2023; Lugmayr et al. 2022; Sharrock et al. 2022; Wu et al. 2024; Zhang et al. 2023).
9 Reconstruction/inversion
Perturbed and partial observations; misc methods therefor (Choi et al. 2021; Kawar et al. 2022; Nair, Mei, and Patel 2023; Peng et al. 2024; Xie and Li 2022; Zhao et al. 2023; Song, Shen, et al. 2022; Zamir et al. 2021; Chung et al. 2023; Sui et al. 2024).