Neural flow matching models
Like denoising diffusion except weirder
2021-11-10 — 2025-08-07
A close cousin to neural denoising diffusion models.
Flow Matching (FM) reframes (reboots?) denoising‑diffusion training as direct regression on the velocity field of a continuous normalising flow. I thiiiiiink that relative to classical diffusion models it
- spares us the KL term and the stochastic reverse SDE;
- yields log‑likelihoods via the ICOV ODE, exact up to solver tolerance, rather than the variational lower bound of DDPM; and
- decouples the forward path choice from the noise schedule, unlocking straight‑line, OT‑optimal or physics‑constrained trajectories.
1 From Diffusion to Flow Matching
1.1 Score‑based diffusion recap
A diffusion model trains a network \(s_\theta(x_t,t)\) to approximate \(\nabla_{x_t}\log p_t(x_t)\) on a noise‑perturbed data trajectory \(x_t\sim \mathcal N\!\bigl(e^{-t}x_0,(1-e^{-2t})I\bigr)\). Generation solves a stochastic reverse SDE whose drift involves that score (Holderrieth and Erives 2025).
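As a sanity check on the notation, here is that perturbation kernel and its exact conditional score, which is the regression target in denoising score matching (a sketch; the function name is mine):

import torch

def vp_perturb(x0, t):
    # x_t ~ N(e^{-t} x0, (1 - e^{-2t}) I) for the OU forward process above.
    mean = torch.exp(-t)[:, None] * x0
    var = (1 - torch.exp(-2 * t))[:, None]
    x_t = mean + var.sqrt() * torch.randn_like(x0)
    # Exact conditional score grad log p(x_t | x0): what s_theta regresses on.
    score = -(x_t - mean) / var
    return x_t, score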
1.2 Continuous normalising flows
CNFs model a deterministic flow \(\dot x_t = v_\theta(x_t,t)\). The log‑density evolves via the instantaneous change‑of‑variables (ICOV) formula
\[ \frac{d}{dt}\log p_t(x_t)= -\nabla\!\cdot v_\theta(x_t,t). \]
Training a CNF by maximum likelihood requires estimating this divergence along ODE solves, which is costly (Wildberger et al. 2023).
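To make the cost concrete: each evaluation of the ICOV right‑hand side needs a divergence, typically estimated with Hutchinson’s trick at one extra vector–Jacobian product per probe, at every ODE step (a sketch; the function name is mine):

import torch

def hutchinson_divergence(v, x, t, n_probes=1):
    # Unbiased estimate of div v(x, t) = tr(J_v) via E[eps^T J_v eps]
    # with Rademacher probes eps; one VJP through the network per probe.
    x = x.detach().requires_grad_(True)
    out = v(x, t)
    div = torch.zeros(x.shape[0], device=x.device)
    for _ in range(n_probes):
        eps = torch.randint_like(x, low=0, high=2) * 2 - 1
        (vjp,) = torch.autograd.grad(out, x, grad_outputs=eps, retain_graph=True)
        div = div + (vjp * eps).sum(-1)
    return div / n_probes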
1.3 Flow Matching objective
Lipman et al. (2023) observed that if we choose a forward path \(x_t\) connecting a tractable base \(p_0\) to the data \(p_1\), the marginal velocity field \(v_\star\) realising that path is intractable, but the velocity conditional on an endpoint, \(\tilde v_t(x_t\mid x_1)=\partial_t x_t\), is known in closed form, and regressing on the conditional velocity yields the same gradients as regressing on \(v_\star\). Hence we can train by plain regression:
\[ \min_\theta \mathbb E_{t\sim\mathcal U[0,1],\,x_1\sim p_1,\,x_t\sim p_t(\cdot\mid x_1)} \bigl\|v_\theta(x_t,t)-\tilde v_t(x_t\mid x_1)\bigr\|_2^2 . \]
No ICOV, no stochastic reverse SDE. This is the (conditional) Flow Matching loss according to the NeurIPS tutorial.
1.4 Relationship to diffusion
Choosing the variance‑preserving stochastic path recovers the DDPM/score‑matching objective (up to a time‑dependent reweighting). Choosing a straight OT displacement yields Optimal Flow Matching (OFM) (Kornilov et al. 2024). Consistency Models (CMs) can be viewed as matching integrated velocities and are thus nested inside FM (Wang et al. 2025).
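To spell out the equivalence: for a Gaussian path \(x_t=\alpha_t x_1+\sigma_t\varepsilon\) (my notation here), the conditional velocity is an affine function of the conditional score,
\[ \tilde v_t(x_t\mid x_1)=\dot\alpha_t x_1+\dot\sigma_t\varepsilon=\frac{\dot\alpha_t}{\alpha_t}\,x_t-\sigma_t\Bigl(\dot\sigma_t-\frac{\dot\alpha_t}{\alpha_t}\sigma_t\Bigr)\nabla_{x_t}\log p_t(x_t\mid x_1), \]
so velocity regression and score regression differ only by a time‑dependent affine reparametrisation.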
2 A minimal PyTorch implementation
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d + 1, width), nn.SiLU(),
            nn.Linear(width, width), nn.SiLU(),
            nn.Linear(width, d),
        )

    def forward(self, x, t):
        # Condition on time by concatenating t as an extra input feature.
        return self.net(torch.cat([x, t[:, None]], dim=1))

def train_step(v_theta, opt, x1, sigma=1.0):
    # Straight-line forward path from base to data, matching the convention
    # above (p_0 = Gaussian base, p_1 = data):
    #   x_t = (1 - t) eps + t x1,  eps ~ N(0, sigma^2 I)
    t = torch.rand(len(x1), device=x1.device)
    eps = torch.randn_like(x1) * sigma
    x_t = (1 - t)[:, None] * eps + t[:, None] * x1
    v_target = x1 - eps  # d x_t / dt: constant along the straight line
    loss = ((v_theta(x_t, t) - v_target) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss
The model fits a time‑conditioned vector field that transports the base Gaussian to the data along straight lines. Sampling is a single ODE solve (∼20 steps with Dormand–Prince); evaluating \(\log p_\theta(x)\) follows from ICOV along the same path, exact up to solver tolerance. Swap the path for VP or VE trajectories to imitate diffusion, or insert e.g. a physics‑aware drift.
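And a matching sampler, a minimal fixed‑step Euler sketch of that ODE solve (step count illustrative; an adaptive solver such as Dormand–Prince would replace the loop):

@torch.no_grad()
def sample(v_theta, n, d, steps=20, sigma=1.0):
    # Integrate dx/dt = v_theta(x, t) from t=0 (Gaussian base) to t=1 (data).
    x = torch.randn(n, d) * sigma
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        t = torch.full((n,), float(t0))
        x = x + (t1 - t0) * v_theta(x, t)
    return x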
3 “Why practitioners adopt Flow Matching”
I got an LLM to prepare a feature matrix for me. Use at your peril.
| Pain‑point in diffusion | FM fix | Caveat |
|---|---|---|
| Reverse SDE requires a score plus a variance schedule; solving the stochastic SDE is slow | Deterministic ODE; Heun or Dopri‑5 with 10–25 steps suffices | Trajectory choice matters; poor paths hurt sample quality |
| Maximum‑likelihood training needs Hutch++ divergence estimates | Supervised regression; no divergence estimate, no KL | Still \(O(n)\) complexity in batch size |
| Hard to impose physics constraints (e.g. mass/energy conservation) | Pick a path satisfying the constraints and regress on its known velocity; see Physics‑Constrained FM (PCFM) (Utkarsh et al. 2025) | Requires a differentiable solver for the target path |
| Unclear how to guide with classifiers or text | Generalised guidance via energy functions; see “On the Guidance of Flow Matching” (arXiv) | No global closed form for the variance of the guided flow; tuning‑heavy |
4 Conditioning & Regularisation
4.1 Conditional FM
Flow Matching Posterior Estimation (FMPE) learns \(p(x\mid y)\) by concatenating the observation \(y\) into the network and into the forward path so that \(\tilde v_t(x_t,y)\) remains analytic. This has produced SOTA likelihood‑free inference for simulator data (Wildberger et al. 2023).
It looks like we encode the observation into both the network and the forward interpolation?
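To pin down what that means operationally, a conditional variant of the training step from §2 might look like the following (a sketch assuming the straight‑line path and a network that takes y as an extra input; not the exact FMPE recipe):

def cfm_train_step(v_theta, opt, x1, y):
    # Same straight-line path as in section 2; the network additionally sees y,
    # so it approximates the conditional velocity field v(x_t, t, y).
    t = torch.rand(len(x1), device=x1.device)
    eps = torch.randn_like(x1)
    x_t = (1 - t)[:, None] * eps + t[:, None] * x1
    v_target = x1 - eps
    loss = ((v_theta(x_t, t, y) - v_target) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss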
4.2 Classifier / energy guidance
Because FM exposes the time‑dependent velocity field \(v_\theta(x,t)\), any differentiable energy \(E(x)\) can steer generation by adding \(-\lambda\nabla_x E(x)\) to the velocity. TODO: read error bounds in Zhou and Liu (2025).
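Mechanically (setting aside the open question of error bounds), guidance is a one‑line wrapper around the trained field; \(\lambda\) and \(E\) are whatever the task supplies (a sketch, names mine):

def guided_velocity(v_theta, E, lam):
    # Follow the learned flow while descending the energy E.
    def v(x, t):
        with torch.enable_grad():
            x_ = x.detach().requires_grad_(True)
            (grad_E,) = torch.autograd.grad(E(x_).sum(), x_)
        return v_theta(x, t) - lam * grad_E
    return v

Passing guided_velocity(v_theta, E, lam) to the sampler above biases samples toward low‑energy regions.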
4.3 Physics‑constrained flows
PCFM enforces hard constraints by making the forward path itself solve the PDE (e.g. Navier–Stokes) and regressing on its velocity. Empirically this keeps the learned velocity field divergence‑free to high accuracy on incompressible CFD benchmarks (Utkarsh et al. 2025).
This requires our constraint set to be “holonomic” (expressible as algebraic equations on the state). If it is not, so that no static path satisfies the constraint at all times, then we need something else. The PBFM framework further introduces Lagrange multipliers for conservation laws (Baldan et al. 2025).
TODO: are my problems holonomic?
Also note that velocity regression seems to amplify path‑discretisation error; see Zhou and Liu (2025).
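For a holonomic constraint \(g(x)=0\), the crudest version of this idea is to project the learned velocity onto the tangent space of the constraint manifold, so that \(g\) is conserved along sampled trajectories (my sketch of the principle, not the PCFM algorithm):

def project_to_constraint(v_theta, g):
    # Remove the component of v normal to {x : g(x) = 0} (scalar g), so that
    # d/dt g(x_t) = grad_g . v = 0 along trajectories.
    def v(x, t):
        with torch.enable_grad():
            x_ = x.detach().requires_grad_(True)
            (grad_g,) = torch.autograd.grad(g(x_).sum(), x_)
        v_raw = v_theta(x, t)
        coef = (v_raw * grad_g).sum(-1, keepdim=True) \
             / (grad_g * grad_g).sum(-1, keepdim=True).clamp_min(1e-8)
        return v_raw - coef * grad_g
    return v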
5 Tutorials of note
Scott Hawley, Flow With What You Know
Lipman et al. (2024)
Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.
A Visual Dive into Conditional Flow Matching | ICLR Blogposts 2025
Let us Flow Together ༄࿐࿔🚀 (Liu, Gong, and Liu 2022)
Rectified flow offers an intuitive yet unified perspective on flow- and diffusion-based generative modeling. Also known as flow matching and stochastic interpolants, it has been increasingly used for state-of-the-art image, audio, and video generation, thanks to its simplicity and efficiency.
This series of tutorials on rectified flow addresses topics that are often sources of confusion and clarifies the connections with other methods.
The payoff is that they find flows whose trajectories are straight, enabling “one‑step” posterior sampling without the agony of solving ODEs.
6 Discrete state
Start from Eijkelboom et al. (2024)? Or Davis et al. (2024)?
7 Open issues & research directions
An LLM proposed these “current research directions”.
- Path design for discrete data – Fisher FM shows promise but requires score‑norm annealing hyper‑schedules. (Davis et al. 2024)
- Long‑horizon RL integration – Flow Policy Optimisation frames PPO as conditional FM, yet variance explodes beyond 1k steps (McAllister et al. 2025).
- Uncertainty calibration – Unlike diffusion, FM has no natural noise scaling; Bayesian extensions are missing.
- Theoretical generalisation – Provable Wasserstein bounds on FM density error remain open beyond 2‑Wasserstein OT straight paths.