Neural flow matching models

Like denoising diffusion except weirder

2021-11-10 — 2025-08-07

Wherein flow matching is presented as a deterministic reformulation of diffusion, training is reduced to regression on velocity fields, and straight‑line optimal‑transport trajectories are used to enable single‑ODE‑solve sampling and exact log‑likelihoods via the instantaneous change‑of‑variables (ICOV) formula.

approximation
Bayes
generative
Monte Carlo
neural nets
optimization
probabilistic algorithms
probability
score function
statistics

A close cousin of neural denoising diffusion models.


Flow Matching (FM) reframes (or reboots?) denoising‑diffusion training as direct regression on the velocity field of a continuous normalising flow. I thiiiiiink that, relative to classical diffusion models, it

  1. spares us the KL term and the stochastic reverse SDE;
  2. yields “exact” log‑likelihoods in some stronger sense than DDPM; and
  3. decouples the forward path choice from the noise schedule, unlocking straight‑line, OT‑optimal or physics‑constrained trajectories.

1 From Diffusion to Flow Matching

1.1 Score‑based diffusion recap

A diffusion model trains a network \(s_\theta(x_t,t)\) to approximate \(\nabla_{x_t}\log p_t(x_t)\) on a noise‑perturbed data trajectory \(x_t\sim \mathcal N\!\bigl(e^{-t}x_0,(1-e^{-2t})I\bigr)\). Generation solves a stochastic reverse SDE whose drift involves the score (Holderrieth and Erives 2025).
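
To make the recap concrete, here is a minimal denoising‑score‑matching step for that Ornstein–Uhlenbeck path; a sketch only, assuming a time‑conditioned network s_theta with the same interface as the MLP in Section 2, and using the conditional score \(\nabla_{x_t}\log p(x_t\mid x_0)\) as the regression target:

import torch

def dsm_step(s_theta, opt, x0, T=5.0, t_min=1e-3):
    # Perturb x0 along the OU path x_t = e^{-t} x0 + sqrt(1 - e^{-2t}) eps;
    # t_min keeps the variance away from zero.
    t = t_min + torch.rand(len(x0), device=x0.device) * (T - t_min)
    std = (1 - torch.exp(-2 * t)).sqrt()[:, None]
    x_t = torch.exp(-t)[:, None] * x0 + std * torch.randn_like(x0)
    # Conditional score of the Gaussian perturbation kernel:
    # grad log p(x_t | x0) = (e^{-t} x0 - x_t) / (1 - e^{-2t}).
    score_target = (torch.exp(-t)[:, None] * x0 - x_t) / std**2
    loss = ((s_theta(x_t, t) - score_target) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss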

1.2 Continuous normalising flows

CNFs model a deterministic flow \(\dot x_t = v_\theta(x_t,t)\). The log‑density evolves via the instantaneous change‑of‑variables (ICOV) formula

\[ \frac{d}{dt}\log p_t(x_t)= -\nabla\!\cdot v_\theta(x_t,t). \]

Training a CNF by maximum likelihood requires us to estimate this divergence along ODE solves, which is costly (Wildberger et al. 2023).
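
The standard mitigation is Hutchinson's stochastic trace estimator, which replaces the exact divergence with one vector–Jacobian product per probe vector. A minimal sketch, assuming a velocity field v_theta(x, t) like the one below:

import torch

def hutchinson_div(v_theta, x, t):
    # Estimate div v(x,t) = tr(dv/dx) as E_z[ z^T (dv/dx) z ], z Rademacher.
    x = x.detach().requires_grad_(True)
    z = torch.randint_like(x, 0, 2) * 2 - 1
    v = v_theta(x, t)
    # create_graph=True lets us backpropagate through the estimate
    # when training by maximum likelihood.
    (vjp,) = torch.autograd.grad(v, x, grad_outputs=z, create_graph=True)
    return (vjp * z).sum(-1)  # one probe; average several to reduce variance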

1.3 Flow Matching objective

Lipman et al. (2023) observed that if we choose a forward path \(x_t\) that connects a tractable base \(p_0\) to the data \(p_1\), and if we know the forward path’s conditional velocity \(\tilde v_t(x_t)=\partial_t x_t\) (conditional on the path’s endpoints), then the marginal velocity field that generates \(p_t\) is recoverable by regression: the minimiser of the squared loss below is the conditional expectation \(v_\star(x,t)=\mathbb E[\tilde v_t\mid x_t=x]\). Hence, we can train by plain regression:

\[ \min_\theta \mathbb E_{t\sim\mathcal U[0,1],\,x_t\sim p_t} \bigl\|v_\theta(x_t,t)-\tilde v_t(x_t)\bigr\|_2^2 . \]

No ICOV and no stochasticity. This is the Flow Matching loss, as described in the NeurIPS tutorial.

Check out the source code and follow along: facebookresearch/flow_matching.

1.4 Relationship to diffusion

If we choose the variance‑preserving stochastic path, we recover the DDPM/score‑matching objective (up to a time‑dependent reweighting). If we choose a straight OT displacement, we get Optimal Flow Matching (OFM) (Kornilov et al. 2024). We can view Consistency Models (CMs) as matching integrated velocities, so they’re nested inside FM (Wang et al. 2025).

2 A minimal PyTorch implementation

import torch, torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d+1, width), nn.SiLU(),
            nn.Linear(width, width), nn.SiLU(),
            nn.Linear(width, d)
        )
    def forward(self, x, t):
        # Condition on time by appending t as an extra input feature.
        return self.net(torch.cat([x, t[:, None]], dim=1))

def train_step(v_theta, opt, x0, sigma=1.0):
    # Linear forward path from base to data (the convention of Section 1.3):
    # x_t = (1-t) eps + t x0, eps ~ N(0, σ²I), so x_0 is noise and x_1 is data.
    t = torch.rand(len(x0), device=x0.device)
    eps = torch.randn_like(x0) * sigma
    x_t = (1 - t)[:, None] * eps + t[:, None] * x0
    v_target = x0 - eps            # ∂_t x_t: constant velocity (straight line)
    loss = ((v_theta(x_t, t) - v_target) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

The model fits a time‑conditioned vector field that transports the base Gaussian to the data distribution along straight‑line paths. Sampling requires a single ODE solve (∼20 steps with a Dormand–Prince solver); evaluating \(\log p_\theta(x)\) uses the ICOV formula along the same ODE, so it is exact up to solver tolerance and the divergence estimate. Swap the path for VP or VE trajectories to imitate diffusion, or insert e.g. a physics‑aware drift.
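
Sampling is then just numerical integration of the learned field from noise at \(t=0\) to data at \(t=1\). Here is a fixed‑step Heun integrator as a stand‑in for an adaptive Dormand–Prince solver; a minimal sketch, with the helper name my own:

@torch.no_grad()
def sample(v_theta, n, d, steps=20, sigma=1.0, device="cpu"):
    # Integrate dx/dt = v_theta(x, t) from t=0 (base Gaussian) to t=1 (data).
    x = sigma * torch.randn(n, d, device=device)
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        k1 = v_theta(x, t0.expand(n))
        k2 = v_theta(x + h * k1, t1.expand(n))
        x = x + 0.5 * h * (k1 + k2)  # Heun (trapezoidal) update
    return x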

3 “Why practitioners adopt Flow Matching”

I got an LLM to prepare a feature matrix for me. Use at your peril.

| Pain point in diffusion | FM fix | Caveat |
| --- | --- | --- |
| Reverse SDE requires score + variance schedule; solving the stochastic SDE is slow | Deterministic ODE; Heun or Dopri5 with 10–25 steps suffices | Trajectory choice matters; poor paths hurt sample quality |
| Maximum‑likelihood training needs Hutch++ divergence estimates | Supervised regression; no divergence or KL terms | Still \(O(n)\) complexity in batch size |
| Hard to impose physics constraints (e.g. mass/energy conservation) | Pick a path satisfying the constraints and regress on its known velocity; see Physics‑Constrained FM (PCFM) (Utkarsh et al. 2025) | Requires a differentiable solver for the target path |
| Unclear how to guide with classifiers or text | Generalised guidance via energy functions; see “On the Guidance of FM” (Feng et al. 2025) | No global closed form for the variance of the guided flow; tuning‑heavy |

4 Conditioning & Regularisation

4.1 Conditional FM

Flow Matching Posterior Estimation (FMPE) learns \(p(x\mid y)\) by concatenating the observation \(y\) into the network and into the forward path so that \(\tilde v_t(x_t,y)\) remains analytic. This has produced SOTA likelihood‑free inference for simulator data (Wildberger et al. 2023).

It looks like we encode the observation into both the network and the forward interpolation?
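
If so, the training step changes only cosmetically. A sketch (not the FMPE authors' code; it assumes a network v_theta that takes the observation y as an extra argument):

def cfm_train_step(v_theta, opt, x0, y, sigma=1.0):
    # (x0, y) are joint simulator draws: x0 ~ p(x), y ~ p(y | x0).
    t = torch.rand(len(x0), device=x0.device)
    eps = torch.randn_like(x0) * sigma
    x_t = (1 - t)[:, None] * eps + t[:, None] * x0
    v_target = x0 - eps
    # Condition the field on y; the target velocity stays analytic.
    loss = ((v_theta(x_t, y, t) - v_target) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss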

4.2 Classifier / energy guidance

Because FM exposes the time‑dependent velocity field \(v_\theta(x,t)\), any differentiable energy \(E(x)\) can steer generation by adding \(-\lambda\nabla_x E(x)\) to the velocity. TODO: Read error bounds in Zhou and Liu (2025).
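
In code this is a thin wrapper around the learned field; a sketch with names of my own choosing, where energy maps a batch of points to per‑sample scalars:

def guided_velocity(v_theta, energy, lam=1.0):
    # Steer sampling by following v_theta(x, t) - lam * grad_x E(x).
    def v(x, t):
        with torch.enable_grad():
            xg = x.detach().requires_grad_(True)
            (grad_E,) = torch.autograd.grad(energy(xg).sum(), xg)
        return v_theta(x, t) - lam * grad_E
    return v

The wrapped field plugs into the same ODE sampler as before.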

4.3 Physics‑constrained flows

PCFM enforces hard constraints by making the forward path itself solve the PDE (e.g. Navier–Stokes) and regressing on its velocity. Empirically, this keeps divergence‑free velocity fields very accurate on incompressible CFD benchmarks (Utkarsh et al. 2025).

This requires our constraint set to be “holonomic” (i.e. expressible as algebraic equations on positions alone). If it is not, so that no static path satisfies the constraint at all times, then we need something else. The PBFM framework further introduces Lagrange multipliers for conservation laws (Baldan et al. 2025).
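
For intuition only (this is not the PCFM or PBFM algorithm): for an affine constraint \(Ax=b\) the Euclidean projection has a closed form, and one crude option is to apply it after every integrator step:

def project_affine(x, A, b):
    # Project each row of x onto {x : A x = b}; assumes A has full row rank.
    # Closed form: x - A^T (A A^T)^{-1} (A x - b).
    r = x @ A.T - b
    return x - torch.linalg.solve(A @ A.T, r.T).T @ A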

TODO: Are my problems holonomic?

Also note that velocity regression seems to amplify path‑discretization error; see Zhou and Liu (2025).

5 Tutorials of note

  • Scott Hawley, Flow With What You Know

  • Lipman et al. (2024)

    Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.

    facebookresearch/flow_matching.

  • A Visual Dive into Conditional Flow Matching | ICLR Blogposts 2025

  • Let us Flow Together ༄࿐࿔🚀 (Liu, Gong, and Liu 2022)

    Rectified flow offers an intuitive yet unified perspective on flow- and diffusion-based generative modeling. Also known as flow matching and stochastic interpolants, it has been increasingly used for state-of-the-art image, audio, and video generation, thanks to its simplicity and efficiency.

    This series of tutorials on rectified flow addresses topics that are often sources of confusion and clarifies the connections with other methods.

The payoff is that rectification finds flows whose trajectories are (nearly) straight, enabling “one step” posterior sampling without the agony of fine ODE discretisation.
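
Concretely, a straight (rectified) flow collapses the Heun sampler of Section 2 to a single Euler step; a sketch:

@torch.no_grad()
def one_step_sample(v_theta, n, d, device="cpu"):
    # With (nearly) straight trajectories, one Euler step over [0, 1]
    # maps the base Gaussian close to the data distribution.
    x0 = torch.randn(n, d, device=device)
    return x0 + v_theta(x0, torch.zeros(n, device=device))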

6 Discrete state

Start from Eijkelboom et al. (2024)? Or Davis et al. (2024)?

7 Open issues & research directions

An LLM proposed these “current research directions”.

  1. Path design for discrete data – Fisher FM shows promise but requires score‑norm annealing hyper‑schedules (Davis et al. 2024).
  2. Long‑horizon RL integration – Flow Policy Optimization frames PPO as a conditional FM, yet variance explodes beyond 1k steps (McAllister et al. 2025).
  3. Uncertainty calibration – Unlike diffusion, FM has no natural noise scaling, and we lack Bayesian extensions.
  4. Theoretical generalization – Provable Wasserstein bounds on FM density error remain open beyond straight 2‑Wasserstein OT paths.

8 References

Baldan, Liu, Guardone, et al. 2025. “Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation.”
Cheng, Han, Maddix, et al. 2024. “Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions.”
Davis, Kessler, Petrache, et al. 2024. “Fisher Flow Matching for Generative Modeling over Discrete Data.” In.
Eijkelboom, Bartosh, Naesseth, et al. 2024. “Variational Flow Matching for Graph Generation.”
Feng, Yu, Deng, et al. 2025. “On the Guidance of Flow Matching.”
Gudovskiy, Okuno, and Nakata. 2024. “DFM: Interpolant-Free Dual Flow Matching.” In.
Holderrieth, and Erives. 2025. “An Introduction to Flow Matching and Diffusion Models.”
Holderrieth, Xu, and Jaakkola. 2024. “Hamiltonian Score Matching and Generative Flows.” In.
Kerrigan, Migliorini, and Smyth. 2024. “Functional Flow Matching.” In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics.
Köhler, Chen, Krämer, et al. 2023. “Flow-Matching: Efficient Coarse-Graining of Molecular Dynamics Without Forces.” Journal of Chemical Theory and Computation.
Kolesov, Stepan, Palyulin, et al. 2025. “Field Matching: An Electrostatic Paradigm to Generate and Transfer Data.”
Kornilov, Mokrov, Gasnikov, et al. 2024. “Optimal Flow Matching: Learning Straight Trajectories in Just One Step.” Advances in Neural Information Processing Systems.
Lipman, Chen, Ben-Hamu, et al. 2023. “Flow Matching for Generative Modeling.” In.
Lipman, Havasi, Holderrieth, et al. 2024. “Flow Matching Guide and Code.”
Liu, Gong, and Liu. 2022. “Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow.” In.
McAllister, Ge, Yi, et al. 2025. “Flow Matching Policy Gradients.”
Schusterbauer, Gui, Fundel, et al. 2025. “Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Utkarsh, Cai, Edelman, et al. 2025. “Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints.”
Wang, Huang, Bergman, et al. 2025. “Phased Consistency Models.” In Proceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24.
Wildberger, Dax, Buchholz, et al. 2023. “Flow Matching for Scalable Simulation-Based Inference.” In.
Zhou, and Liu. 2025. “An Error Analysis of Flow Matching for Deep Generative Modeling.” In.