Neural flow matching models

Like denoising diffusion except weirder

2021-11-10 — 2025-08-07

approximation
Bayes
generative
Monte Carlo
neural nets
optimization
probabilistic algorithms
probability
score function
statistics

A close cousin to neural denoising diffusion models.

Flow Matching (FM) reframes (reboots?) denoising‑diffusion training as direct regression on the velocity field of a continuous normalising flow. I thiiiiiink that relative to classical diffusion models it

  1. spares us the KL term and the stochastic reverse SDE;
  2. yields log‑likelihoods that are “exact” up to ODE‑solver tolerance, rather than the ELBO lower bound of DDPM; and
  3. decouples the forward path choice from the noise schedule, unlocking straight‑line, OT‑optimal or physics‑constrained trajectories.

1 From Diffusion to Flow Matching

1.1 Score‑based diffusion recap

A diffusion model trains a network \(s_\theta(x_t,t)\) to approximate \(\nabla_{x_t}\log p_t(x_t)\) on a noise‑perturbed data trajectory \(x_t\sim \mathcal N\!\bigl(e^{-t}x_0,(1-e^{-2t})I\bigr)\). Generation solves a stochastic reverse SDE whose drift involves that score (Holderrieth and Erives 2025).
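
For later comparison with the FM regression, a denoising‑score‑matching step on that path might look like this (my sketch; s_theta is any network \((x,t)\mapsto\mathbb R^d\), and I use the usual \(\sigma_t^2\) loss weighting to avoid dividing by small variances):

import torch

def dsm_step(s_theta, opt, x0):
    # Perturb data along x_t ~ N(e^{-t} x0, (1 - e^{-2t}) I), then regress
    # s_theta on the conditional score -eps/std via the sigma^2-weighted
    # loss || std * s_theta(x_t, t) + eps ||^2.
    t = torch.rand(len(x0), device=x0.device)
    std = (1 - torch.exp(-2 * t)).sqrt()[:, None]
    eps = torch.randn_like(x0)
    x_t = torch.exp(-t)[:, None] * x0 + std * eps
    loss = ((std * s_theta(x_t, t) + eps) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()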

1.2 Continuous normalising flows

CNFs model a deterministic flow \(\dot x_t = v_\theta(x_t,t)\). The log‑density evolves via the instantaneous change‑of‑variables (ICOV) formula

\[ \frac{d}{dt}\log p_t(x_t)= -\nabla\!\cdot v_\theta(x_t,t). \]

Training a CNF by maximum likelihood requires estimating this divergence along ODE solves, which is costly (Wildberger et al. 2023).
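
To see where the cost comes in, here is a sketch of the standard Hutchinson estimator for that divergence (one extra vector–Jacobian product per ODE step, and it is unbiased rather than exact):

import torch

def divergence_hutchinson(v_theta, x, t):
    # Estimate div v = tr(dv/dx) via E_z[ z^T (dv/dx) z ] with a Rademacher
    # probe z; one VJP per sample, repeated at every step of the ODE solve.
    x = x.detach().requires_grad_(True)
    z = torch.randint_like(x, low=0, high=2) * 2 - 1
    v = v_theta(x, t)
    vjp, = torch.autograd.grad(v, x, grad_outputs=z)
    return (vjp * z).sum(-1)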

1.3 Flow Matching objective

Lipman et al. (2023) observed that if we choose a forward path \(x_t\) that connects a tractable base \(p_0\) to the data \(p_1\), and if we know the conditional velocity \(\tilde v_t(x_t\mid x_1)=\partial_t x_t\) of each conditional path, then regressing on that conditional target recovers the marginal field: the minimiser of a squared loss is the conditional expectation, \(v_\star(x,t)=\mathbb E[\tilde v_t\mid x_t=x]\), which is exactly the velocity field that transports \(p_0\) to \(p_1\). Hence we can train by plain regression:

\[ \min_\theta \mathbb E_{t\sim\mathcal U[0,1],\,x_1\sim p_1,\,x_t\sim p_t(\cdot\mid x_1)} \bigl\|v_\theta(x_t,t)-\tilde v_t(x_t\mid x_1)\bigr\|_2^2 . \]

No ICOV, no stochasticity. This is the (conditional) Flow Matching loss according to the NeurIPS tutorial.

Check out the source code and follow along: facebookresearch/flow_matching.

1.4 Relationship to diffusion

Choosing the variance‑preserving stochastic path recovers the DDPM/score‑matching objective (up to a time‑dependent rescaling). Choosing a straight OT displacement yields Optimal Flow Matching (OFM) (Kornilov et al. 2024). Consistency Models (CMs) can be viewed as matching integrated velocities and are thus nested inside FM (Wang et al. 2025).
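
A quick sanity check of that equivalence (my algebra, assuming a VP‑style path \(x_t=\alpha_t x_0+\sigma_t\varepsilon\) with data \(x_0\) and noise \(\varepsilon\)): the conditional velocity is an affine function of the conditional score,

\[ \tilde v_t = \dot\alpha_t x_0 + \dot\sigma_t \varepsilon = \frac{\dot\alpha_t}{\alpha_t}\,x_t - \sigma_t\Bigl(\dot\sigma_t-\frac{\dot\alpha_t\sigma_t}{\alpha_t}\Bigr)\nabla_{x_t}\log p_t(x_t\mid x_0), \]

so velocity regression and score regression differ only by a time‑dependent affine reweighting of the target.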

2 A minimal PyTorch implementation

import torch, torch.nn as nn, torch.nn.functional as F

class MLP(nn.Module):
    # Time-conditioned vector field v_theta: concatenates t onto x and
    # maps R^{d+1} -> R^d.
    def __init__(self, d, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d+1, width), nn.SiLU(),
            nn.Linear(width, width), nn.SiLU(),
            nn.Linear(width, d)
        )
    def forward(self, x, t):
        return self.net(torch.cat([x, t[:,None]], dim=1))

def train_step(v_theta, opt, x1, sigma=1.0):
    # Linear forward path x_t = (1-t) eps + t x1 with eps ~ N(0, σ²I),
    # so that t=0 is the Gaussian base p_0 and t=1 is the data p_1,
    # matching the convention of section 1.3.
    t = torch.rand(len(x1), device=x1.device)
    eps = torch.randn_like(x1)*sigma
    x_t = (1-t)[:,None]*eps + t[:,None]*x1
    v_target = x1 - eps            # constant velocity (straight line)
    loss = ((v_theta(x_t, t) - v_target)**2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

The model fits a time‑conditioned vector field that sends the base Gaussian (t=0) to the data (t=1) along a single straight‑line velocity. Sampling is one ODE solve (∼20 steps with Dormand–Prince); evaluating \(\log p_\theta(x)\) via ICOV along the same path is exact up to solver tolerance. Swap the path for VP or VE trajectories to imitate diffusion, or insert e.g. a physics‑aware drift.
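
For concreteness, a minimal fixed‑step Euler sampler (my sketch; Dormand–Prince via e.g. torchdiffeq would be the adaptive version):

@torch.no_grad()
def sample(v_theta, n, d, steps=20, sigma=1.0, device="cpu"):
    # Integrate dx/dt = v_theta(x, t) from the Gaussian base at t=0
    # to the data at t=1 with fixed Euler steps.
    x = torch.randn(n, d, device=device) * sigma
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((n,), i * dt, device=device)
        x = x + dt * v_theta(x, t)
    return x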

3 “Why practitioners adopt Flow Matching”

I got an LLM to prepare for me a feature matrix. Use at your peril.

| Pain‑point in diffusion | FM fix | Caveat |
|---|---|---|
| Reverse SDE requires score + variance schedule; solving the stochastic SDE is slow | Deterministic ODE; Heun or Dopri5 with 10–25 steps suffices | Trajectory choice matters; poor paths hurt sample quality |
| Maximum‑likelihood cost needs Hutch++ divergence estimates | Supervised regression; no divergence or KL term | Still \(O(n)\) complexity in batch size |
| Hard to impose physics constraints (e.g. mass/energy conservation) | Pick a path satisfying the constraints and regress on its known velocity; see Physics‑Constrained FM (PCFM) (Utkarsh et al. 2025) | Requires a differentiable solver for the target path |
| Unclear how to guide with classifiers or text | Generalised guidance via energy functions; see “On the Guidance of FM” (Feng et al. 2025) | No global closed form for the variance of the guided flow; tuning‑heavy |

4 Conditioning & Regularisation

4.1 Conditional FM

Flow Matching for posterior estimation (FMPE) learns \(p(x\mid y)\) by concatenating the observation \(y\) into the network and into the forward path so that \(\tilde v_t(x_t,y)\) remains analytic. This has produced SOTA likelihood‑free inference for simulator data (Wildberger et al. 2023).

It looks like we encode the observation into both the network and the forward interpolation?
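
A sketch of that reading (hypothetical names; same straight‑line path as section 2, with \(y\) an extra input everywhere the velocity network looks):

class CondMLP(nn.Module):
    # As the MLP above, but the observation y is an extra network input.
    def __init__(self, d, d_y, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d + d_y + 1, width), nn.SiLU(),
            nn.Linear(width, width), nn.SiLU(),
            nn.Linear(width, d)
        )
    def forward(self, x, y, t):
        return self.net(torch.cat([x, y, t[:, None]], dim=1))

def cond_train_step(v_theta, opt, x1, y, sigma=1.0):
    # (x1, y) are joint draws from the simulator; the regression target
    # is unchanged, only the network sees y.
    t = torch.rand(len(x1), device=x1.device)
    eps = torch.randn_like(x1) * sigma
    x_t = (1 - t)[:, None] * eps + t[:, None] * x1
    loss = ((v_theta(x_t, y, t) - (x1 - eps)) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()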

4.2 Classifier / energy guidance

Because FM exposes the time‑dependent velocity field \(v_\theta(x,t)\), any differentiable energy \(E(x)\) can steer generation by adding \(-\lambda\nabla_x E(x)\) to the velocity. TODO: read error bounds in Zhou and Liu (2025).
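
As a sketch (the energy and \(\lambda\) are illustration‑only knobs; I have not checked this against the schedules in Feng et al. (2025)):

def guided_velocity(v_theta, energy, x, t, lam=1.0):
    # Steer the learned field with an energy gradient:
    # v_guided(x, t) = v_theta(x, t) - lam * grad_x E(x).
    x = x.detach().requires_grad_(True)
    grad_E, = torch.autograd.grad(energy(x).sum(), x)
    return (v_theta(x, t) - lam * grad_E).detach()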

4.3 Physics‑constrained flows

PCFM enforces hard constraints by making the forward path itself solve the PDE (e.g. Navier–Stokes) and regressing on its velocity. Empirically this keeps divergence‑free velocity fields very accurate on incompressible CFD benchmarks (Utkarsh et al. 2025).

This requires our constraint set to be “holonomic” (expressible as algebraic equations on the state alone, with no dependence on velocities). If it is not, so that no static path satisfies the constraint at all times, then we need something else. The PBFM framework further introduces Lagrange multipliers for conservation laws (Baldan et al. 2025).
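
To make the holonomic case concrete, a toy tangent‑space projection (my sketch, not the PCFM algorithm): for a scalar algebraic constraint \(g(x)=0\), remove the normal component of the velocity so the flow stays on the constraint surface.

def project_velocity(v, x, g):
    # v_proj = v - (<grad g, v> / ||grad g||^2) grad g, rowwise over the
    # batch; a flow started on {g = 0} then never leaves it.
    x = x.detach().requires_grad_(True)
    grad_g, = torch.autograd.grad(g(x).sum(), x)
    coef = (grad_g * v).sum(-1, keepdim=True) \
        / (grad_g ** 2).sum(-1, keepdim=True).clamp_min(1e-12)
    return v - coef * grad_g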

TODO: are my problems holonomic?

Also note that velocity regression seems to amplify path‑discretisation error; see Zhou and Liu (2025).

5 Tutorials of note

  • Scott Hawley, Flow With What You Know

  • Lipman et al. (2024)

    Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.

    facebookresearch/flow_matching.

  • A Visual Dive into Conditional Flow Matching | ICLR Blogposts 2025

  • Let us Flow Together ༄࿐࿔🚀 (Liu, Gong, and Liu 2022)

    Rectified flow offers an intuitive yet unified perspective on flow- and diffusion-based generative modeling. Also known as flow matching and stochastic interpolants, it has been increasingly used for state-of-the-art image, audio, and video generation, thanks to its simplicity and efficiency.

    This series of tutorials on rectified flow addresses topics that are often sources of confusion and clarifies the connections with other methods.

    The payoff is that they find flows whose trajectories are straight, enabling “one step” posterior sampling without the agony of solving ODEs.

6 Discrete state

Start from Eijkelboom et al. (2024)? Or Davis et al. (2024)?

7 Open issues & research directions

An LLM proposed these “current research directions”.

  1. Path design for discrete data – Fisher FM shows promise but requires score‑norm annealing hyper‑schedules (Davis et al. 2024).
  2. Long‑horizon RL integration – Flow Policy Optimisation frames PPO as conditional FM, yet variance explodes beyond 1k steps (McAllister et al. 2025).
  3. Uncertainty calibration – Unlike diffusion, FM has no natural noise scaling; Bayesian extensions are missing.
  4. Theoretical generalisation – Provable Wasserstein bounds on FM density error remain open beyond 2‑W OT straight paths.

8 References

Baldan, Liu, Guardone, et al. 2025. “Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation.”
Cheng, Han, Maddix, et al. 2024. “Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions.”
Davis, Kessler, Petrache, et al. 2024. “Fisher Flow Matching for Generative Modeling over Discrete Data.”
Eijkelboom, Bartosh, Naesseth, et al. 2024. “Variational Flow Matching for Graph Generation.”
Feng, Yu, Deng, et al. 2025. “On the Guidance of Flow Matching.”
Gudovskiy, Okuno, and Nakata. 2024. “DFM: Interpolant-Free Dual Flow Matching.”
Holderrieth, and Erives. 2025. “An Introduction to Flow Matching and Diffusion Models.”
Holderrieth, Xu, and Jaakkola. 2024. “Hamiltonian Score Matching and Generative Flows.”
Kerrigan, Migliorini, and Smyth. 2024. “Functional Flow Matching.” In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics.
Köhler, Chen, Krämer, et al. 2023. “Flow-Matching: Efficient Coarse-Graining of Molecular Dynamics Without Forces.” Journal of Chemical Theory and Computation.
Kolesov, Stepan, Palyulin, et al. 2025. “Field Matching: An Electrostatic Paradigm to Generate and Transfer Data.”
Kornilov, Mokrov, Gasnikov, et al. 2024. “Optimal Flow Matching: Learning Straight Trajectories in Just One Step.” Advances in Neural Information Processing Systems.
Lipman, Chen, Ben-Hamu, et al. 2023. “Flow Matching for Generative Modeling.”
Lipman, Havasi, Holderrieth, et al. 2024. “Flow Matching Guide and Code.”
Liu, Gong, and Liu. 2022. “Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow.”
McAllister, Ge, Yi, et al. 2025. “Flow Matching Policy Gradients.”
Schusterbauer, Gui, Fundel, et al. 2025. “Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Utkarsh, Cai, Edelman, et al. 2025. “Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints.”
Wang, Huang, Bergman, et al. 2025. “Phased Consistency Models.” In Proceedings of the 38th International Conference on Neural Information Processing Systems. NIPS ’24.
Wildberger, Dax, Buchholz, et al. 2023. “Flow Matching for Scalable Simulation-Based Inference.”
Zhou, and Liu. 2025. “An Error Analysis of Flow Matching for Deep Generative Modeling.”