Training Flow and Diffusion Models
These are some notes I took while watching MIT 6.S184: Lecture 03.
The big picture of arriving at a trainable objective in flow matching [1]:
- Our ideal goal is to learn the marginal vector field $u_t(x)$ with a neural network $u_t^\theta(x)$. So we can try to formulate the flow matching loss with the marginal vector field:
  $$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\ x \sim p_t} \left[ \big\| u_t^\theta(x) - u_t(x) \big\|^2 \right]$$
  Yet the issue is that $u_t(x)$ is intractable, because it requires an integral over the entire data distribution:
  $$u_t(x) = \int u_t(x \mid z)\, \frac{p_t(x \mid z)\, p_{\text{data}}(z)}{p_t(x)}\, \mathrm{d}z$$
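  To see this marginalization concretely, here is a toy sketch (my own illustration, not from the lecture) that computes $u_t(x)$ exactly for a tiny finite dataset, where the integral collapses to a sum; with a real data distribution this per-query sum over all of $p_{\text{data}}$ is exactly what is intractable. It assumes the Gaussian CondOT path introduced later in these notes, for which $u_t(x \mid z) = (z - x)/(1 - t)$:

  ```python
  import torch

  def marginal_vector_field(x: torch.Tensor, t: float, data: torch.Tensor) -> torch.Tensor:
      """Exact u_t(x) for a tiny dataset: posterior-weighted average of u_t(x|z).

      Assumes the Gaussian CondOT path p_t(x|z) = N(x; t*z, (1-t)^2 I), so that
      u_t(x|z) = (z - x) / (1 - t). The cost scales with the dataset size, which
      is why the marginal field is intractable for real data distributions.
      """
      # log p_t(x | z_i) up to an additive constant, for every data point z_i
      log_w = -((x - t * data) ** 2).sum(dim=-1) / (2 * (1 - t) ** 2)
      w = torch.softmax(log_w, dim=0)               # posterior weights p(z_i | x)
      u_cond = (data - x) / (1 - t)                 # conditional targets u_t(x | z_i)
      return (w.unsqueeze(-1) * u_cond).sum(dim=0)  # E_{z|x}[ u_t(x | z) ]

  data = torch.tensor([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])  # toy "dataset"
  print(marginal_vector_field(torch.zeros(2), t=0.5, data=data))
  ```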
- To fix this intractability, we use the Conditional Flow Matching (CFM) loss, which regresses against the easy-to-calculate conditional vector field $u_t(x \mid z)$:
  $$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\ z \sim p_{\text{data}},\ x \sim p_t(\cdot \mid z)} \left[ \big\| u_t^\theta(x) - u_t(x \mid z) \big\|^2 \right]$$
  It can be proved that the marginal flow matching loss equals the conditional flow matching loss up to a constant $C$ that does not depend on the neural network parameters $\theta$: $\mathcal{L}_{\text{FM}}(\theta) = \mathcal{L}_{\text{CFM}}(\theta) + C$. Because their gradients are the same ($\nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)$), we can minimize the easy, conditional loss to implicitly solve the hard, marginal one.
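  Why the constant is $\theta$-free (the standard expansion from [1], written out for my own reference): expanding both squared norms, the $\theta$-dependent terms coincide because $\mathbb{E}_{z \sim p(z \mid x)}\left[ u_t(x \mid z) \right] = u_t(x)$ (the marginalization above), so the cross terms match and only $\theta$-free terms survive:
  $$\mathcal{L}_{\text{FM}}(\theta) - \mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}\left[ \| u_t(x) \|^2 \right] - \mathbb{E}\left[ \| u_t(x \mid z) \|^2 \right]$$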
- Now that we know we can use the conditional version, we pick a specific conditional probability path $p_t(x \mid z)$.
  - For instance, we can use a Gaussian conditional probability path $p_t(x \mid z) = \mathcal{N}(x;\ \alpha_t z,\ \beta_t^2 I)$, which interpolates from pure noise at $t = 0$ ($\alpha_0 = 0$, $\beta_0 = 1$) to the data point $z$ at $t = 1$ ($\alpha_1 = 1$, $\beta_1 = 0$).
- So what’s the target? For this path, we can analytically calculate exactly what $u_t(x \mid z)$ should be using the time derivatives of $\alpha_t$ and $\beta_t$:
  $$u_t(x \mid z) = \left( \dot{\alpha}_t - \frac{\dot{\beta}_t}{\beta_t} \alpha_t \right) z + \frac{\dot{\beta}_t}{\beta_t}\, x$$
  Using the standard reparametrization trick, we express $x_t$ as a function of standard Gaussian noise $\epsilon \sim \mathcal{N}(0, I)$: $x_t = \alpha_t z + \beta_t \epsilon$. This allows us to substitute every instance of $x$ in the equation with $\alpha_t z + \beta_t \epsilon$.
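  Where that formula comes from (standard derivation, spelled out for completeness): the conditional flow that transports noise to $z$ is $\psi_t(\epsilon \mid z) = \alpha_t z + \beta_t \epsilon$, the velocity along a trajectory is its time derivative, and eliminating $\epsilon = (x - \alpha_t z)/\beta_t$ expresses it as a function of $x$:
  $$u_t(x \mid z) = \dot{\alpha}_t z + \dot{\beta}_t \epsilon = \dot{\alpha}_t z + \dot{\beta}_t\, \frac{x - \alpha_t z}{\beta_t} = \left( \dot{\alpha}_t - \frac{\dot{\beta}_t}{\beta_t} \alpha_t \right) z + \frac{\dot{\beta}_t}{\beta_t}\, x$$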
- Finally, we choose the simplest “schedulers” for $\alpha_t$ and $\beta_t$.
  - The Setup: we set $\alpha_t = t$ and $\beta_t = 1 - t$. This specific linear interpolation gives us the Conditional Optimal Transport (CondOT) path, for which $x_t = t z + (1 - t)\epsilon$ and the target simplifies to $u_t(x_t \mid z) = z - \epsilon$.
  - The Loss: this leads to the simple training objective (a minimal training-step sketch follows after this list):
  $$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\ z \sim p_{\text{data}},\ \epsilon \sim \mathcal{N}(0, I)} \left[ \big\| u_t^\theta(x_t) - (z - \epsilon) \big\|^2 \right], \quad \text{where } x_t = t z + (1 - t)\epsilon$$
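To make the objective concrete, here is a minimal PyTorch sketch of one CondOT training step. This is my own illustration, not code from the lecture; `VectorFieldNet`, its sizes, and the placeholder data batch are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical vector-field network u_t^theta(x): maps (x_t, t) to a velocity.
class VectorFieldNet(nn.Module):
    def __init__(self, dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))  # time enters as an extra feature

def cfm_loss(model: nn.Module, z: torch.Tensor) -> torch.Tensor:
    """CondOT conditional flow matching loss on a data batch z."""
    t = torch.rand(z.shape[0], 1)      # t ~ U[0, 1]
    eps = torch.randn_like(z)          # eps ~ N(0, I)
    x_t = t * z + (1.0 - t) * eps      # x_t = alpha_t z + beta_t eps (CondOT path)
    target = z - eps                   # analytic conditional vector field u_t(x_t | z)
    return ((model(x_t, t) - target) ** 2).mean()

model = VectorFieldNet(dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
z = torch.randn(256, 2)                # placeholder batch; real z ~ p_data
loss = cfm_loss(model, z)
opt.zero_grad(); loss.backward(); opt.step()
```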
Doubts
- How different is the diffusion objective from the flow matching loss?
- Is the cosine schedule of diffusion equivalent to linear interpolation in FM?
- Is the Euler method used in practice, or do people prefer higher-order methods like Heun’s? (A minimal Euler sampler sketch follows below for reference.)
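On the last point, for reference: sampling in flow matching integrates the learned ODE $\mathrm{d}x_t / \mathrm{d}t = u_t^\theta(x_t)$ from $t = 0$ (noise) to $t = 1$ (data). A minimal explicit-Euler sketch (my own illustration), assuming a trained `model` with the same interface as in the training sketch above:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def euler_sample(model: nn.Module, n: int, dim: int = 2, steps: int = 100) -> torch.Tensor:
    """Integrate dx/dt = u_t^theta(x) from t=0 (noise) to t=1 (data) with explicit Euler."""
    x = torch.randn(n, dim)             # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((n, 1), i * dt)  # left-endpoint time for this step
        x = x + dt * model(x, t)        # Euler step: x_{t+dt} = x_t + dt * u_t^theta(x_t)
    return x

# Usage with a trained model: samples = euler_sample(model, n=256)
```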
Bibliography
- [1] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow Matching for Generative Modeling,” in The Eleventh International Conference on Learning Representations (ICLR), 2023.