Distributed Superintelligence.
Bagel Labs is an Artificial Intelligence Research Lab developing novel methods for distributed training of frontier diffusion models on commodity hardware. Our work enables training of state-of-the-art generative models for robotics, video, and world modelling across heterogeneous hardware, unlocking compute capacity that current training architectures can't touch.
Decentralized Diffusion Models.
Decentralized Diffusion Models (DDM) replace a single large diffusion model with an ensemble of smaller expert models, each trained independently on a partition of the dataset with no gradient synchronization between nodes. At inference, a lightweight router ensembles their outputs. This removes the tight coupling that forces conventional training onto homogeneous GPU superclusters.
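The routing step above can be sketched in a few lines. This is a minimal illustration, not the Paris implementation: the expert class, router scores, and top-k weighting scheme are all hypothetical stand-ins for independently trained denoisers and a learned router.

```python
import numpy as np

class ToyExpert:
    """Hypothetical stand-in for one independently trained diffusion expert."""
    def __init__(self, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(8, 8))

    def denoise(self, x, t):
        # Placeholder for the expert's noise prediction at timestep t.
        return np.tanh(x @ self.w) / (1 + t)

def route_top_k(router_scores, k):
    """Pick the indices of the k highest-scoring experts."""
    return np.argsort(router_scores)[-k:]

def ddm_step(x, t, experts, router_scores, k=2):
    """One denoising step: softmax-weighted average of the top-k experts.

    Only the selected experts run, so per-step compute stays close to a
    single small model rather than the full ensemble.
    """
    idx = route_top_k(router_scores, k)
    scores = router_scores[idx]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    preds = np.stack([experts[i].denoise(x, t) for i in idx])
    return np.tensordot(weights, preds, axes=1)

experts = [ToyExpert(s) for s in range(4)]
x = np.zeros((8, 8))
scores = np.array([0.1, 0.9, 0.4, 0.8])
out = ddm_step(x, t=10, experts=experts, router_scores=scores, k=2)
```

Because each expert trains on its own data partition with no cross-node gradient traffic, the only coordination point is this lightweight routing at inference time.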
Paris-1.
Paris is the first publicly released DDM. Despite using 14x less data and 16x less compute than prior decentralized baselines, it outperforms models trained on traditional monolithic clusters, achieving a 24% lower FID-50K (22.60 vs 29.64) on standard benchmarks.
| Inference Strategy | FID-50K ↓ |
|---|---|
| Monolithic (single) | 29.64 |
| Top-1 | 30.60 |
| Top-2 | 22.60 |
| Full Ensemble | 47.89 |
| Improvement (Top-2 vs Monolithic) | 7.04 |
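The improvement row follows directly from the two scores above; since FID is lower-is-better, the 24% figure is the relative reduction:

```python
# Quick check of the numbers quoted above (FID-50K, lower is better).
monolithic = 29.64
top2 = 22.60

absolute = monolithic - top2            # absolute FID reduction
relative = absolute / monolithic * 100  # relative reduction, in percent
```

The absolute gap is 7.04 FID, and the relative reduction is roughly 23.8%, reported as 24%.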