Distributed Superintelligence.
Bagel Labs is an Artificial Intelligence Research Lab developing novel methods for distributed training of frontier diffusion models on commodity hardware. Our work enables training of state-of-the-art generative models for robotics, video, and world modelling across heterogeneous hardware, unlocking compute capacity that current training architectures can't touch.
Decentralized Diffusion Models.
Decentralized Diffusion Models (DDM) replace a single large diffusion model with an ensemble of smaller expert models, each trained independently on a partition of the dataset with no gradient synchronization between nodes. At inference, a lightweight router ensembles their outputs. This removes the tight coupling that forces conventional training onto homogeneous GPU superclusters.
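As a rough illustration of the routing idea (not the actual Paris router; all names and the similarity heuristic here are hypothetical), a top-k router can score each expert against the conditioning signal and average the selected experts' noise predictions:

```python
import math

def route_and_ensemble(cond_embedding, expert_centroids, expert_outputs, k=2):
    """Hypothetical top-k router: score each expert by cosine similarity
    between the conditioning embedding and that expert's data-partition
    centroid, then average the noise predictions of the k best experts."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scores = [cosine(cond_embedding, c) for c in expert_centroids]
    top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    # Average the selected experts' predictions elementwise.
    n = len(expert_outputs[0])
    return [sum(expert_outputs[i][j] for i in top_k) / k for j in range(n)]
```

Because no gradients flow between experts, the only cross-node coupling in this design is the cheap inference-time ensembling step.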
Paris-1.
Paris is the first publicly released DDM. Despite using 14x less data and 16x less compute than prior decentralized baselines, it outperforms models trained on traditional monolithic clusters, achieving a 24% FID improvement (22.60 vs 29.64) on standard benchmarks.
| Inference Strategy | FID-50K ↓ |
|---|---|
| Monolithic (single) | 29.64 |
| Top-1 | 30.60 |
| Top-2 | 22.60 |
| Full Ensemble | 47.89 |
| Improvement (Monolithic − Top-2) | 7.04 |
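The 24% headline figure is the relative FID reduction of Top-2 routing over the monolithic baseline, taken from the table's own numbers:

```python
monolithic, top2 = 29.64, 22.60
delta = monolithic - top2              # absolute FID gap from the table
relative = delta / monolithic * 100    # relative improvement in percent
print(round(delta, 2), round(relative, 1))
```

This prints `7.04 23.8`, which the text reports as a ~24% improvement.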
We are Bagel Labs, an artificial intelligence research lab pioneering distributed training of frontier diffusion models on commodity hardware.
We ignore years of experience and pedigree. If you have high agency, meaning your default assumption is that you can control the outcome of whatever situation you are in, we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Overview
You will design, build, and relentlessly optimize the infrastructure that trains and serves large diffusion models. Your job is to make GPUs go faster, make clusters behave, and make training and inference scale across multiple nodes, regions, and hardware types without turning into a reliability tax.
This role sits at the intersection of systems engineering, performance engineering, and research enablement. You will touch kernels, networking, orchestration, compilers, and model code when needed.
Key Responsibilities
- Build and operate distributed training stacks for diffusion models (U-Net, DiT, video diffusion, world-model variants) across multi-node GPU clusters.
- Implement and tune parallelism strategies for training and inference, including data parallel, tensor parallel, pipeline parallel, ZeRO/FSDP-style sharding, expert parallel, and diffusion-specific tricks (timestep-level scheduling, CFG parallelism, microbatching).
- Profile end-to-end GPU performance and remove bottlenecks across kernels, memory, comms, and I/O (CUDA graphs, kernel fusion, attention kernels, NCCL tuning, overlap of compute and comms).
- Own inference serving for diffusion workloads with high throughput and predictable latency, including dynamic batching, variable resolution handling, caching, prefill/conditioning optimization, and multi-GPU execution.
- Design robust orchestration for heterogeneous and preemptible environments (on-prem, bare metal, cloud, spot), including checkpointing, resumability, and fault tolerance.
- Build observability that is actually useful for diffusion: step-time breakdowns, denoising throughput, VRAM headroom, NCCL health, queueing, tail latency, error budgets, and cost per sample.
- Implement pragmatic quantization and precision strategies for diffusion inference and training, balancing quality, speed, and stability (BF16/FP16/TF32/FP8, weight-only INT8/INT4 where it makes sense, selective quantization of submodules).
- Improve developer velocity through reproducible environments, CI for performance regressions, and automation for cluster bring-up and rollouts.
- Write clear internal docs and occasional public technical deep-dives on blog.bagel.com when it helps the community and hiring.
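One pattern from the list above, fault tolerance on preemptible nodes, reduces to a checkpoint/resume loop with atomic writes. A minimal sketch (file format, names, and the stand-in "training step" are illustrative, not our actual stack):

```python
import json
import os
import tempfile

def save_checkpoint(path, step, state):
    """Write atomically so a preemption mid-write never corrupts the file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename on POSIX

def train(path, total_steps):
    """Resume from the last checkpoint if one exists, else start at step 0."""
    step, state = 0, {"loss": None}
    if os.path.exists(path):
        with open(path) as f:
            ckpt = json.load(f)
        step, state = ckpt["step"], ckpt["state"]
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step  # stand-in for a real training step
        if step % 10 == 0 or step == total_steps:
            save_checkpoint(path, step, state)
    return step, state
```

The same shape holds for real runs; only the state payload (model shards, optimizer state, data-loader cursor) and the storage backend change.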
Who You Might Be
You are the person teammates call when GPUs underperform, distributed training deadlocks, or a "simple" deployment turns into a week of whack-a-mole. You like the ugly truth in traces and profiler timelines. You can move between high-level architecture and low-level debugging without getting lost.
You probably have scars from at least a few of these:
- chasing down NCCL hangs, stragglers, and clock drift
- fixing memory fragmentation and OOMs that should not happen
- turning a 2x slowdown into a 10 percent regression by changing one flag, then learning why
- shipping a system that stays up while people are actively trying to break it
Required Skills (flexible)
- Strong Linux fundamentals, networking basics, and the ability to debug production incidents without panic.
- Deep GPU performance instincts: profiling, memory behavior, kernel-level thinking, and practical CUDA tooling literacy (even if you are not writing CUDA daily).
- Hands-on experience scaling training and/or inference across multiple GPUs and nodes.
- Comfort implementing parallelism and sharding in modern frameworks (PyTorch, NCCL, torch.distributed, FSDP/ZeRO-style systems, or equivalent).
- Experience building reliable deployment pipelines (containers, rollouts, versioning, rollback, secrets, config management).
- The ability to read model code and change it when infrastructure and performance require it.
Bonus Skills
- Contributions to open-source performance or distributed systems projects (PyTorch internals, Triton kernels, xFormers/FlashAttention, NCCL tooling, Ray, Kubernetes operators, etc.).
- Experience with diffusion-specific serving and optimization (Diffusers, ComfyUI, custom schedulers/solvers, distillation, few-step generation, VAE decode optimization, tiled generation).
- TensorRT or compiler experience (torch.compile/Inductor, XLA, CUDA graphs), and a habit of measuring instead of guessing.
- Experience building multi-tenant GPU platforms with isolation, fair scheduling, and predictable QoS.
- Comfort with cost engineering: understanding where dollars burn in GPU clusters and how to reduce it without fragility.
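The "measuring instead of guessing" habit above is cheap to codify. A minimal timing harness of the kind we mean (names illustrative; real GPU benchmarking additionally needs device synchronization):

```python
import statistics
import time

def bench(fn, *args, warmup=3, iters=20):
    """Median-of-N wall-clock timing. Warmup runs absorb one-time costs
    (allocator warmup, JIT/compile caches) so they don't skew the result."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

We prefer medians over means because stragglers and one-off stalls dominate tail samples.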
What We Offer
- Top-of-market compensation.
- A deeply technical culture where bold frontier ideas are debated, stress-tested, and built.
- High autonomy and direct ownership of critical systems.
- In-person role at our Toronto office.
- Work that can set the direction for frontier diffusion models.
- Paid travel opportunities to the top ML conferences around the world.
We are Bagel Labs - an artificial intelligence research lab pioneering distributed training of frontier diffusion models on commodity hardware.
We ignore years of experience and pedigree. If you have high agency - meaning your default assumption is that you can control the outcome of whatever situation you are in - we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Overview
We encourage curiosity-driven research and welcome bold, untested concepts. You will explore frontiers in continual learning, world modelling, and reinforcement learning on diffusion models. We prize provocative ideas that challenge conventional paradigms.
Key Responsibilities
- Advance decentralized diffusion models (DDM) and pioneer next-generation architectures including rectified flows, EDM variants, and latent consistency models.
- Develop novel sampling algorithms, guidance mechanisms, and conditioning strategies that unlock new capabilities in controllable generation.
- Push the frontier of video generation and synthesis, including temporal modeling and multi-modal architectures.
- Publish at top-tier ML venues and share insights through blog posts, open-source contributions, and community engagement.
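To ground the rectified-flow direction above: sampling reduces to integrating an ODE dx/dt = v(x, t) from t = 0 to 1, where v would be a learned network. A toy Euler-integration sketch (the constant-velocity "flow" here is a stand-in for a trained model):

```python
def euler_sample(x0, velocity, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.
    In a rectified flow, `velocity` would be a learned network v_theta."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy check: on a straight (rectified) path the true velocity is the
# constant x1 - x0, so Euler recovers x1 regardless of the step count.
x0, x1 = [0.0, 0.0], [1.0, -2.0]
out = euler_sample(x0, lambda x, t: [b - a for a, b in zip(x0, x1)], steps=4)
```

Straightening the flow is exactly what makes few-step generation viable: the straighter the path, the fewer Euler steps needed.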
Who You Might Be
You are extremely curious. You actively consume the latest ML research - scanning arXiv, attending conferences, dissecting new open-source releases, and integrating breakthroughs into your own experimentation. You thrive on first-principles reasoning, see potential in unexplored ideas, and view learning as a perpetual process.
Desired Skills
- Deep expertise in modern diffusion models including training, sampling, denoising schedulers, score matching, flow matching, consistency training, and distillation techniques.
- Experience with transformer architectures such as DiT, MM-DiT, and attention mechanisms.
- Hands-on experience with distributed training at scale across multi-GPU and multi-node setups, with familiarity in mixed-precision training (FP8, BF16).
- Experience with video generation and synthesis, including temporal modeling and 3D positional encodings.
- Knowledge of VAE architectures such as HunyuanVAE, DC-AE, and latent representations, as well as motion modeling and optical flow.
- Strong mathematical foundation in SDEs, ODEs, optimal transport, and variational inference for designing novel generative objectives.
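As one concrete instance of those foundations, the conditional flow-matching objective underlying rectified flows regresses a velocity network onto the straight-line displacement between paired endpoints (conventions for which endpoint is noise and which is data vary by paper):

```latex
x_t = (1 - t)\, x_0 + t\, x_1, \qquad
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0,\, x_1}
    \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2
```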
What We Offer
- Top-of-market compensation and time to pursue open-ended research.
- A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
- In-person role at our Toronto office.
- Ownership of work that can set the direction for frontier diffusion models.
- Paid travel opportunities to the top ML conferences around the world.
We are Bagel Labs - an artificial intelligence research lab pioneering distributed training of frontier diffusion models on commodity hardware.
We ignore years of experience and pedigree. If you have high agency - meaning your default assumption is that you can control the outcome of whatever situation you are in - we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role
Work directly with the CEO and own go-to-market end to end: land the first partners, loop their feedback into product, and iterate to product-market fit (PMF).
Responsibilities
- Map the partner landscape, rank by impact, and pursue.
- Lead high level technical and commercial conversations.
- Represent Bagel at ML events.
- Track metrics, report learnings, and adjust strategy.
Requirements (flexible)
- Track record in GTM, BD, or partnerships at early-stage deep-tech startups.
- Deep understanding of the AI stack; knowledge of decentralized-AI tooling is a plus.
- Existing network of builders, investors, or partners in AI.
- Bias to action and data.
- Crisp written and verbal communication.
What We Offer
- Competitive salary plus founding-team-level equity.
- Direct influence on company trajectory and culture.
We are Bagel Labs - an artificial intelligence research lab pioneering distributed training of frontier diffusion models on commodity hardware.
We ignore years of experience and pedigree. If you have high agency - meaning your default assumption is that you can control the outcome of whatever situation you are in - we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Overview
You will build the data foundation for our frontier video and image generation diffusion models, turning massive, messy collections of media into clean, well-labeled datasets that researchers can trust for training. You will own pipelines end to end and work closely with the modeling team to unblock experiments and catch data issues before they quietly degrade model quality.
Key Responsibilities
- Build pipelines that ingest, filter, and transform millions of video clips and images into training-ready shards.
- Run quality scoring and synthetic captioning at scale across GPU clusters.
- Own dataset versioning so researchers can trace any training run back to an exact snapshot.
- Optimize storage and compute to keep PB-scale processing fast and cost-efficient.
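The versioning bullet above can start as simply as content-addressing the shard list, so a training run records one short ID that resolves to exact bytes. A hypothetical sketch:

```python
import hashlib

def snapshot_id(shard_paths, shard_checksums):
    """Derive a deterministic dataset-snapshot ID from the sorted list of
    shards and their content checksums. Any training run that logs this ID
    can later be traced back to the exact data it trained on."""
    h = hashlib.sha256()
    for path, checksum in sorted(zip(shard_paths, shard_checksums)):
        h.update(f"{path}\t{checksum}\n".encode())
    return h.hexdigest()[:16]
```

Because the pairs are sorted before hashing, the ID is independent of enumeration order but changes whenever any shard's content changes.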
Who You Might Be
You stay on top of the latest vision model releases and are often one of the first to try new open-source tools when they drop. You have strong intuition for what makes a good training sample and get frustrated when bad data silently hurts model quality. You are pragmatic about making systems work reliably even when requirements shift mid-flight.
Desired Skills
- Comfort with video and image data at the file level, whether that means transcoding, cropping, or detecting scene boundaries.
- Experience running filters, scorers, or captioning models across large media datasets.
- Python proficiency for batch processing and moving petabytes through object storage.
- Experience with large-scale storage systems including NAS, object storage, and distributed filesystems.
- Familiarity with text encoders and vision-language models such as T5 and CLIP for embedding precomputation and captioning at scale.
- Basic understanding of video codecs and containers: knowing the difference between H.264/H.265, keyframe structures, and variable frame rates matters at this scale.
- Understanding of how diffusion models work is a plus.
- Active Hugging Face or GitHub presence with open-source contributions is a plus.
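One way the variable-frame-rate point above shows up in practice: given a clip's decoded presentation timestamps, flag VFR clips before they poison temporal training. A heuristic sketch (the tolerance is illustrative):

```python
def is_variable_frame_rate(timestamps, tolerance=1e-3):
    """Heuristic VFR check: constant-frame-rate clips have (near-)equal
    gaps between consecutive presentation timestamps; large spread in the
    gaps indicates variable frame rate."""
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(gaps) - min(gaps) > tolerance
```

In a real pipeline the timestamps would come from the demuxer, and flagged clips get resampled to a fixed rate or dropped.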
What We Offer
- Top-of-market compensation with equity upside and time to pursue open-ended research.
- A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
- In-person role at our Toronto office.
- Paid travel opportunities to the top ML conferences around the world.