Flux Models: Open Sourcing High Quality Image Generation
In August 2024, Black Forest Labs released Flux, a family of image generation models that rivaled Midjourney and DALL-E 3. Two of its variants shipped with open weights. The image generation landscape shifted again.
The Flux Family
| Model | Access | Quality | Use Case |
|---|---|---|---|
| Flux.1 Pro | API only | Best | Production |
| Flux.1 Dev | Open weights | Near-Pro | Development |
| Flux.1 Schnell | Open, distilled | Good, fast | Iteration |
What Made Flux Different
Quality
- Prompt following: Exceptional
- Text rendering: Actually readable
- Anatomy: Fewer artifacts
- Composition: Handles complex scenes
Flux understood complex prompts better than predecessors.
Architecture
- Rectified flow transformer (a flow-matching formulation rather than standard diffusion)
- Multimodal training approach
- T5 text encoder (not just CLIP)
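The rectified-flow idea can be illustrated with a toy one-dimensional sketch. Everything below is made up for illustration (in the real model a transformer predicts the velocity field from the noisy latents and the text embedding): sampling just integrates a velocity field from noise toward data with a few Euler steps, which is part of why so few steps can suffice.

```python
import random

random.seed(0)
DATA_MEAN = 3.0  # pretend the "data distribution" is centered at 3.0

def velocity(x, t):
    # Oracle velocity pointing from the current point toward the data.
    # In Flux, this field is what the transformer is trained to predict.
    return (DATA_MEAN - x) / max(1.0 - t, 1e-6)

def sample(steps=4):
    x = random.gauss(0.0, 1.0)        # start from pure noise at t = 0
    for i in range(steps):
        t = i / steps
        x += velocity(x, t) / steps   # one Euler step along the flow
    return x

print(sample(steps=4))  # lands (numerically) on DATA_MEAN = 3.0
```

With an oracle velocity the sampler hits the target in a handful of steps; the trained model only approximates this, which is why step counts like 4 (Schnell) or 20 (Dev) trade speed against fidelity.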
Running Flux Locally
With ComfyUI
```shell
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download Flux.1 Schnell (fast) or Dev (quality)
# and place the weights in models/checkpoints/
python main.py
```
With Diffusers
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt = (
    "A serene Japanese garden with a red bridge over a koi pond, "
    "cherry blossoms falling, anime style"
)
image = pipe(
    prompt,
    guidance_scale=0.0,       # Schnell doesn't need guidance
    num_inference_steps=4,    # Schnell is fast
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("output.png")
```
Hardware Requirements
| Model | VRAM (fp16) | VRAM (quantized) |
|---|---|---|
| Schnell | 24GB | 12GB |
| Dev | 24GB | 12GB |
A single consumer GPU such as an RTX 4090 can run it.
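The table lines up with simple arithmetic on the model's size (Flux.1's transformer has roughly 12B parameters; activations, the T5/CLIP encoders, and the VAE need extra headroom on top):

```python
# Back-of-the-envelope weight memory for a ~12B-parameter model.
PARAMS = 12e9

for name, bytes_per_param in [("bf16/fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB of weights")
```

At two bytes per parameter that is ~24 GB of weights alone, matching the fp16 column; 8-bit quantization halves it to the ~12 GB quantized figure.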
Memory Optimization
```python
# For lower VRAM (trades inference speed for memory)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_sequential_cpu_offload()  # streams weights to the GPU layer by layer
pipe.vae.enable_slicing()             # decodes batches one image at a time
pipe.vae.enable_tiling()              # decodes large images in tiles
```
Comparison
Prompt Following
Prompt: "A cat wearing a tiny chef's hat, whisking eggs in a miniature kitchen, photorealistic"

- Midjourney: ~85% accurate
- DALL-E 3: ~90% accurate
- Flux.1 Dev: ~95% accurate
Flux excels at complex, specific prompts.
Text in Images
Prompt: "A neon sign that says 'OPEN 24 HOURS'"

- Midjourney: often garbled
- DALL-E 3: usually correct
- Flux: usually correct
The T5 text encoder's stronger language understanding is what makes Flux's text rendering reliable.
Speed
| Model | Steps | Time (RTX 4090) |
|---|---|---|
| Flux.1 Schnell | 4 | ~2s |
| Flux.1 Dev | 20 | ~15s |
| SDXL | 30 | ~10s |
Schnell is remarkably fast for its quality.
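A quick calculation on the timing table shows where that speed comes from: Schnell's per-step cost is in the same range as Dev's, so the ~7x end-to-end speedup is almost entirely the step-count distillation, not cheaper individual steps.

```python
# Per-step cost implied by the timing table (RTX 4090 numbers)
timings = {
    "Flux.1 Schnell": (4, 2.0),   # (steps, total seconds)
    "Flux.1 Dev": (20, 15.0),
    "SDXL": (30, 10.0),
}
for model, (steps, total) in timings.items():
    print(f"{model}: {total / steps:.2f} s/step over {steps} steps")
```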
API Usage
```python
import replicate

output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={
        "prompt": "A futuristic city at sunset",
        "aspect_ratio": "16:9",
        "num_outputs": 1,
    },
)
print(output)
```
Available on Replicate, FAL, and other platforms.
Practical Applications
Fast Iteration
```python
# Use Schnell for drafts
for prompt_variation in prompt_variations:
    draft = schnell_pipe(
        prompt_variation, num_inference_steps=4, guidance_scale=0.0
    ).images[0]  # quick preview

# Use Dev for the final render
final = dev_pipe(best_prompt, num_inference_steps=20).images[0]
```
Batch Processing
```python
# Generate variations; a list of prompts is processed as one batch
prompts = [
    f"Product photo of a {color} water bottle on white background"
    for color in ["red", "blue", "green", "black"]
]
images = pipe(prompts).images  # one image per prompt
```
LoRA Fine-Tuning
```python
# Fine-tune for a specific style by loading LoRA weights
# (FluxPipeline supports LoRA loading directly; no extra import needed)
pipe.load_lora_weights("path/to/flux-lora")
image = pipe("A portrait in my custom style").images[0]
```
The Business Model
- Flux.1 Pro: API-only; generates revenue
- Flux.1 Dev: open weights under a non-commercial license; builds the ecosystem
- Flux.1 Schnell: open weights under Apache 2.0; drives maximum adoption

An open-core play: the community builds on the open versions while businesses pay for Pro.
Ecosystem
Within weeks:
- ControlNet adapters released
- LoRA training guides published
- ComfyUI workflows shared
- Fine-tuned variants appeared
Open weights accelerate innovation.
Black Forest Labs
- Founded by former Stability AI researchers
- Created original Stable Diffusion
- $31M seed funding
- Competing on quality AND openness
Implications
For Developers
Open weights mean:
- Local deployment
- Fine-tuning is possible
- No API costs during development
- Privacy is preserved
For the Market
Before Flux: pay for quality (Midjourney, DALL-E 3) or settle for decent open models (Stable Diffusion).
After Flux: frontier quality is available with open weights.
Midjourney's moat shrank.
For Creativity
More accessible tools = More creators
Open models = More experimentation
Limitations
- Can still struggle with hands
- Text rendering is not always perfect
- High VRAM requirements
- The Pro model is API-only
Final Thoughts
Flux showed that frontier image quality doesn’t require closed APIs. The Dev model is genuinely competitive with Midjourney and DALL-E 3.
For developers, this means professional-grade image generation without API costs or privacy concerns.
Open source meets professional quality.