Stable Diffusion: Open Sourcing Creativity

Tags: ai, machine-learning

Stable Diffusion changed everything. High-quality AI image generation, running on consumer hardware, with open weights. Here’s what it means.

What’s Different

| Model | Access | Hardware | License |
|---|---|---|---|
| DALL-E 2 | API only | OpenAI servers | Closed |
| Midjourney | Discord bot | Their servers | Closed |
| Stable Diffusion | Open weights | Your GPU | Open |

You can run Stable Diffusion on a gaming GPU. You own the outputs. You can modify the model.

How It Works

Latent Diffusion

Text prompt: "A sunset over mountains, oil painting"

    Text encoder (CLIP)

    Encoded text embedding

    Diffusion process (U-Net)
    [Start with noise] → [Iteratively denoise] → [Latent image]

    VAE Decoder

    Final image (512x512)

Key insight: Work in a compressed “latent space” instead of pixel space. Much more efficient.
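A quick back-of-the-envelope calculation shows how much this buys you. SD v1's VAE downsamples each spatial dimension by 8x and produces a 4-channel latent:

```python
# Pixel space: a 512x512 RGB image
pixel_elements = 512 * 512 * 3

# Latent space: the VAE downsamples 8x per side and uses 4 channels
latent_elements = (512 // 8) * (512 // 8) * 4

print(pixel_elements)                    # 786432
print(latent_elements)                   # 16384
print(pixel_elements / latent_elements)  # 48.0
```

The U-Net denoises roughly 48x fewer values per step than a pixel-space diffusion model would, which is what makes consumer GPUs viable.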

The Diffusion Process

# Simplified diffusion sampling (encode_text, denoise_step, and
# decode_latent are placeholders for the CLIP encoder, the
# scheduler step, and the VAE decoder)
import torch

def sample(model, prompt, steps=50):
    # Start with pure noise in latent space (4 channels, 64x64)
    latent = torch.randn(1, 4, 64, 64)

    # Get text embedding
    text_embedding = encode_text(prompt)

    # Iteratively denoise, from the noisiest timestep down to 0
    for t in reversed(range(steps)):
        # Predict the noise present in the latent at timestep t
        noise_pred = model(latent, t, text_embedding)

        # Remove a portion of the predicted noise
        latent = denoise_step(latent, noise_pred, t)

    # Decode the final latent to pixel space
    image = decode_latent(latent)
    return image

Running Locally

Requirements

- An NVIDIA GPU with CUDA support (roughly 4 GB of VRAM is workable with fp16 and attention slicing; 8 GB+ is comfortable)
- Python with PyTorch, or the conda environment below
- A few gigabytes of disk space for the v1 weights

Setup

# Clone the repository
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion

# Create environment
conda env create -f environment.yaml
conda activate ldm

# Download weights (requires HuggingFace account)
# Place in models/ldm/stable-diffusion-v1/model.ckpt

Basic Generation

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "A serene Japanese garden with cherry blossoms, digital art"
image = pipe(prompt).images[0]
image.save("garden.png")

With Parameters

image = pipe(
    prompt="A cyberpunk city at night, neon lights, rain",
    negative_prompt="blurry, low quality, distorted",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=768,
    width=512,
).images[0]

Key Parameters

Guidance Scale (CFG)

How closely to follow the prompt:

| Value | Effect |
|---|---|
| 1-3 | Creative, may ignore prompt |
| 7-8 | Balanced (recommended) |
| 15+ | Literal, may look artificial |
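Under the hood, classifier-free guidance runs the U-Net twice per step, once with the prompt and once without, then extrapolates between the two noise predictions. A minimal sketch of the combination, using scalars as stand-ins for the actual noise tensors:

```python
def classifier_free_guidance(noise_uncond, noise_cond, scale):
    # Extrapolate from the unconditional prediction toward the
    # conditional one; scale controls how far we push
    return noise_uncond + scale * (noise_cond - noise_uncond)

# Scalar stand-ins for the U-Net's two noise predictions
print(classifier_free_guidance(0.0, 1.0, 1.0))  # 1.0 (conditional only)
print(classifier_free_guidance(0.0, 1.0, 7.5))  # 7.5 (strong pull toward prompt)
```

At scale 1 you get the plain conditional prediction; higher values amplify the difference the prompt makes, which is why very high values over-sharpen toward a literal, artificial look.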

Steps

More steps = more refined, but diminishing returns:

| Steps | Quality | Time |
|---|---|---|
| 20 | Decent | Fast |
| 50 | Good | Moderate |
| 100+ | Marginal improvement | Slow |
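Why do fewer steps cost quality? The sampler visits only a subset of the model's 1000 training timesteps, so fewer inference steps mean larger denoising jumps. A sketch of one common spacing scheme (DDIM-style even spacing; the exact schedule depends on the scheduler you use):

```python
def spaced_timesteps(num_train_steps=1000, num_inference_steps=50):
    # Pick an evenly spaced subset of training timesteps,
    # visited from most to least noisy
    stride = num_train_steps // num_inference_steps
    return list(range(0, num_train_steps, stride))[::-1]

print(spaced_timesteps(num_inference_steps=50)[:3])  # [980, 960, 940]
print(spaced_timesteps(num_inference_steps=20)[:3])  # [950, 900, 850]
```

At 20 steps each jump spans 50 training timesteps instead of 20, which is why fine detail suffers; past ~50 steps the jumps are already small, hence the diminishing returns.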

Seed

# Reproducible results
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(prompt, generator=generator).images[0]

Advanced Techniques

Img2Img

from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(...)

init_image = Image.open("sketch.png").convert("RGB")

image = pipe(
    prompt="A detailed oil painting of a landscape",
    image=init_image,
    strength=0.75,  # How much to change (0=no change, 1=complete change)
).images[0]

Inpainting

from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(...)

image = Image.open("photo.png")
mask = Image.open("mask.png")  # White = area to regenerate

result = pipe(
    prompt="A cat sitting on the chair",
    image=image,
    mask_image=mask,
).images[0]

Prompt Engineering

# Basic
prompt = "a mountain landscape"

# Better
prompt = (
    "a majestic mountain landscape at sunset, dramatic lighting, "
    "photorealistic, 8k, trending on artstation"
)

# Even better (with style references)
prompt = (
    "a majestic mountain landscape at sunset, dramatic lighting, "
    "in the style of Albert Bierstadt, oil painting, "
    "high detail, museum quality"
)

# Negative prompts help too
negative_prompt = (
    "blurry, low quality, cartoon, anime, "
    "watermark, text, signature"
)

Hardware Optimization

Low VRAM Mode

pipe.enable_attention_slicing()  # Trade speed for memory
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()  # Move unused parts to CPU

Half Precision

# Load in fp16
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    revision="fp16"
)

On Mac (MPS)

pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # Recommended on Apple Silicon
# A one-time warm-up generation is advisable on M1/M2 before real runs

The Ecosystem

UIs

- AUTOMATIC1111's Stable Diffusion web UI: the most feature-complete community interface
- ComfyUI: node-based workflow builder
- InvokeAI: polished UI with strong inpainting support

Models and Fine-tunes

- Official checkpoints (v1.4, v1.5, and later releases) hosted on Hugging Face
- Community fine-tunes for specific styles, plus DreamBooth and LoRA for training your own subjects

Ethical Considerations

The Good

- Democratizes a creative capability that was locked behind paid APIs
- Open weights enable research, auditing, and safety work that closed models prevent
- Artists and developers can build their own tools on top of it

The Concerning

- Training data includes copyrighted images scraped without artists' consent
- Open weights make misuse (deepfakes, impersonation, spam imagery) harder to police
- Style mimicry of living artists raises real economic and ethical questions

Best Practices

- Disclose when images are AI-generated
- Don't generate images of real people without their consent
- Avoid passing off imitations of a living artist's style as original work

Final Thoughts

Stable Diffusion proved that state-of-the-art AI can be open and accessible. Running on your own hardware means you own the outputs, pay no per-image fees, keep your prompts private, and answer to no one's content policy but your own.

This is how AI tools should be distributed.


The best creative tool is the one you control.
