Deep Learning's Hardware Lottery


A 2020 paper by Sara Hooker coined the term “hardware lottery”: the idea that the machine learning algorithms that win are often those that happen to run efficiently on the hardware available at the time. This explains more about AI’s trajectory than we might like to admit.

The Thesis

“Whether a research idea succeeds or fails often has more to do with whether it fits onto existing hardware than whether it’s the best approach.”

Transformers didn’t just win because attention is better than recurrence. They won because matrix multiplications map perfectly to GPUs.

Historical Examples

Neural Networks’ “Death”

1990s-2000s: Neural networks were considered dead.

Backpropagation → Complex       → Too slow on CPUs
SVMs             → Kernel trick → Efficient on CPUs

                    SVMs "won"

Then GPUs became available for general computing:

Neural networks + GPUs → Fast again → Deep learning revolution

The math never changed. The hardware did.

Transformers vs RNNs

RNNs process sequences step-by-step:

Token 1 → Hidden state → Token 2 → Hidden state → ...

Inherently sequential. Hard to parallelize.

Transformers use attention:

All tokens → Matrix multiplication → Output

Massively parallel. Perfect for GPUs.

RNN on GPU:     Limited speedup (sequential dependency)
Transformer:    Massive speedup (parallelizable)

Transformers map better to available hardware.
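The contrast above can be sketched in a few lines of NumPy (a toy sketch with illustrative dimensions and random weights, not a real model): the RNN must run a Python-level loop because each hidden state depends on the previous one, while self-attention processes the whole sequence in a handful of matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4                      # sequence length, hidden size
x = rng.standard_normal((n, d))  # toy token embeddings

# RNN: each step depends on the previous hidden state -> sequential loop
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
for t in range(n):               # cannot parallelize across t
    h = np.tanh(x[t] @ W + h @ U)

# Self-attention: every token attends to every token -> one batched matmul
Q, K, V = x, x, x                # identity projections for simplicity
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V                # all n outputs computed at once
```

The loop has n data-dependent steps; the attention path is three matmuls a GPU can saturate regardless of n.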

GPU Architecture Matters

What GPUs Do Well

Operation          GPU Speed
Matrix multiply    Very fast
Element-wise ops   Very fast
Random access      Slow
Branching          Slow
Memory-bound ops   Bottleneck

What This Rewards

Algorithms that work best:

- Dense matrix multiplications
- Regular, predictable memory access
- Uniform operations with little branching

Algorithms that struggle:

- Sparse or irregular computation
- Data-dependent branching
- Random memory access patterns
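As an illustration of the table above (a toy NumPy sketch on CPU, not a benchmark): an element-wise operation runs as one uniform vectorized kernel, while per-element branching forces either a slow interpreted loop or a rewrite into branch-free form such as `np.where`.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 100_000)

# Hardware-friendly: element-wise op, same instruction on every element
y_fast = np.maximum(x, 0.0)            # ReLU as one vectorized kernel

# Branch-heavy version: per-element if/else, hard to vectorize
y_slow = np.array([v if v > 0 else 0.0 for v in x])

# Branch-free rewrite: same result, expressed as a data-parallel select
y_where = np.where(x > 0, x, 0.0)
```

All three compute the same function; only the branch-free forms map onto the parallel hardware the table describes.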

The Implications

Research is Biased

Researchers gravitate toward what works on available hardware:

Researcher has: NVIDIA GPUs
Researcher tries: Dense architectures, Transformers
Researcher succeeds: More papers on Transformers
Researcher doesn't try: Sparse architectures
Community: "Transformers are the best!"

The search space is constrained by hardware.

Alternative Ideas Get Abandoned

Interesting approaches that don’t map well to GPUs:

- Sparse networks, where most weights are zero and skippable
- Spiking neural networks, which compute with asynchronous events
- Capsule networks, with data-dependent routing between units

These might be better; we don’t know, because they’re too slow to test at scale on today’s hardware.

Infrastructure Lock-in

We’ve invested billions in:

- GPU data centers
- CUDA and GPU-optimized software stacks
- Frameworks tuned for dense GPU kernels

Switching to different architectures means:

- Rewriting that software stack
- Retraining engineers and researchers
- Accepting slower progress while new tooling matures

Emerging Alternatives

Specialized AI Chips

Hardware     Optimized For
TPU          Matrix ops, Transformers
Cerebras     Large models, sparsity
Graphcore    Graph operations
Groq         Inference

More diverse hardware could enable more diverse algorithms.

Sparse Architectures

Recent work on efficient Transformers:

# Dense attention: O(n²)
attention = softmax(Q @ K.T) @ V

# Sparse attention: O(n * k)
# Only attend to k < n tokens
attention = sparse_softmax(Q @ K[selected].T) @ V[selected]

Hardware support for sparsity is improving.
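The pseudocode above can be made concrete with NumPy (a toy sketch: `selected` here is a hypothetical fixed index set, whereas real sparse-attention schemes pick it with learned or structural patterns):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
n, d, k = 16, 8, 4
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Dense attention: scores for all n*n query-key pairs -> O(n^2)
dense = softmax(Q @ K.T / np.sqrt(d)) @ V

# Sparse attention: each query attends to only k selected keys -> O(n*k)
selected = np.arange(k)          # toy selection: the first k tokens
sparse = softmax(Q @ K[selected].T / np.sqrt(d)) @ V[selected]
```

Both produce one output vector per token; the sparse path simply never materializes the full n-by-n score matrix.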

Mixture of Experts

Input → Router → Expert 1 (active)
              → Expert 2 (inactive)
              → Expert 3 (active)
              → ...

Only a few experts activate per input. More efficient if hardware supports it.
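A minimal sketch of the routing idea (toy NumPy code; real mixture-of-experts layers use learned gating networks and load-balancing losses, both omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_experts, top_k = 8, 4, 2
x = rng.standard_normal(d)                        # one input token

# Each "expert" here is just a small weight matrix
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

# Router: score every expert, but activate only the top-k
router_w = rng.standard_normal((d, n_experts))
scores = x @ router_w
active = np.argsort(scores)[-top_k:]              # indices of top-k experts
gates = np.exp(scores[active]) / np.exp(scores[active]).sum()

# Output is a gated sum over the few active experts only
out = sum(g * (x @ experts[i]) for g, i in zip(gates, active))
```

The inactive experts cost nothing per token, which is exactly the kind of sparsity that pays off only when the hardware can skip the dormant weights.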

What Should Change

1. Diverse Hardware Investment

Don’t put all resources into one architecture. Keep alternatives viable.

2. Hardware-Agnostic Benchmarks

Measure algorithms on theoretical compute, not just wall-clock time on GPUs.
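One hedged way to do this is to compare operation counts rather than GPU wall-clock time. As a rough, illustrative estimate (counting only the two attention matmuls, ignoring the softmax and projections):

```python
def attention_flops(n, d, k=None):
    """Rough FLOP count for one attention layer's two matmuls.

    n: sequence length, d: head dimension,
    k: number of attended tokens (None = dense, attend to all n).
    """
    attended = n if k is None else k
    score_flops = 2 * n * attended * d    # Q @ K.T
    value_flops = 2 * n * attended * d    # weights @ V
    return score_flops + value_flops

n, d = 4096, 64
dense_cost = attention_flops(n, d)        # grows as n^2
sparse_cost = attention_flops(n, d, 256)  # grows as n*k
```

A count like this is hardware-agnostic by construction; the trade-off is that it ignores exactly the memory-access and parallelism effects the lottery is about, so it complements rather than replaces real measurements.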

3. Simulation Over Hardware

Invest in simulators that can test novel architectures before building chips.

4. Acknowledge the Bias

When claiming “SOTA,” acknowledge the hardware assumptions. Results might not generalize.

The Lesson

AI progress isn’t purely about algorithmic innovation. It’s about:

Algorithm × Hardware × Scale = Success

The algorithms that win are often those that best exploit current hardware. That’s not the same as being the best algorithms.

Final Thoughts

The hardware lottery explains:

- Why neural networks lay dormant until GPUs arrived
- Why Transformers displaced RNNs so quickly
- Why alternatives like sparse and spiking networks remain untested at scale

As we approach the limits of current architectures, new hardware could unlock new algorithmic frontiers—or we could be stuck waiting for the next hardware shift.

Your favorite AI might just be a lottery winner.


The best ideas don’t always win. The ones that fit the hardware do.
