GPT-3: 175 Billion Parameters of Potential
GPT-3 dropped in June 2020 with 175 billion parameters, more than 100x larger than GPT-2's 1.5 billion. The demos are stunning. The implications are profound.
The Scale
| Model | Parameters | Training Cost |
|---|---|---|
| GPT-2 | 1.5B | ~$50K |
| T5-11B | 11B | ~$1.3M |
| GPT-3 | 175B | ~$12M |
Training GPT-3 from scratch would cost millions in compute. Few organizations can afford this.
What GPT-3 Can Do
Few-Shot Learning
Give it a few examples and it learns the task:

```
Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>
```

Output: `fromage`
No fine-tuning required. Just examples in the prompt.
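Few-shot prompting is just string construction. A minimal sketch (no API call; the helper name and format are illustrative, mirroring the translation example above):

```python
def few_shot_prompt(instruction, examples, query):
    """Build a few-shot prompt: instruction, worked examples, then the query."""
    lines = [instruction]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes the text after "=>"
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

The API never sees "examples" as a separate concept; everything is one flat string of text.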
Code Generation
```python
# A Python function that takes a list of numbers
# and returns the sum of squares
def sum_of_squares(numbers):
    return sum(x**2 for x in numbers)
```
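The generated function does what the comment asks. Repeated here so the snippet runs standalone, with a quick sanity check:

```python
def sum_of_squares(numbers):
    """Sum of squares of each number in the list."""
    return sum(x**2 for x in numbers)

print(sum_of_squares([1, 2, 3]))  # 1 + 4 + 9 = 14
```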
SQL from Natural Language
Prompt:

```
Create a SQL query that finds all customers
who spent more than $1000 in the last month
```

Output:

```sql
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
GROUP BY customer_id
HAVING total > 1000;
```
Text Summarization
Condense articles, documents, conversations without task-specific training.
Creative Writing
Generate poems, stories, dialogue that’s often indistinguishable from human writing.
How It Works
GPT-3 is a transformer decoder, like GPT-2, just bigger:
Input Tokens → Embeddings → 96 Transformer Layers → Output Probabilities
Training objective: Predict the next token given previous tokens.
Training data: "The quick brown fox"
Model learns: P("jumps" | "The quick brown fox") is high
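Mechanically, "P("jumps" | context) is high" means the model's final layer emits one score (logit) per vocabulary token, and a softmax turns those scores into probabilities. A toy sketch with a made-up four-word vocabulary and made-up logits (these values are assumptions for illustration, not real GPT-3 outputs):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical logits for the context "The quick brown fox"
logits = {"jumps": 5.1, "runs": 3.2, "the": 0.4, "fox": -1.0}
probs = softmax(logits)
print(max(probs, key=probs.get))  # "jumps" gets the highest probability
```

Sampling with `temperature` (as in the API call below) just rescales these logits before the softmax: lower temperature sharpens the distribution, higher temperature flattens it.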
The API
GPT-3 is API-only. No weights released.
```python
import openai

openai.api_key = "your-key"

response = openai.Completion.create(
    model="davinci",
    prompt="Write a haiku about programming:",
    max_tokens=50,
    temperature=0.7,
)
print(response.choices[0].text)
```
Model Sizes
OpenAI offers multiple GPT-3 variants. Parameter counts are community estimates; OpenAI has not published them:

| Model | Parameters (est.) | Speed | Quality |
|---|---|---|---|
| Ada | ~350M | Fastest | Lowest |
| Babbage | ~1.3B | Fast | Low |
| Curie | ~6.7B | Medium | Good |
| Davinci | ~175B | Slowest | Best |
Choose based on task complexity vs cost/speed.
Prompt Engineering
The skill of crafting effective prompts:
Zero-Shot
```
Classify the sentiment of this review as positive or negative:

"This product exceeded my expectations!"

Sentiment:
```
Few-Shot
```
Classify sentiment:

"I love this!" => positive
"Terrible experience" => negative
"This product exceeded my expectations!" =>
```
Chain of Thought
```
Solve step by step:

If there are 3 cars in the parking lot and 2 more arrive,
how many cars are there?

Step 1: Start with 3 cars
Step 2: Add 2 arriving cars
Step 3: 3 + 2 = 5 cars
Answer: 5 cars
```
Limitations
Factual Accuracy
GPT-3 confidently generates falsehoods:
```
Q: Who was the first person on Mars?
A: Neil Armstrong was the first person to walk on Mars in 1969.
```

(Completely wrong: no human has been to Mars, and Armstrong walked on the Moon in 1969.)
It generates plausible text, not necessarily true text.
Context Window
Limited to 2,048 tokens (roughly 1,500 words). It can't process long documents in a single pass.
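One common workaround is to split a long document into chunks that each fit the window. A rough sketch using whitespace-separated words as a stand-in for tokens (real token counts differ; this approximates rather than uses the API's tokenizer):

```python
def chunk_text(text, max_tokens=2048):
    """Split text into pieces of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

chunks = chunk_text("word " * 5000, max_tokens=2048)
print(len(chunks))  # 3 chunks: 2048 + 2048 + 904 words
```

Chunking works for summarization-style tasks but loses cross-chunk context, which is exactly the limitation being described.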
Cost
Davinci at roughly $0.02 per 1K tokens (pricing has varied over time) adds up quickly in production.
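The arithmetic is worth making concrete. A back-of-the-envelope estimate (the helper and traffic numbers are illustrative; $0.02/1K tokens is the figure quoted above):

```python
def monthly_cost(requests_per_day, tokens_per_request, price_per_1k=0.02):
    """Estimated monthly API spend at a flat per-token price, assuming 30 days."""
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1000 * price_per_1k

# Hypothetical workload: 10,000 requests/day at 500 tokens each
print(f"${monthly_cost(10_000, 500):,.2f}")  # $3,000.00
```

This is why the smaller Ada/Babbage/Curie tiers exist: for simple tasks, the cheapest model that works wins.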
No Real Understanding
It predicts tokens, not concepts:
```
Q: What's heavier, a pound of feathers or a pound of steel?
```

(It may fail this classic trick question; the answer is that they weigh the same.)
Applications Being Built
Writing Assistants
- Jasper (formerly Jarvis)
- Copy.ai
- Writesonic
Code Assistants
- GitHub Copilot (uses Codex, a GPT-3 derivative)
- Tabnine
Customer Support
Automated response drafting, FAQ answering.
Search and Question Answering
Perplexity, You.com using LLMs for search.
Ethical Concerns
Misinformation
Generates convincing fake text at scale.
Bias
Trained on internet data, inherits internet biases:
```
"The CEO walked into the room. He..."
```

(Assumes a male CEO.)
Energy Consumption
Training large models has significant carbon footprint.
Job Displacement
Writing, coding, customer service—roles potentially affected.
What This Means
GPT-3 demonstrates that scale works. More parameters + more data = more capabilities.
This isn’t AGI. But it’s a massive step. The ceiling has been raised significantly.
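The "scale works" observation has a quantitative form: Kaplan et al. (2020) found that language-model test loss falls as a power law in non-embedding parameter count N, with approximate fitted constants:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```

Every 10x increase in parameters buys a predictable drop in loss, which is the bet GPT-3 cashed in.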
The next few years will be about:
- Cost reduction (smaller models with similar capability)
- Factual accuracy (grounding in knowledge bases)
- Responsible deployment (safety, bias mitigation)
Final Thoughts
GPT-3 is impressive but imperfect. It’s a tool, not magic.
The pattern is clear: more scale = more capability. GPT-4 will be larger. The pace won’t slow.
Learn prompt engineering. Understand limitations. Build responsibly.
The future of AI is being written, one token at a time.