# GPT-4: Multimodal Reasoning Arrives
On March 14, 2023, OpenAI released GPT-4. The leap from GPT-3.5 was significant: better reasoning, longer context, and vision capabilities. The bar moved again.
## What’s New

### Multimodal Input

GPT-4 can see:

```
User: [image of a refrigerator full of ingredients]
      What can I make for dinner?

GPT-4: I can see eggs, cheese, bell peppers, and spinach.
       You could make a vegetable frittata or an omelette.
```
### Improved Reasoning

GPT-4 performs significantly better on standardized exams (per OpenAI's technical report):

| Exam | GPT-3.5 | GPT-4 |
|---|---|---|
| Uniform Bar Exam | ~10th percentile | ~90th percentile |
| SAT Math | 590 | 700 |
| LSAT | ~40th percentile | ~88th percentile |
| AP Calculus BC | 1 | 4 |
### Longer Context

| Model | Context Window |
|---|---|
| GPT-3.5 | 4K tokens |
| GPT-4 | 8K tokens |
| GPT-4-32k | 32K tokens |

At 32K tokens, that’s roughly 50 pages of text in context.
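That page estimate can be sanity-checked with a quick back-of-the-envelope calculation (assuming ~0.75 words per token and ~500 words per page, both rough rules of thumb, not exact figures):

```python
def tokens_to_pages(tokens, words_per_token=0.75, words_per_page=500):
    """Rough estimate of how many printed pages fit in a context window."""
    return tokens * words_per_token / words_per_page

print(tokens_to_pages(32_000))  # 48.0 -- roughly 50 pages
print(tokens_to_pages(4_000))   # 6.0  -- GPT-3.5's window is much tighter
```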
## Using the API

```python
import openai

# Text-only
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

# With image (GPT-4V)
response = openai.ChatCompletion.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
            ]
        }
    ]
)
```
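The `image_url` field takes a base64 data URL. A small helper for producing one from a local file (a sketch; `encode_image` is our own name, not part of the SDK):

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read an image file and wrap it in a data URL for the image_url field."""
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:image/jpeg;base64,{data}"
```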
## Code Capabilities

### Complex Refactoring

```
User: Refactor this 200-line function into smaller, testable units.
      Also add type hints and docstrings.

GPT-4: [Produces well-structured refactored code with explanations]
```

### Understanding Codebases

```
User: Here's a Django view, serializer, and model. Find the bug
      that causes N+1 queries.

GPT-4: The issue is on line 15 of the view. You're accessing
       author.name in a loop without select_related. Here's the fix...
```

### Architecture Discussions

```
User: Design a message queue system for a microservices architecture
      handling 1M messages/hour.

GPT-4: [Detailed architecture with trade-offs, diagrams explained,
       technology recommendations]
```
## Limitations

### Hallucinations Still Happen

```
User: What papers did Sangeet Verma publish in 2022?

GPT-4: [Confidently makes up paper titles and journals]
```

Better than GPT-3.5, but not solved.

### Knowledge Cutoff

Training data still has an end date. GPT-4’s cutoff was September 2021 at launch.
### Slower and More Expensive

| Model | Speed | Cost (input) | Cost (output) |
|---|---|---|---|
| GPT-3.5-turbo | Fast | $0.0015/1K | $0.002/1K |
| GPT-4 | Slower | $0.03/1K | $0.06/1K |
| GPT-4-32k | Slowest | $0.06/1K | $0.12/1K |

That’s 20x the price of GPT-3.5 on input tokens, and 30x on output.
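Those per-token rates compound quickly at volume. A quick sketch of what a single request costs under the prices in the table above:

```python
# Per-1K-token prices from the table above (USD)
PRICES = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-4-32k":     {"input": 0.06,   "output": 0.12},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one chat completion."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1000

# A 2K-token prompt with a 500-token reply:
print(request_cost("gpt-3.5-turbo", 2000, 500))  # ~$0.004
print(request_cost("gpt-4", 2000, 500))          # ~$0.09
```

At a million such requests a month, that gap is $4K versus $90K.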
## Developer Implications

### What Changed

- **Complex tasks become viable**: Multi-step reasoning that failed with GPT-3.5 works
- **Code understanding improves**: Can handle larger, more complex codebases
- **Vision integration**: Screen reading, diagram understanding, document analysis
### New Patterns

```python
# Long document analysis
with open("contract.txt") as f:
    contract = f.read()  # Can now be 30+ pages

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[
        {"role": "user", "content": f"""
Analyze this contract for risks:

{contract}

Provide:
1. Key terms summary
2. Potential risks
3. Unusual clauses
"""}
    ]
)
```
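Even 32K tokens runs out on very long documents. One common workaround (a sketch, not an official API; the 4-characters-per-token ratio is a rough heuristic, and a real tokenizer like tiktoken gives exact counts) is to split the text into overlapping chunks and analyze each separately:

```python
def chunk_text(text: str, max_tokens: int = 30_000, overlap_tokens: int = 500):
    """Split text into overlapping chunks that each fit a context window.

    Assumes ~4 characters per token; the overlap keeps clauses that
    straddle a chunk boundary visible in both neighboring chunks.
    """
    max_chars = max_tokens * 4
    step = (max_tokens - overlap_tokens) * 4
    return [text[i:i + max_chars] for i in range(0, len(text), step)]
```

Each chunk is then sent as its own request, and the per-chunk summaries are merged in a final pass.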
### Vision Use Cases

```python
# Code review from screenshot
review = analyze_image(
    image="screenshot_of_code.png",
    prompt="Review this code for issues. What would you change?"
)

# UI testing
issues = analyze_image(
    image="app_screenshot.png",
    prompt="Identify any UI/UX issues in this interface."
)

# Architecture diagrams
explanation = analyze_image(
    image="system_diagram.png",
    prompt="Explain this system architecture and identify potential bottlenecks."
)
```
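`analyze_image` is not an SDK function; it's a wrapper you'd write yourself. A minimal sketch of one possible implementation (the helper names are ours, and the payload shape follows the vision example earlier):

```python
import base64
from pathlib import Path

def build_vision_messages(image: str, prompt: str) -> list:
    """Build the chat messages payload for a local image plus a text prompt."""
    data = base64.b64encode(Path(image).read_bytes()).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{data}"}},
        ],
    }]

def analyze_image(image: str, prompt: str) -> str:
    """Send an image + prompt to GPT-4V and return the reply text."""
    import openai
    response = openai.ChatCompletion.create(
        model="gpt-4-vision-preview",
        messages=build_vision_messages(image, prompt),
    )
    return response["choices"][0]["message"]["content"]
```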
## Compared to Open Source

| Capability | GPT-4 | LLaMA-65B | Mistral-7B |
|---|---|---|---|
| Reasoning | Best | Good | Good |
| Coding | Best | Good | Good |
| Vision | Yes | No | No |
| Runs locally | No | Yes | Yes |
| Cost | $$$ | Free (self-hosted) | Free (self-hosted) |

GPT-4 is the strongest overall, but open models are catching up.
## Practical Advice

### When to Use GPT-4

- Complex reasoning tasks
- Code generation requiring understanding
- Vision-based analysis
- When quality matters more than cost

### When GPT-3.5 is Fine

- Simple Q&A
- Straightforward text generation
- High-volume, low-stakes applications
- When cost matters
### Hybrid Approach

```python
def get_response(query, complexity):
    if complexity == "high":
        model = "gpt-4"
    else:
        model = "gpt-3.5-turbo"
    return openai.ChatCompletion.create(model=model, messages=[...])
```

Use GPT-4 when you need it, GPT-3.5 when you don’t.
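The `complexity` flag has to come from somewhere. One cheap heuristic (purely illustrative; the keyword list and length threshold are made-up values, not a tuned classifier) routes on query length and trigger words:

```python
# Words that tend to signal multi-step work worth the GPT-4 premium
HARD_KEYWORDS = {"refactor", "architecture", "debug", "prove", "analyze"}

def classify_complexity(query: str) -> str:
    """Crude router: long queries or 'hard' keywords go to the stronger model."""
    words = query.lower().split()
    if len(words) > 50 or HARD_KEYWORDS.intersection(words):
        return "high"
    return "low"

classify_complexity("What's the capital of France?")           # "low"
classify_complexity("Refactor this module into clean layers")  # "high"
```

In production you might instead let GPT-3.5 attempt the task first and escalate to GPT-4 only when the answer fails validation.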
## The Takeaway

GPT-4 represented a meaningful capability jump. Not quite AGI, but clearly more capable than its predecessors across almost every benchmark.

For developers, it opened new categories of applications:

- Document analysis at scale
- Vision-based automation
- Complex multi-step AI workflows

The bar will keep moving. Today’s state-of-the-art is tomorrow’s baseline.

March 2023: The new benchmark was set.