# Code Interpreter: AI Running Its Own Code
In July 2023, OpenAI released Code Interpreter (later renamed Advanced Data Analysis) to ChatGPT Plus users. It gave GPT-4 the ability to write and execute Python code, fundamentally changing what was possible.
## What It Does
```
User: Upload sales_data.csv and analyze quarterly trends

ChatGPT: [Writes Python code]
         [Executes code]
         [Returns analysis + visualizations]
```
The AI can:
- Write Python code
- Execute it in a sandbox
- Access uploaded files
- Generate outputs (graphs, processed files)
- Iterate when code fails
## Why It Matters

### Before Code Interpreter
```
User: What's the standard deviation of column A?

ChatGPT: I can't actually compute that. Here's the formula...
         You would need to run: np.std(df['A'])
```
### After Code Interpreter
```
User: What's the standard deviation of column A?

ChatGPT: [Runs: import pandas as pd; df = pd.read_csv(...); df['A'].std()]
         The standard deviation of column A is 42.3
```
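The same computation is easy to reproduce locally. A minimal sketch with made-up values; one subtlety worth knowing is that pandas' `.std()` uses the sample formula by default, while `np.std` defaults to the population formula:

```python
import pandas as pd

# Hypothetical stand-in for the uploaded column A
df = pd.DataFrame({'A': [10.0, 50.0, 30.0, 70.0]})

print(df['A'].std())        # sample std (ddof=1), pandas' default
print(df['A'].std(ddof=0))  # population std, matching np.std(df['A'])
```

If the AI's number and your own don't match, a differing `ddof` convention is a common reason.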
## Use Cases

### Data Analysis
```
User: [uploads dataset.csv]
      Clean this data, remove outliers, and show me the distribution
```
ChatGPT:
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('dataset.csv')
# Clean missing values
df = df.dropna()
# Remove outliers using IQR
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) &
        (df['value'] <= Q3 + 1.5*IQR)]
# Plot distribution
plt.figure(figsize=(10, 6))
plt.hist(df['value'], bins=50, edgecolor='black')
plt.title('Value Distribution (Cleaned)')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.savefig('distribution.png')
```

[Returns: distribution.png, cleaned summary statistics]
### File Conversion
```
User: Convert this PDF to Markdown

ChatGPT: [Uses PyPDF2 to extract text] [Formats as Markdown] [Returns output.md]
```
### Image Processing
```
User: [uploads image.jpg] Resize to 800px wide, add a watermark, convert to PNG
```
ChatGPT:
```python
from PIL import Image, ImageDraw

img = Image.open('image.jpg')

# Resize maintaining aspect ratio
width = 800
ratio = width / img.width
height = int(img.height * ratio)
img = img.resize((width, height))

# Add watermark (default font)
draw = ImageDraw.Draw(img)
draw.text((10, height - 30), "© 2023", fill=(255, 255, 255))

# Save as PNG
img.save('output.png', 'PNG')
```
[Returns downloadable output.png]
### Math and Visualization
```
User: Visualize the Mandelbrot set at high resolution
```
ChatGPT:
```python
import numpy as np
import matplotlib.pyplot as plt

def mandelbrot(h, w, max_iter):
    y, x = np.ogrid[-1.4:1.4:h*1j, -2:0.8:w*1j]
    c = x + y*1j
    z = c
    divtime = max_iter + np.zeros(z.shape, dtype=int)
    for i in range(max_iter):
        z = z**2 + c
        diverge = z*np.conj(z) > 2**2
        div_now = diverge & (divtime == max_iter)
        divtime[div_now] = i
        z[diverge] = 2  # cap diverged points to avoid overflow
    return divtime

plt.figure(figsize=(12, 10))
plt.imshow(mandelbrot(1000, 1500, 80), cmap='magma')
plt.axis('off')
plt.savefig('mandelbrot.png', dpi=150, bbox_inches='tight')
```
[Returns high-resolution Mandelbrot visualization]
## Technical Details
### The Sandbox
- Python 3.10 environment
- Standard data science libraries (pandas, numpy, matplotlib, etc.)
- Sandboxed execution
- CPU only (no GPU)
- Time and memory limits
- No network access
### Available Libraries
```python
# Pre-installed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import scipy
import sklearn
import sympy
import networkx
# and many more...
```

### Limitations
- No internet access → Can't fetch URLs
- No persistent storage → Each session is fresh
- 60-second timeout → Long computations fail
- Memory limits → Very large datasets may fail
- No GPU → No deep learning inference
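Some of these limits can be engineered around. For the memory limit, for instance, streaming a file in chunks keeps peak usage small. A minimal sketch using pandas' `chunksize` (the data here is synthetic, standing in for a CSV too large to load at once):

```python
import io
import pandas as pd

# Synthetic stand-in for a large CSV file
csv = io.StringIO("value\n" + "\n".join(str(i) for i in range(1000)))

total, count = 0.0, 0
for chunk in pd.read_csv(csv, chunksize=100):  # only 100 rows in memory at a time
    total += chunk['value'].sum()
    count += len(chunk)

print(total / count)  # mean computed without loading the full file
```

The timeout limit yields to the same trick: do the work in pieces across multiple tool calls rather than one long computation.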
## Implications for Developers

### Code as Tool

Traditional: the LLM generates code for a human to run.
Code Interpreter: the LLM generates and runs the code itself.

This enables a new class of AI applications.
### Trust but Verify
```
User: Calculate my tax liability

ChatGPT: [Runs calculations]
         Your estimated tax is $12,345
         [Shows code used]
         [Shows intermediate values]
```
You can see the work, but should still verify.
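One lightweight verification habit is to recompute the headline number independently from your own source data and compare it to what the AI reported. A sketch with made-up numbers:

```python
# Hypothetical value the assistant reported
ai_reported_total = 12345.0

# Your own line items, taken from the original records
line_items = [4000.0, 5000.0, 3345.0]
independent_total = sum(line_items)

# Flag any disagreement instead of trusting the reported figure
assert abs(independent_total - ai_reported_total) < 0.01, "totals disagree"
print("verified:", independent_total)
```

It won't catch every modeling error, but it catches the cheap ones: dropped rows, wrong column, stale file.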
### The Agentic Pattern

```python
# What Code Interpreter does internally (simplified pseudocode)
def solve(task, max_attempts=3):
    context, errors = "", []
    for attempt in range(max_attempts):
        code = generate_code(task, context, errors)
        result = execute(code)
        if result.success:
            return result.output
        errors.append(result.error)
        context += f"\nAttempt {attempt} failed: {result.error}"
    raise RuntimeError("all attempts failed")
```
When code fails, it tries again with the error message.
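To make the retry mechanics concrete, here is a runnable toy version of that loop with stubbed-out `generate_code` and `execute`. All names are illustrative; a real implementation would call an LLM and a sandbox instead of these stubs:

```python
class Result:
    """Outcome of one execution attempt."""
    def __init__(self, success, output=None, error=None):
        self.success, self.output, self.error = success, output, error

def generate_code(task, context, errors):
    # Stub: emits broken code first, then "fixes" it after seeing an error
    return "1 / 1" if errors else "1 / 0"

def execute(code):
    # Stub sandbox: eval in-process (never do this with untrusted code)
    try:
        return Result(True, output=eval(code))
    except Exception as e:
        return Result(False, error=str(e))

def solve_with_retries(task, max_attempts=3):
    context, errors = "", []
    for attempt in range(max_attempts):
        result = execute(generate_code(task, context, errors))
        if result.success:
            return result.output, attempt
        errors.append(result.error)
        context += f"\nAttempt {attempt} failed: {result.error}"
    raise RuntimeError("all attempts failed")

print(solve_with_retries("divide"))  # succeeds on the second attempt
```

The first attempt raises `ZeroDivisionError`; the error message lands in `errors`, the stub "model" adjusts, and the second attempt succeeds, which is the whole pattern in miniature.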
## Building Similar Tools

### Local Execution
```python
import os
import subprocess
import tempfile

def execute_python(code: str) -> str:
    # Write the code to a temp file (delete=False so it survives the with-block)
    with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
        f.write(code.encode())
        path = f.name
    try:
        result = subprocess.run(
            ['python', path],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)  # clean up the temp file
```
### Safer: Docker Sandbox
```python
import docker

client = docker.from_env()

def execute_safely(code: str) -> str:
    container = client.containers.run(
        'python:3.10-slim',
        ['python', '-c', code],  # list form avoids shell-quoting issues
        detach=True,
        mem_limit='512m',
        network_disabled=True,
    )
    try:
        container.wait(timeout=30)  # seconds
        return container.logs().decode()
    finally:
        container.remove(force=True)
```
### With LLM Integration
```python
import openai  # pre-1.0 SDK call style, matching the era of this article

def solve_with_code(problem: str) -> str:
    # Generate code
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Write Python code to solve this. Output only code."},
            {"role": "user", "content": problem}
        ]
    )
    code = response.choices[0].message.content

    # Execute (execute_safely defined above)
    result = execute_safely(code)

    # Explain result
    explanation = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": f"The code produced: {result}\nExplain this result."}
        ]
    )
    return explanation.choices[0].message.content
```
## Final Thoughts
Code Interpreter showed that LLMs + code execution is a powerful combination. The AI can:
- Verify its own answers
- Handle complex computations
- Process real data
- Iterate on failures
It’s not perfect—you still need to check the work. But it’s a glimpse of how AI tools will evolve.
AI that can check its own homework.