# Code Interpreter: AI Running Its Own Code
In July 2023, OpenAI released Code Interpreter (later renamed Advanced Data Analysis) to ChatGPT Plus users. It gave GPT-4 the ability to write and execute Python code, fundamentally changing what was possible.
## What It Does
```
User: Upload sales_data.csv and analyze quarterly trends

ChatGPT: [Writes Python code]
         [Executes code]
         [Returns analysis + visualizations]
```
The AI can:
- Write Python code
- Execute it in a sandbox
- Access uploaded files
- Generate outputs (graphs, processed files)
- Iterate when code fails
## Why It Matters

### Before Code Interpreter
```
User: What's the standard deviation of column A?

ChatGPT: I can't actually compute that. Here's the formula...
         You would need to run: np.std(df['A'])
```
### After Code Interpreter
```
User: What's the standard deviation of column A?

ChatGPT: [Runs: import pandas as pd; df = pd.read_csv(...); df['A'].std()]
         The standard deviation of column A is 42.3
```
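The same computation is easy to reproduce locally. A minimal sketch with made-up values; one subtlety worth knowing is that pandas' `.std()` uses the sample formula by default, while `np.std` defaults to the population formula:

```python
import pandas as pd

# Hypothetical stand-in for the uploaded column A
df = pd.DataFrame({'A': [10.0, 50.0, 30.0, 70.0]})

print(df['A'].std())        # sample std (ddof=1), pandas' default
print(df['A'].std(ddof=0))  # population std, matching np.std(df['A'])
```

If the AI's number and your own don't match, a differing `ddof` convention is a common reason.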
## Use Cases

### Data Analysis
```
User: [uploads dataset.csv]
      Clean this data, remove outliers, and show me the distribution
```
ChatGPT:
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('dataset.csv')
# Clean missing values
df = df.dropna()
# Remove outliers using IQR
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) &
        (df['value'] <= Q3 + 1.5*IQR)]
# Plot distribution
plt.figure(figsize=(10, 6))
plt.hist(df['value'], bins=50, edgecolor='black')
plt.title('Value Distribution (Cleaned)')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.savefig('distribution.png')
```

[Returns: distribution.png, cleaned summary statistics]
### File Conversion
```
User: Convert this PDF to Markdown

ChatGPT: [Uses PyPDF2 to extract text] [Formats as Markdown] [Returns output.md]
```
### Image Processing
```
User: [uploads image.jpg] Resize to 800px wide, add a watermark, convert to PNG
```
ChatGPT:
```python
from PIL import Image, ImageDraw

img = Image.open('image.jpg')

# Resize maintaining aspect ratio
width = 800
ratio = width / img.width
height = int(img.height * ratio)
img = img.resize((width, height))

# Add watermark (default font)
draw = ImageDraw.Draw(img)
draw.text((10, height - 30), "© 2023", fill=(255, 255, 255))

# Save as PNG
img.save('output.png', 'PNG')
```
[Returns downloadable output.png]
### Math and Visualization
```
User: Visualize the Mandelbrot set at high resolution
```
ChatGPT:
```python
import numpy as np
import matplotlib.pyplot as plt

def mandelbrot(h, w, max_iter):
    y, x = np.ogrid[-1.4:1.4:h*1j, -2:0.8:w*1j]
    c = x + y*1j
    z = c
    divtime = max_iter + np.zeros(z.shape, dtype=int)
    for i in range(max_iter):
        z = z**2 + c
        diverge = z*np.conj(z) > 2**2
        div_now = diverge & (divtime == max_iter)
        divtime[div_now] = i
        z[diverge] = 2  # cap diverged points to avoid overflow
    return divtime

plt.figure(figsize=(12, 10))
plt.imshow(mandelbrot(1000, 1500, 80), cmap='magma')
plt.axis('off')
plt.savefig('mandelbrot.png', dpi=150, bbox_inches='tight')
```
[Returns high-resolution Mandelbrot visualization]
## Technical Details
### The Sandbox
- Python 3.10 environment
- Standard data science libraries (pandas, numpy, matplotlib, etc.)
- Sandboxed execution
- CPU only (no GPU)
- Time and memory limits
- No network access
### Available Libraries
```python
# Pre-installed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import scipy
import sklearn
import sympy
import networkx
# and many more...
```

### Limitations
- No internet access → Can't fetch URLs
- No persistent storage → Each session is fresh
- 60-second timeout → Long computations fail
- Memory limits → Very large datasets may fail
- No GPU → No deep learning inference
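Some of these limits can be engineered around. For the memory limit, for instance, streaming a file in chunks keeps peak usage small. A minimal sketch using pandas' `chunksize` (the data here is synthetic, standing in for a CSV too large to load at once):

```python
import io
import pandas as pd

# Synthetic stand-in for a large CSV file
csv = io.StringIO("value\n" + "\n".join(str(i) for i in range(1000)))

total, count = 0.0, 0
for chunk in pd.read_csv(csv, chunksize=100):  # only 100 rows in memory at a time
    total += chunk['value'].sum()
    count += len(chunk)

print(total / count)  # mean computed without loading the full file
```

The timeout limit yields to the same trick: do the work in pieces across multiple tool calls rather than one long computation.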
## Implications for Developers

### Code as Tool

Traditional: the LLM generates code for a human to run.
Code Interpreter: the LLM generates and runs the code itself.

This enables a new class of AI applications.
### Trust but Verify
```
User: Calculate my tax liability

ChatGPT: [Runs calculations]
         Your estimated tax is $12,345
         [Shows code used]
         [Shows intermediate values]
```
You can see the work, but should still verify.
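One lightweight verification habit is to recompute the headline number independently from your own source data and compare it to what the AI reported. A sketch with made-up numbers:

```python
# Hypothetical value the assistant reported
ai_reported_total = 12345.0

# Your own line items, taken from the original records
line_items = [4000.0, 5000.0, 3345.0]
independent_total = sum(line_items)

# Flag any disagreement instead of trusting the reported figure
assert abs(independent_total - ai_reported_total) < 0.01, "totals disagree"
print("verified:", independent_total)
```

It won't catch every modeling error, but it catches the cheap ones: dropped rows, wrong column, stale file.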
### The Agentic Pattern

```python
# What Code Interpreter does internally (simplified pseudocode)
def solve(task, max_attempts=3):
    context, errors = "", []
    for attempt in range(max_attempts):
        code = generate_code(task, context, errors)
        result = execute(code)
        if result.success:
            return result.output
        errors.append(result.error)
        context += f"\nAttempt {attempt} failed: {result.error}"
    raise RuntimeError("all attempts failed")
```
When code fails, it tries again with the error message.
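To make the retry mechanics concrete, here is a runnable toy version of that loop with stubbed-out `generate_code` and `execute`. All names are illustrative; a real implementation would call an LLM and a sandbox instead of these stubs:

```python
class Result:
    """Outcome of one execution attempt."""
    def __init__(self, success, output=None, error=None):
        self.success, self.output, self.error = success, output, error

def generate_code(task, context, errors):
    # Stub: emits broken code first, then "fixes" it after seeing an error
    return "1 / 1" if errors else "1 / 0"

def execute(code):
    # Stub sandbox: eval in-process (never do this with untrusted code)
    try:
        return Result(True, output=eval(code))
    except Exception as e:
        return Result(False, error=str(e))

def solve_with_retries(task, max_attempts=3):
    context, errors = "", []
    for attempt in range(max_attempts):
        result = execute(generate_code(task, context, errors))
        if result.success:
            return result.output, attempt
        errors.append(result.error)
        context += f"\nAttempt {attempt} failed: {result.error}"
    raise RuntimeError("all attempts failed")

print(solve_with_retries("divide"))  # succeeds on the second attempt
```

The first attempt raises `ZeroDivisionError`; the error message lands in `errors`, the stub "model" adjusts, and the second attempt succeeds, which is the whole pattern in miniature.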
## Building Similar Tools

### Local Execution
```python
import os
import subprocess
import tempfile

def execute_python(code: str) -> str:
    # Write the code to a temp file (delete=False so it survives the with-block)
    with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
        f.write(code.encode())
        path = f.name
    try:
        result = subprocess.run(
            ['python', path],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)  # clean up the temp file
```
### Safer: Docker Sandbox
```python
import docker

client = docker.from_env()

def execute_safely(code: str) -> str:
    container = client.containers.run(
        'python:3.10-slim',
        ['python', '-c', code],  # list form avoids shell-quoting issues
        detach=True,
        mem_limit='512m',
        network_disabled=True,
    )
    try:
        container.wait(timeout=30)  # seconds
        return container.logs().decode()
    finally:
        container.remove(force=True)
```
### With LLM Integration
```python
import openai  # pre-1.0 SDK call style, matching the era of this article

def solve_with_code(problem: str) -> str:
    # Generate code
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Write Python code to solve this. Output only code."},
            {"role": "user", "content": problem}
        ]
    )
    code = response.choices[0].message.content

    # Execute (execute_safely defined above)
    result = execute_safely(code)

    # Explain result
    explanation = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": f"The code produced: {result}\nExplain this result."}
        ]
    )
    return explanation.choices[0].message.content
```
## Final Thoughts
Code Interpreter showed that LLMs + code execution is a powerful combination. The AI can:
- Verify its own answers
- Handle complex computations
- Process real data
- Iterate on failures
It’s not perfect—you still need to check the work. But it’s a glimpse of how AI tools will evolve.
AI that can check its own homework.