2023 Retrospective: The Year of Local AI

Tags: ai, retrospective

2023 was the year AI escaped from the cloud. LLMs went from API-only to running on laptops. Models got smaller and better. Open source (mostly) won. Here’s what happened.

The Arc of 2023

January:   ChatGPT mania continues
February:  LLaMA leaks → local AI begins
March:     GPT-4 raises the bar
April:     Agents emerge (AutoGPT, BabyAGI)
May:       LangChain becomes default
July:      Llama 2 goes commercial
September: Mistral shows efficiency matters
November:  OpenAI DevDay, board drama
December:  Gemini launches; local AI goes mainstream

The Big Stories

1. Open-Weight Models Arrived

LLaMA        → research-only weights
Llama 2      → commercial use allowed
Mistral 7B   → efficiency per parameter
Mixtral 8x7B → open mixture of experts

By December, you could run competitive models locally.

2. The Context Window Race

GPT-3.5:       4K tokens (16K for Turbo)
GPT-4:         8K/32K tokens
GPT-4 Turbo:   128K tokens
Claude 2:      100K tokens (200K with 2.1)

More context = more capable applications.
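Bigger windows still need budgeting before you stuff documents in. A minimal sketch using the common rough heuristic of ~4 characters per token (the helper name and heuristic are mine, not from any specific library):

```python
def fits_context(docs, max_tokens=128_000, chars_per_token=4):
    """Roughly estimate whether concatenated docs fit a context window.

    Uses the ~4 chars/token rule of thumb for English text; a real
    tokenizer (e.g. the model's own) gives exact counts.
    """
    total_chars = sum(len(d) for d in docs)
    return total_chars / chars_per_token <= max_tokens

# ~100K estimated tokens fits a 128K window; ~150K does not
print(fits_context(["a" * 400_000]))
print(fits_context(["a" * 600_000]))
```

For anything production-grade, count with the model's actual tokenizer instead of estimating.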

3. Multimodal Everything

GPT-4V:      Text + images
Gemini:      Text + images + video + audio
LLaVA:       Open-source vision

Text-only was just the beginning.

4. RAG Became Standard

2022: "What's RAG?"
2023: "Of course we use RAG"

Every production LLM application uses retrieval.
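The pattern itself is simple: retrieve relevant context, then generate grounded on it. A toy sketch (the word-overlap retriever and the `llm` callable are stand-ins, not any particular library):

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query.

    Production systems use embeddings + a vector index instead.
    """
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query, documents, llm):
    """Retrieve-then-generate: ground the model on retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

Swap in a real embedding model and an actual LLM call and the shape stays the same.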

Developer Experience

Tools Matured

Tool            What It Provided
LangChain       Framework
LlamaIndex      Data framework
Ollama          Easy local models
vLLM            Production serving
Text Gen WebUI  GUI interface
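"Easy local models" is not an exaggeration: Ollama serves a local HTTP API (by default on port 11434) that you can hit with nothing but the standard library. A sketch against its `/api/generate` endpoint; verify the fields against the docs for your installed version:

```python
import json
import urllib.request

def build_payload(prompt, model="llama2"):
    """Request body for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response instead of chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt, model="llama2", host="http://localhost:11434"):
    """Call a locally running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Requires `ollama serve` running and the model pulled locally (`ollama pull llama2`).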

Hosting Options Exploded

Provider      What They Offer
Together AI   Open model APIs
Replicate     Model marketplace
Modal         GPU on demand
RunPod        GPU cloud
Hugging Face  Everything

My Projects This Year

1. Built with Local Models

Finally deployed production apps with local LLMs.

2. Experimented with RAG

Moved from “it kind of works” to “it reliably works.”

3. Tried Agents (Mixed Results)

Agents are powerful but fragile.

What Surprised Me

1. Speed of Open Source

GPT-4 released: March 2023
Open models competitive: October 2023
Seven months.

The open-source community moves fast.

2. Efficiency Gains

March:      Need 65B+ parameters for quality
October:    7B parameters often sufficient

Mistral proved architecture matters more than size.

3. Enterprise Adoption

Early 2023: "Is this real?"
Late 2023:  "When's our AI strategy?"

Faster enterprise adoption than I expected.

What Disappointed Me

1. Agent Reliability

Demo:     "Look, it plans and executes autonomously!"
Reality:  Infinite loops, wrong actions, high costs

Agents need more guardrails.
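Two guardrails cover most of the failure modes above: a hard step budget (kills infinite loops) and human confirmation before side effects (kills wrong actions). A minimal sketch; the `plan_next`, `execute`, and `confirm` hooks are placeholders for your own agent's pieces:

```python
def run_agent(goal, plan_next, execute, confirm, max_steps=5):
    """Agent loop with a hard step cap and human-in-the-loop confirmation.

    plan_next(goal, history) -> next action, or None when done
    confirm(action)          -> human approval gate before side effects
    execute(action)          -> performs the action, returns a result
    """
    history = []
    for _ in range(max_steps):        # hard budget: no infinite loops
        action = plan_next(goal, history)
        if action is None:            # planner signals completion
            break
        if not confirm(action):       # human said no: stop, don't improvise
            break
        history.append((action, execute(action)))
    return history
```

The step cap also bounds cost: a runaway planner burns at most `max_steps` LLM calls.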

2. Hallucination Persistence

Better models → Still hallucinate
More context  → Still hallucinate
Fine-tuning   → Still hallucinate

This isn’t solved yet.

3. OpenAI Drama

The board-vs.-Altman saga was a distraction, but it highlighted the risks of depending on a single vendor.

Predictions for 2024

1. Local-First Will Grow

Privacy, cost, and latency will drive local deployment.

2. Smaller Models Will Dominate

The 7B-13B sweet spot will get better.

3. Agent Frameworks Will Mature

Less “autonomous AI” hype, more practical workflows.

4. Multimodal Becomes Standard

Text-only will feel limited.

5. Enterprise Integration

RAG + fine-tuning + guardrails = enterprise-ready.

For Developers

What to Learn

  1. RAG architecture: It’s the production pattern
  2. Local model deployment: Ollama, vLLM, llama.cpp
  3. Prompt engineering: Still matters
  4. Evaluation: How to measure quality

What to Build

  1. Domain-specific assistants: Beat generic ChatGPT
  2. Internal tools: Low-risk, high-value
  3. Workflow automation: AI + existing processes

What to Avoid

  1. Full autonomy: Start with human-in-the-loop
  2. Pure chat interfaces: Usually not the right UX
  3. Ignoring security: Prompt injection is real

Tools of the Year

MVP Stack

Local dev:  Ollama + LangChain
Production: OpenAI or Together + LangServe
Hosting:    Any cloud with GPUs

Experimentation Stack

Models:     Hugging Face
Serving:    vLLM or TGI
Interface:  Gradio
Notebook:   Jupyter

Technical Lessons

1. Chunking Matters

# Sweep chunk sizes and measure retrieval quality for each
# (build_retriever and evaluate are your own project's helpers)
results = {}
for size in [200, 500, 1000]:
    retriever = build_retriever(chunk_size=size)
    results[size] = evaluate(retriever)  # keep the score for comparison

There’s no universal best size.
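For running that kind of sweep, a minimal fixed-size chunker with overlap is enough to start (character-based for simplicity; production splitters are usually token- or structure-aware, and the function here is my own sketch):

```python
def chunk(text, size=500, overlap=50):
    """Split text into fixed-size chunks whose boundaries overlap.

    Overlap keeps sentences that straddle a boundary visible to both
    neighboring chunks, which helps retrieval recall.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Sentence- or heading-aware splitting usually beats raw character windows; this is just the baseline to sweep against.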

2. Reranking Helps

# Two-stage retrieval: cheap recall first, expensive precision second
candidates = vector_search(query, k=20)  # stage 1: broad candidate set
final = rerank(query, candidates, k=5)   # stage 2: costly reranker, few docs

Extra step, measurable improvement.
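A self-contained toy version of the two-stage pattern, with plain scoring functions standing in for a vector index (stage 1) and a cross-encoder (stage 2):

```python
def two_stage(query, docs, cheap_score, costly_score, k1=20, k2=5):
    """Stage 1: cheap recall over all docs; stage 2: costly rerank of survivors.

    cheap_score runs over every document, so it must be fast; costly_score
    only ever sees k1 candidates, so it can afford to be expensive.
    """
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k1]
    return sorted(candidates, key=lambda d: costly_score(query, d), reverse=True)[:k2]

def overlap(query, doc):
    """Toy relevance score: shared words between query and document."""
    return len(set(query.split()) & set(doc.split()))
```

In practice, stage 1 is an approximate-nearest-neighbor search and stage 2 a cross-encoder; the asymmetry in cost per document is the whole point.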

3. Prompts Are Fragile

Works in dev:    "Summarize this document"
Fails in prod:   "Summarize this document" (different document structure)

Test with diverse inputs.
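The cheap defense is a small regression suite of structurally diverse inputs, run on every prompt change. A sketch (the cases and the `summarize` stand-in are illustrative, not from any framework):

```python
# Structurally diverse inputs: prose, markdown, tabular text, empty
CASES = [
    "Plain paragraph of prose.",
    "# Markdown\n- with\n- bullets",
    "col1\tcol2\n1\t2",
    "",
]

def check_summaries(summarize):
    """Run the prompt over each case; return the inputs that failed.

    Failure here = non-string output, or an empty answer for nonempty input.
    Real suites add semantic checks (length bounds, key facts present).
    """
    failures = []
    for text in CASES:
        out = summarize(text)
        if not isinstance(out, str) or (text and not out.strip()):
            failures.append(text)
    return failures
```

Wire this into CI so "works in dev" gets re-checked against prod-shaped inputs automatically.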

Looking Forward

2023 democratized access to AI. 2024 will be about making it reliable.

The hype will moderate. The technology will mature. The useful applications will emerge from the experimentation.

We’re still early.


2023: The year AI became personal.
