2023 Retrospective: The Year of Local AI
2023 was the year AI escaped from the cloud. LLMs went from API-only to running on laptops. Models got smaller and better. Open source (mostly) won. Here’s what happened.
The Arc of 2023
January: ChatGPT mania continues
February: LLaMA leaks → local AI begins
March: GPT-4 raises the bar
April: Agents emerge (AutoGPT, BabyAGI)
May: LangChain becomes default
July: Llama 2 goes commercial
September: Mistral shows efficiency matters
November: OpenAI DevDay and the board saga
December: Gemini launches; local AI is mainstream
The Big Stories
1. Open-Weight Models Arrived
LLaMA (February) → research access begins
Llama 2 (July) → commercial use allowed
Mistral 7B (September) → efficiency at small scale
Mixtral 8x7B (December) → open mixture of experts
By December, you could run competitive models locally.
2. The Context Window Race
GPT-3.5: 4K tokens
GPT-4: 8K/32K tokens
GPT-4 Turbo: 128K tokens
Claude: 100K tokens
More context = more capable applications.
3. Multimodal Everything
GPT-4V: Text + images
Gemini: Text + images + video + audio
LLaVA: Open-source vision
Text-only was just the beginning.
4. RAG Became Standard
2022: "What's RAG?"
2023: "Of course we use RAG"
Nearly every production LLM application now uses retrieval.
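The pattern itself is simple: retrieve the most relevant chunks, then stuff them into the prompt. A toy sketch of it below; the word-overlap scorer and the `retrieve`/`build_prompt` helpers are illustrative stand-ins for a real BM25 or embedding retriever, not any particular library's API.

```python
# Toy retrieval-augmented generation pipeline: retrieve relevant chunks,
# then assemble them into the prompt sent to the model.
# Scoring is naive word overlap; a real system would use BM25 or embeddings.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the augmented prompt for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 2 is licensed for commercial use.",
    "Mistral 7B outperforms larger models on many benchmarks.",
    "RAG grounds model answers in retrieved documents.",
]
prompt = build_prompt("What does RAG do?", retrieve("What does RAG do?", docs))
```

The real work in production is everything around this loop: chunking, hybrid retrieval, and reranking, covered below.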
Developer Experience
Tools Matured
| Tool | What It Provided |
|---|---|
| LangChain | Framework |
| LlamaIndex | Data framework |
| Ollama | Easy local models |
| vLLM | Production serving |
| Text Generation WebUI | Local GUI for models |
Hosting Options Exploded
| Provider | What They Offer |
|---|---|
| Together AI | Open model APIs |
| Replicate | Model marketplace |
| Modal | GPU on demand |
| RunPod | GPU cloud |
| Hugging Face | Everything |
My Projects This Year
1. Built with Local Models
Finally deployed production apps with local LLMs:
- Document Q&A with Llama 2
- Code review with CodeLlama
- Internal chatbots with Mistral
2. Experimented with RAG
Moved from “it kind of works” to “it reliably works”:
- Hybrid search (BM25 + vector)
- Reranking
- Chunk optimization
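The hybrid-search step can be sketched with reciprocal rank fusion (RRF), a common way to merge a BM25 ranking and a vector ranking without having to normalize their raw scores. The document IDs and rankings below are made up for illustration.

```python
# Hybrid search via reciprocal rank fusion (RRF): merge a keyword (BM25)
# ranking and a vector ranking using ranks only, not raw scores.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; k=60 is the conventional RRF constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents near the top of any list get the largest boost.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_c", "doc_b"]    # keyword matches
vector_ranking = ["doc_b", "doc_a", "doc_d"]  # semantic matches
fused = rrf([bm25_ranking, vector_ranking])
```

Documents that appear high in both lists float to the top, which is exactly the behavior you want from hybrid search.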
3. Tried Agents (Mixed Results)
Agents are powerful but fragile:
- Simple chains work well
- Complex autonomous agents fail often
- Human-in-the-loop is usually necessary
What Surprised Me
1. Speed of Open Source
GPT-4 released: March 2023
Open models competitive: October 2023
Seven months.
The open-source community moves fast.
2. Efficiency Gains
March: Need 65B+ parameters for quality
October: 7B parameters often sufficient
Mistral 7B showed that training data and architecture can matter as much as raw parameter count.
3. Enterprise Adoption
Early 2023: "Is this real?"
Late 2023: "When's our AI strategy?"
Faster enterprise adoption than I expected.
What Disappointed Me
1. Agent Reliability
Demo: "Look, it plans and executes autonomously!"
Reality: Infinite loops, wrong actions, high costs
Agents need more guardrails.
2. Hallucination Persistence
Better models → Still hallucinate
More context → Still hallucinate
Fine-tuning → Still hallucinate
This isn’t solved yet.
3. OpenAI Drama
The board vs. Sam saga was a distraction. It highlighted the risks of single-vendor dependency.
Predictions for 2024
1. Local-First Will Grow
Privacy, cost, and latency will drive local deployment.
2. Smaller Models Will Dominate
The 7B-13B sweet spot will get better.
3. Agent Frameworks Will Mature
Less “autonomous AI” hype, more practical workflows.
4. Multimodal Becomes Standard
Text-only will feel limited.
5. Enterprise Integration
RAG + fine-tuning + guardrails = enterprise-ready.
For Developers
What to Learn
- RAG architecture: It’s the production pattern
- Local model deployment: Ollama, vLLM, llama.cpp
- Prompt engineering: Still matters
- Evaluation: How to measure quality
What to Build
- Domain-specific assistants: Beat generic ChatGPT
- Internal tools: Low-risk, high-value
- Workflow automation: AI + existing processes
What to Avoid
- Full autonomy: Start with human-in-the-loop
- Pure chat interfaces: Usually not the right UX
- Ignoring security: Prompt injection is real
Tools of the Year
MVP Stack
Local dev: Ollama + LangChain
Production: OpenAI or Together + LangServe
Hosting: Any cloud with GPUs
Experimentation Stack
Models: Hugging Face
Serving: vLLM or TGI
Interface: Gradio
Notebook: Jupyter
Technical Lessons
1. Chunking Matters
# Experiment with chunk size; retrieval quality is sensitive to it
for size in [200, 500, 1000]:
    retriever = build_retriever(chunk_size=size)  # your indexing pipeline
    score = evaluate(retriever)                   # your eval harness
    print(f"chunk_size={size}: score={score:.3f}")
There’s no universal best size.
2. Reranking Helps
# Two-stage retrieval: broad recall, then precise rerank
candidates = vector_search(query, k=20)  # fast approximate search
final = rerank(query, candidates, k=5)   # e.g. a cross-encoder
Extra step, measurable improvement.
3. Prompts Are Fragile
Works in dev: "Summarize this document"
Fails in prod: "Summarize this document" (different document structure)
Test with diverse inputs.
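One way to make that concrete: a small regression check that runs the same prompt over structurally diverse documents and asserts invariants rather than exact outputs. The `summarize` function here is a stub standing in for a real model call.

```python
# Prompt regression check: same prompt, structurally diverse inputs,
# invariant-based assertions instead of exact-match expectations.

def summarize(doc: str) -> str:
    # Stub: a real implementation would call your model here.
    first_line = doc.strip().splitlines()[0] if doc.strip() else ""
    return first_line[:80]

diverse_docs = [
    "Plain paragraph of text about Q3 revenue.",
    "# Markdown Title\n\n- bullet one\n- bullet two",
    "col1,col2\n1,2\n3,4",  # CSV masquerading as a document
]

def check(doc: str) -> bool:
    out = summarize(doc)
    # Invariants: non-empty output within a length budget.
    return bool(out) and len(out) <= 80
```

Checking invariants (non-empty, within budget, cites a source) survives model nondeterminism; exact-match tests do not.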
Looking Forward
2023 democratized access to AI. 2024 will be about making it reliable.
The hype will moderate. The technology will mature. The useful applications will emerge from the experimentation.
We’re still early.
2023: The year AI became personal.