2023 Retrospective: The Year of Local AI
2023 was the year AI escaped from the cloud. LLMs went from API-only to running on laptops. Models got smaller and better. Open source (mostly) won. Here’s what happened.
The Arc of 2023
January: ChatGPT mania continues
February: LLaMA leaks → local AI begins
March: GPT-4 raises the bar
April: Agents emerge (AutoGPT, BabyAGI)
May: LangChain becomes default
July: Llama 2 goes commercial
September: Mistral shows efficiency matters
November: OpenAI DevDay and the board saga
December: Gemini launches; local AI is mainstream
The Big Stories
1. Open-Weight Models Arrived
LLaMA (February) → research access begins
Llama 2 (July) → commercial use allowed
Mistral 7B (September) → efficiency at small scale
Mixtral 8x7B (December) → open mixture of experts
By December, you could run competitive models locally.
2. The Context Window Race
GPT-3.5: 4K tokens
GPT-4: 8K/32K tokens
GPT-4 Turbo: 128K tokens
Claude: 100K tokens
More context = more capable applications.
3. Multimodal Everything
GPT-4V: Text + images
Gemini: Text + images + video + audio
LLaVA: Open-source vision
Text-only was just the beginning.
4. RAG Became Standard
2022: "What's RAG?"
2023: "Of course we use RAG"
Nearly every production LLM application now uses retrieval.
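The pattern itself is simple: retrieve the most relevant chunks, then stuff them into the prompt. A toy sketch of it below; the word-overlap scorer and the `retrieve`/`build_prompt` helpers are illustrative stand-ins for a real BM25 or embedding retriever, not any particular library's API.

```python
# Toy retrieval-augmented generation pipeline: retrieve relevant chunks,
# then assemble them into the prompt sent to the model.
# Scoring is naive word overlap; a real system would use BM25 or embeddings.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the augmented prompt for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 2 is licensed for commercial use.",
    "Mistral 7B outperforms larger models on many benchmarks.",
    "RAG grounds model answers in retrieved documents.",
]
prompt = build_prompt("What does RAG do?", retrieve("What does RAG do?", docs))
```

The real work in production is everything around this loop: chunking, hybrid retrieval, and reranking, covered below.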
Developer Experience
Tools Matured
| Tool | What It Provided |
|---|---|
| LangChain | Framework |
| LlamaIndex | Data framework |
| Ollama | Easy local models |
| vLLM | Production serving |
| Text Generation WebUI | Local GUI for models |
Hosting Options Exploded
| Provider | What They Offer |
|---|---|
| Together AI | Open model APIs |
| Replicate | Model marketplace |
| Modal | GPU on demand |
| RunPod | GPU cloud |
| Hugging Face | Everything |
My Projects This Year
1. Built with Local Models
Finally deployed production apps with local LLMs:
- Document Q&A with Llama 2
- Code review with CodeLlama
- Internal chatbots with Mistral
2. Experimented with RAG
Moved from “it kind of works” to “it reliably works”:
- Hybrid search (BM25 + vector)
- Reranking
- Chunk optimization
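The hybrid-search step can be sketched with reciprocal rank fusion (RRF), a common way to merge a BM25 ranking and a vector ranking without having to normalize their raw scores. The document IDs and rankings below are made up for illustration.

```python
# Hybrid search via reciprocal rank fusion (RRF): merge a keyword (BM25)
# ranking and a vector ranking using ranks only, not raw scores.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; k=60 is the conventional RRF constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents near the top of any list get the largest boost.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_c", "doc_b"]    # keyword matches
vector_ranking = ["doc_b", "doc_a", "doc_d"]  # semantic matches
fused = rrf([bm25_ranking, vector_ranking])
```

Documents that appear high in both lists float to the top, which is exactly the behavior you want from hybrid search.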
3. Tried Agents (Mixed Results)
Agents are powerful but fragile:
- Simple chains work well
- Complex autonomous agents fail often
- Human-in-the-loop is usually necessary
What Surprised Me
1. Speed of Open Source
GPT-4 released: March 2023
Open models competitive: October 2023
Seven months.
The open-source community moves fast.
2. Efficiency Gains
March: Need 65B+ parameters for quality
October: 7B parameters often sufficient
Mistral 7B showed that training data and architecture can matter as much as raw parameter count.
3. Enterprise Adoption
Early 2023: "Is this real?"
Late 2023: "When's our AI strategy?"
Faster enterprise adoption than I expected.
What Disappointed Me
1. Agent Reliability
Demo: "Look, it plans and executes autonomously!"
Reality: Infinite loops, wrong actions, high costs
Agents need more guardrails.
2. Hallucination Persistence
Better models → Still hallucinate
More context → Still hallucinate
Fine-tuning → Still hallucinate
This isn’t solved yet.
3. OpenAI Drama
The board vs. Sam saga was a distraction. It highlighted the risks of single-vendor dependency.
Predictions for 2024
1. Local-First Will Grow
Privacy, cost, and latency will drive local deployment.
2. Smaller Models Will Dominate
The 7B-13B sweet spot will get better.
3. Agent Frameworks Will Mature
Less “autonomous AI” hype, more practical workflows.
4. Multimodal Becomes Standard
Text-only will feel limited.
5. Enterprise Integration
RAG + fine-tuning + guardrails = enterprise-ready.
For Developers
What to Learn
- RAG architecture: It’s the production pattern
- Local model deployment: Ollama, vLLM, llama.cpp
- Prompt engineering: Still matters
- Evaluation: How to measure quality
What to Build
- Domain-specific assistants: Beat generic ChatGPT
- Internal tools: Low-risk, high-value
- Workflow automation: AI + existing processes
What to Avoid
- Full autonomy: Start with human-in-the-loop
- Pure chat interfaces: Usually not the right UX
- Ignoring security: Prompt injection is real
Tools of the Year
MVP Stack
Local dev: Ollama + LangChain
Production: OpenAI or Together + LangServe
Hosting: Any cloud with GPUs
Experimentation Stack
Models: Hugging Face
Serving: vLLM or TGI
Interface: Gradio
Notebook: Jupyter
Technical Lessons
1. Chunking Matters
# Experiment with chunk size; retrieval quality is sensitive to it
for size in [200, 500, 1000]:
    retriever = build_retriever(chunk_size=size)  # your indexing pipeline
    score = evaluate(retriever)                   # your eval harness
    print(f"chunk_size={size}: score={score:.3f}")
There’s no universal best size.
2. Reranking Helps
# Two-stage retrieval: broad recall, then precise rerank
candidates = vector_search(query, k=20)  # fast approximate search
final = rerank(query, candidates, k=5)   # e.g. a cross-encoder
Extra step, measurable improvement.
3. Prompts Are Fragile
Works in dev: "Summarize this document"
Fails in prod: "Summarize this document" (different document structure)
Test with diverse inputs.
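One way to make that concrete: a small regression check that runs the same prompt over structurally diverse documents and asserts invariants rather than exact outputs. The `summarize` function here is a stub standing in for a real model call.

```python
# Prompt regression check: same prompt, structurally diverse inputs,
# invariant-based assertions instead of exact-match expectations.

def summarize(doc: str) -> str:
    # Stub: a real implementation would call your model here.
    first_line = doc.strip().splitlines()[0] if doc.strip() else ""
    return first_line[:80]

diverse_docs = [
    "Plain paragraph of text about Q3 revenue.",
    "# Markdown Title\n\n- bullet one\n- bullet two",
    "col1,col2\n1,2\n3,4",  # CSV masquerading as a document
]

def check(doc: str) -> bool:
    out = summarize(doc)
    # Invariants: non-empty output within a length budget.
    return bool(out) and len(out) <= 80
```

Checking invariants (non-empty, within budget, cites a source) survives model nondeterminism; exact-match tests do not.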
Looking Forward
2023 democratized access to AI. 2024 will be about making it reliable.
The hype will moderate. The technology will mature. The useful applications will emerge from the experimentation.
We’re still early.
2023: The year AI became personal.