Introduction to Vector Databases
LLMs are powerful, but they have a knowledge cutoff and no access to your private data. Vector databases address this by enabling semantic search over your own content, so the relevant documents can be retrieved and handed to the model. Here’s how they work.
The Problem
User: What's our refund policy?
LLM: I don't have that information.
The LLM doesn’t know your company’s policies. You need to provide context.
The Solution: RAG
Retrieval-Augmented Generation:
User question
↓
Convert to embedding vector
↓
Search vector database for similar content
↓
Return relevant documents
↓
LLM + documents → Answer
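As a toy illustration of the retrieval step in this flow: the sketch below uses word overlap in place of real embeddings so it runs with no dependencies. All names and documents here are made up for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (a stand-in for embedding similarity)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    return sorted(docs, key=lambda d: jaccard(question, d), reverse=True)[:k]

docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
]
context = retrieve("Within how many days can I request a refund", docs)
# context[0] is the refund document; it would be passed to the LLM with the question
```

Real systems replace `jaccard` with embedding-vector similarity, but the retrieve-then-generate shape stays the same.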
What Are Embeddings?
Embeddings convert text to vectors that capture meaning:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# Text becomes vectors
text1 = "The cat sat on the mat"
text2 = "A feline rested on the rug"
text3 = "Stock prices rose today"
vec1 = model.encode(text1) # [0.12, -0.34, 0.56, ...]
vec2 = model.encode(text2) # [0.11, -0.32, 0.58, ...]
vec3 = model.encode(text3) # [-0.45, 0.23, -0.12, ...]
# Similar meaning → similar vectors:
# similarity(vec1, vec2) → 0.89  (high)
# similarity(vec1, vec3) → 0.12  (low)
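The `similarity` calls above are shorthand for cosine similarity. A minimal sketch with NumPy, using toy 3-dimensional vectors (real embeddings have hundreds of dimensions, and the values here are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors mirroring the example above
cat = np.array([0.12, -0.34, 0.56])
feline = np.array([0.11, -0.32, 0.58])
stocks = np.array([-0.45, 0.23, -0.12])

high = cosine_similarity(cat, feline)   # close to 1.0
low = cosine_similarity(cat, stocks)    # negative: pointing in opposite directions
```

Most vector databases use cosine similarity (or the closely related dot product on normalized vectors) as their default distance metric.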
Vector Database Options
| Database | Hosted | Self-hosted | Best For |
|---|---|---|---|
| Pinecone | ✅ | ❌ | Production, managed |
| Milvus | ✅ | ✅ | Large scale, flexible |
| Weaviate | ✅ | ✅ | GraphQL API, modules |
| Qdrant | ✅ | ✅ | Rust-based, fast |
| ChromaDB | ❌ | ✅ | Development, simple |
| pgvector | ❌ | ✅ | Postgres users |
ChromaDB (Simple Start)
import chromadb
# Initialize an in-memory client (use chromadb.PersistentClient to keep data on disk)
client = chromadb.Client()
collection = client.create_collection("my_docs")
# Add documents
collection.add(
    documents=[
        "Our refund policy allows returns within 30 days.",
        "Shipping takes 3-5 business days.",
        "Contact support at help@example.com."
    ],
    ids=["doc1", "doc2", "doc3"]
)
# Query
results = collection.query(
    query_texts=["How do I get my money back?"],
    n_results=1
)
# results['documents'][0][0] == "Our refund policy allows returns within 30 days."
Pinecone (Production)
from pinecone import Pinecone  # pinecone-client v3+; older versions used pinecone.init()
from sentence_transformers import SentenceTransformer
# Initialize (assumes the index already exists)
pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")
# Embed
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["Document 1 content...", "Document 2 content..."]
embeddings = model.encode(texts)
# Upsert
index.upsert(vectors=[
    ("id1", embeddings[0].tolist(), {"text": texts[0]}),
    ("id2", embeddings[1].tolist(), {"text": texts[1]})
])
# Query
query_embedding = model.encode(["What is the refund policy?"])
results = index.query(vector=query_embedding[0].tolist(), top_k=3, include_metadata=True)
pgvector (PostgreSQL)
-- Enable extension
CREATE EXTENSION vector;
-- Create table
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)  -- 384 dimensions (e.g. all-MiniLM-L6-v2 output)
);
-- Index for fast search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Insert
INSERT INTO documents (content, embedding)
VALUES ('Our refund policy...', '[0.1, 0.2, ...]');
-- Query
SELECT content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
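pgvector accepts vectors as bracketed text literals, so inserting from Python mostly means formatting the embedding list. The helper below is illustrative (not part of pgvector), and the psycopg2 call is a sketch that assumes an open connection and cursor:

```python
def to_pgvector(vec: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"

# With a driver such as psycopg2 (assumed), embeddings can be inserted as text
# and cast to the vector type:
# cur.execute(
#     "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
#     ("Our refund policy...", to_pgvector(embedding.tolist())),
# )
```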
RAG Pipeline
Complete Example
import openai  # pre-1.0 SDK shown; openai>=1.0 uses OpenAI().chat.completions.create(...)
from sentence_transformers import SentenceTransformer
import chromadb
# Setup
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
client = chromadb.Client()
collection = client.get_or_create_collection("knowledge_base")
def add_documents(documents: list[str]):
    """Add documents to the knowledge base."""
    embeddings = embed_model.encode(documents).tolist()
    ids = [f"doc_{i}" for i in range(len(documents))]
    collection.add(embeddings=embeddings, documents=documents, ids=ids)
def answer_question(question: str) -> str:
    """Answer using RAG."""
    # 1. Embed question
    query_embedding = embed_model.encode([question])[0].tolist()
    # 2. Retrieve relevant documents
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=3
    )
    context = "\n".join(results['documents'][0])
    # 3. Generate answer with context
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"""Answer based on this context:
{context}
If the answer isn't in the context, say you don't know."""},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
# Usage
add_documents([
    "Our company was founded in 2015 in San Francisco.",
    "We offer a 30-day money-back guarantee.",
    "Premium plans include priority support."
])
answer = answer_question("When was the company founded?")
# "The company was founded in 2015 in San Francisco."
Chunking Strategies
Large documents need splitting:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len
)
chunks = splitter.split_text(large_document)
# → ["chunk 1...", "chunk 2...", ...]
Chunking Options
| Strategy | Use Case |
|---|---|
| Fixed size | Simple, general |
| Sentence | Natural breakpoints |
| Paragraph | Coherent ideas |
| Semantic | Meaning-based |
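The LangChain splitter above implements the recursive variant. The simplest row in the table, fixed-size with overlap, can be sketched by hand (the sizes are illustrative and match the splitter example):

```python
def chunk_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking: consecutive chunks share `overlap` characters."""
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "".join(chr(97 + i % 26) for i in range(1200))  # 1200 chars of dummy text
chunks = chunk_fixed(doc, chunk_size=500, overlap=50)
# → 3 chunks starting at offsets 0, 450, and 900
```

The overlap ensures a sentence split at a chunk boundary still appears whole in at least one chunk.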
Hybrid Search
Combine vector and keyword search:
# Vector search
vector_results = vector_search(query_embedding, k=20)
# Keyword search
keyword_results = keyword_search(query_text, k=20)
# Combine with reciprocal rank fusion
final_results = rrf_merge(vector_results, keyword_results)
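`rrf_merge` above is pseudocode; reciprocal rank fusion itself is only a few lines. A sketch, assuming each input is a ranked list of document ids with the best match first (k=60 is the constant commonly used in the literature):

```python
from collections import defaultdict

def rrf_merge(*ranked_lists: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_results = ["doc3", "doc1", "doc7"]
keyword_results = ["doc1", "doc9", "doc3"]
merged = rrf_merge(vector_results, keyword_results)
# → ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that rank well in both lists (doc1, doc3) float to the top, without needing to normalize the two incompatible score scales.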
Metadata Filtering
# Filter by metadata before vector search
results = collection.query(
    query_embeddings=[embedding],
    n_results=5,
    # Mongo-style operators; exact filter syntax varies by database
    where={"category": "support", "date": {"$gt": "2023-01-01"}}
)
Performance Considerations
Indexing
| Algorithm | Speed | Recall | Memory |
|---|---|---|---|
| Flat | Slow | 100% | Low |
| IVF | Medium | 95%+ | Medium |
| HNSW | Fast | 95%+ | High |
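A flat index is just brute-force comparison of the query against every stored vector, which is why its recall is 100%; IVF and HNSW trade some of that recall for speed by searching only part of the space. A NumPy sketch of flat search (data is random for illustration):

```python
import numpy as np

def flat_search(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Exact nearest-neighbor search: score the query against every row."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return np.argsort(sims)[::-1][:k]  # indices of the k most similar vectors

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 384))              # 1000 stored "embeddings"
query = vectors[42] + rng.normal(scale=0.01, size=384)  # a point near vector 42
top = flat_search(query, vectors, k=3)
# → vector 42 is the top hit
```

This is O(n) per query, which is fine for thousands of vectors but motivates approximate indexes at millions.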
Embedding Model Choice
| Model | Dimensions | Quality | Speed |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Good | Fast |
| all-mpnet-base-v2 | 768 | Better | Medium |
| OpenAI Ada | 1536 | Best | API call |
Final Thoughts
Vector databases enable AI that knows your data. The pattern is simple:
- Embed your documents
- Store in vector database
- Query semantically
- Feed to LLM
This is the foundation of every “chat with your docs” application.
Vector databases: The memory layer for AI applications.