Introduction to Vector Databases


LLMs are powerful, but they have a knowledge cutoff and no access to your private data. Vector databases solve this by enabling semantic search over your own data. Here’s how they work.

The Problem

User: What's our refund policy?
LLM: I don't have that information.

The LLM doesn’t know your company’s policies. You need to provide context.

The Solution: RAG

Retrieval-Augmented Generation:

  1. User question
  2. Convert the question to an embedding vector
  3. Search the vector database for similar content
  4. Return the most relevant documents
  5. LLM + documents → answer

What Are Embeddings?

Embeddings convert text to vectors that capture meaning:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Text becomes vectors
text1 = "The cat sat on the mat"
text2 = "A feline rested on the rug"
text3 = "Stock prices rose today"

vec1 = model.encode(text1)  # [0.12, -0.34, 0.56, ...]
vec2 = model.encode(text2)  # [0.11, -0.32, 0.58, ...]
vec3 = model.encode(text3)  # [-0.45, 0.23, -0.12, ...]

# Similar meaning → similar vectors (cosine similarity; values are illustrative)
print(util.cos_sim(vec1, vec2).item())  # high, e.g. ~0.8
print(util.cos_sim(vec1, vec3).item())  # low, e.g. ~0.1

Vector Database Options

| Database | Hosted | Self-hosted | Best For |
|----------|--------|-------------|----------|
| Pinecone | ✓ |  | Production, managed |
| Milvus | ✓ | ✓ | Large scale, flexible |
| Weaviate | ✓ | ✓ | GraphQL API, modules |
| Qdrant | ✓ | ✓ | Rust-based, fast |
| ChromaDB |  | ✓ | Development, simple |
| pgvector |  | ✓ | Postgres users |

ChromaDB (Simple Start)

import chromadb

# Initialize
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents
collection.add(
    documents=[
        "Our refund policy allows returns within 30 days.",
        "Shipping takes 3-5 business days.",
        "Contact support at help@example.com."
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Query
results = collection.query(
    query_texts=["How do I get my money back?"],
    n_results=1
)
# Returns: "Our refund policy allows returns within 30 days."
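
The default client above is in-memory, so data disappears on restart. Chroma also ships a persistent client with the same API (the path here is arbitrary):

# Same API as chromadb.Client(), but backed by on-disk storage
client = chromadb.PersistentClient(path="./chroma_data")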

Pinecone (Production)

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# Initialize (pinecone client v3+; older clients used pinecone.init with an environment)
pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")

# Embed
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["Document 1 content...", "Document 2 content..."]
embeddings = model.encode(texts)

# Upsert (id, vector, metadata) tuples
index.upsert(vectors=[
    ("id1", embeddings[0].tolist(), {"text": texts[0]}),
    ("id2", embeddings[1].tolist(), {"text": texts[1]})
])

# Query (keyword arguments are required in the v3+ client)
query_embedding = model.encode(["What is the refund policy?"])
results = index.query(vector=query_embedding[0].tolist(), top_k=3, include_metadata=True)
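
Reading results back is straightforward; assuming the upsert above, each match carries a similarity score plus the metadata stored with it:

# Print the stored text for each match (metadata round-trips from the upsert)
for match in results.matches:
    print(match.score, match.metadata["text"])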

pgvector (PostgreSQL)

-- Enable extension
CREATE EXTENSION vector;

-- Create table
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)  -- 384 dimensions
);

-- Approximate index for fast search (ivfflat gives better recall when built after the data is loaded)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Insert
INSERT INTO documents (content, embedding) 
VALUES ('Our refund policy...', '[0.1, 0.2, ...]');

-- Query (<=> is cosine distance, so 1 - distance = cosine similarity)
SELECT content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
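
From Python, the pgvector adapter takes care of converting embeddings into the vector type. A minimal sketch, assuming psycopg 3, the pgvector package (pip install pgvector), and a hypothetical connection string:

import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Hypothetical DSN; autocommit avoids an explicit commit below
conn = psycopg.connect("dbname=mydb", autocommit=True)
register_vector(conn)  # lets numpy arrays bind to vector columns

# Insert a document together with its embedding
text = "Our refund policy allows returns within 30 days."
conn.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    (text, model.encode(text)),
)

# Nearest neighbors by cosine distance
query = model.encode("How do I get my money back?")
rows = conn.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (query,),
).fetchall()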

RAG Pipeline

Complete Example

from openai import OpenAI
from sentence_transformers import SentenceTransformer
import chromadb

# Setup
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("knowledge_base")
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def add_documents(documents: list[str]):
    """Add documents to the knowledge base."""
    embeddings = embed_model.encode(documents).tolist()
    # ids must be unique; use stable ids in practice so repeated calls don't collide
    ids = [f"doc_{i}" for i in range(len(documents))]
    collection.add(embeddings=embeddings, documents=documents, ids=ids)

def answer_question(question: str) -> str:
    """Answer using RAG."""
    # 1. Embed question
    query_embedding = embed_model.encode([question])[0].tolist()
    
    # 2. Retrieve relevant documents
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=3
    )
    context = "\n".join(results['documents'][0])
    
    # 3. Generate answer with context
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"""Answer based on this context:
            
{context}

If the answer isn't in the context, say you don't know."""},
            {"role": "user", "content": question}
        ]
    )
    
    return response.choices[0].message.content

# Usage
add_documents([
    "Our company was founded in 2015 in San Francisco.",
    "We offer a 30-day money-back guarantee.",
    "Premium plans include priority support."
])

answer = answer_question("When was the company founded?")
# "The company was founded in 2015 in San Francisco."

Chunking Strategies

Large documents need splitting:

# pip install langchain-text-splitters (import path in older LangChain: langchain.text_splitter)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len
)

chunks = splitter.split_text(large_document)
# → ["chunk 1...", "chunk 2...", ...]

Chunking Options

| Strategy | Use Case |
|------------|---------------------|
| Fixed size | Simple, general |
| Sentence | Natural breakpoints |
| Paragraph | Coherent ideas |
| Semantic | Meaning-based |
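
The first strategy is simple enough to write by hand. A sketch of character-based fixed-size chunking with overlap (same idea as the splitter above):

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap their neighbors."""
    step = chunk_size - overlap  # assumes chunk_size > overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]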

Hybrid Search

Combine vector and keyword search:

# Vector search (vector_search, keyword_search, rrf_merge are placeholders)
vector_results = vector_search(query_embedding, k=20)

# Keyword search (e.g. BM25 or a full-text index)
keyword_results = keyword_search(query_text, k=20)

# Combine with reciprocal rank fusion (sketch below)
final_results = rrf_merge(vector_results, keyword_results)
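
The merge step is the only non-obvious part. A minimal reciprocal rank fusion, assuming each search returns a ranked list of document ids:

def rrf_merge(*ranked_lists: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum of 1 / (k + rank) over all lists."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

k = 60 comes from the original RRF paper and damps the influence of any single list.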

Metadata Filtering

# Filter by metadata before the vector search (Chroma syntax: multiple
# conditions need $and, and range operators compare numbers, so the
# date here is stored as numeric YYYYMMDD)
results = collection.query(
    query_embeddings=[embedding],
    n_results=5,
    where={"$and": [{"category": "support"}, {"date": {"$gt": 20230101}}]}
)

Performance Considerations

Indexing

| Algorithm | Speed | Recall | Memory |
|--------------|--------|--------|--------|
| Flat (exact) | Slow | 100% | Low |
| IVF | Medium | 95%+ | Medium |
| HNSW | Fast | 95%+ | High |
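
Most databases choose and tune one of these for you. ChromaDB, for example, builds an HNSW index per collection, and the distance metric can be set through collection metadata:

import chromadb

client = chromadb.Client()
# "hnsw:space" selects the HNSW distance metric (default is "l2")
collection = client.create_collection(
    name="docs_cosine",
    metadata={"hnsw:space": "cosine"},
)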

Embedding Model Choice

| Model | Dimensions | Quality | Speed |
|-------------------|------------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Good | Fast |
| all-mpnet-base-v2 | 768 | Better | Medium |
| OpenAI Ada | 1536 | Best | API call |

Final Thoughts

Vector databases enable AI that knows your data. The pattern is simple:

  1. Embed your documents
  2. Store in vector database
  3. Query semantically
  4. Feed to LLM

This is the foundation of every “chat with your docs” application.


Vector databases: The memory layer for AI applications.
