OpenTelemetry: The Standard for Observability

October 27, 2024

devops observability

OpenTelemetry has become the de facto standard for observability: vendor-neutral instrumentation that works with most backends. Here’s how to use it effectively.

What Is OpenTelemetry

OpenTelemetry (OTel) provides:

APIs: For instrumenting code
SDKs: For collecting telemetry
Collector: For processing and exporting
Protocol (OTLP): Standard wire format

The Three Pillars

Traces:  Request flow across services
Metrics: Aggregated measurements over time
Logs:    Discrete events with context

OTel unifies all three.

Python Setup

Installation

pip install opentelemetry-api \
            opentelemetry-sdk \
            opentelemetry-instrumentation-django \
            opentelemetry-instrumentation-requests \
            opentelemetry-exporter-otlp

Basic Configuration

# otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

# Create resource
resource = Resource.create({
    "service.name": "my-service",
    "service.version": "1.0.0",
    "deployment.environment": "production"
})

# Create tracer provider
provider = TracerProvider(resource=resource)

# Add exporter
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Set global tracer provider
trace.set_tracer_provider(provider)

Django Auto-instrumentation

# manage.py
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

from opentelemetry.instrumentation.django import DjangoInstrumentor
DjangoInstrumentor().instrument()

# Continue with Django setup

Manual Instrumentation

Creating Spans

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        
        # Child span
        with tracer.start_as_current_span("validate_order"):
            validate(order_id)
        
        with tracer.start_as_current_span("charge_payment"):
            charge(order_id)
            
        span.set_attribute("order.status", "completed")

Context Propagation

import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Inject context into outgoing request
headers = {}
inject(headers)
requests.get("http://other-service/api", headers=headers)

# Extract context from incoming request
# (request is the incoming request object provided by your web framework)
context = extract(request.headers)
with tracer.start_as_current_span("handle_request", context=context):
    # Process with parent context
    pass
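With the default W3C Trace Context propagator, the injected context travels in a `traceparent` header of the form `version-traceid-spanid-flags`. The sketch below builds and parses that header using only the standard library, to show what actually crosses the wire. The function names are hypothetical, and the parser is simplified: it accepts only the four-field layout and ignores `tracestate`.

```python
import re

# version (2 hex) - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def build_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Format a version-00 traceparent header value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str):
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip().lower())
    if match is None:
        return None
    version, trace_hex, span_hex, flags_hex = match.groups()
    trace_id, span_id = int(trace_hex, 16), int(span_hex, 16)
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or trace_id == 0 or span_id == 0:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags_hex, 16) & 0x01),
    }


header = build_traceparent(
    0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, sampled=True
)
parsed = parse_traceparent(header)
```

The receiving service uses the trace ID to join the same trace and the span ID as the parent of whatever span it starts next.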

Metrics

import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Setup
exporter = OTLPMetricExporter(endpoint="http://localhost:4317")
reader = PeriodicExportingMetricReader(exporter)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Create meter
meter = metrics.get_meter(__name__)

# Counter
request_counter = meter.create_counter(
    "http_requests_total",
    description="Total HTTP requests"
)

# Histogram
request_duration = meter.create_histogram(
    "http_request_duration_seconds",
    description="HTTP request duration"
)

# Use
def handle_request(request):
    request_counter.add(1, {"method": request.method, "path": request.path})
    
    start = time.time()
    response = process(request)
    duration = time.time() - start
    
    request_duration.record(duration, {"method": request.method})
    return response
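A histogram does not keep every recorded duration; it aggregates them into buckets plus a running count and sum. The following standard-library sketch shows that aggregation with explicit, upper-inclusive bucket boundaries. The class name and the boundaries `[0.1, 0.5, 1.0]` are assumptions chosen for illustration, not the SDK's defaults.

```python
from bisect import bisect_left


class ExplicitBucketHistogram:
    """Toy aggregation: counts values per bucket, plus total count and sum."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One bucket per boundary, plus an overflow bucket for larger values.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.total = 0.0

    def record(self, value):
        # bisect_left makes each bucket upper-inclusive: (previous, boundary].
        index = bisect_left(self.boundaries, value)
        self.bucket_counts[index] += 1
        self.count += 1
        self.total += value


hist = ExplicitBucketHistogram([0.1, 0.5, 1.0])  # hypothetical boundaries, in seconds
for duration in [0.05, 0.1, 0.3, 0.7, 2.5]:
    hist.record(duration)
```

Because only bucket counts are exported, the boundaries should match the unit and range you record. Boundaries tuned for milliseconds would place every sub-second duration in the first bucket.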

Logs (Correlation)

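import logging

from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Attach the current trace and span IDs to every log record."""
    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, '032x')
            record.span_id = format(ctx.span_id, '016x')
        else:
            # No active span: use placeholders so the formatter never fails
            record.trace_id = '-'
            record.span_id = '-'
        return True

# Format logs with trace context
formatter = logging.Formatter(
    '%(asctime)s [%(trace_id)s:%(span_id)s] %(message)s'
)

# Add to logger
handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(formatter)
logging.getLogger().addHandler(handler)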

The Collector

Docker Compose

version: '3'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP

Collector Config

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 500

exporters:
  # Jaeger ingests OTLP natively; the dedicated jaeger exporter
  # has been removed from current Collector releases
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"

  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
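The batch processor settings above (`timeout: 5s`, `send_batch_size: 1000`) mean a batch is sent when either limit is hit first. As a rough sketch of that rule, and not the Collector's actual implementation, the hypothetical `Batcher` class below takes the clock as an explicit argument so the behavior is easy to follow.

```python
class Batcher:
    """Toy batcher: flush when the batch is full or the oldest item is too old."""

    def __init__(self, send_batch_size, timeout, export):
        self.send_batch_size = send_batch_size
        self.timeout = timeout
        self.export = export  # callable that receives a list of items
        self._pending = []
        self._first_item_at = None

    def add(self, item, now):
        if not self._pending:
            self._first_item_at = now  # the timeout counts from the first item
        self._pending.append(item)
        if len(self._pending) >= self.send_batch_size:
            self._flush()

    def tick(self, now):
        """Call periodically; flushes a partial batch once the timeout passes."""
        if self._pending and now - self._first_item_at >= self.timeout:
            self._flush()

    def _flush(self):
        self.export(self._pending)
        self._pending = []
        self._first_item_at = None


batches = []
batcher = Batcher(send_batch_size=3, timeout=5.0, export=batches.append)
for second, item in enumerate(["a", "b", "c"]):
    batcher.add(item, now=float(second))  # third item fills the batch
batcher.add("d", now=3.0)
batcher.tick(now=7.0)  # only 4s since "d" arrived: nothing sent
batcher.tick(now=8.0)  # 5s elapsed: partial batch is sent
```

A larger batch size reduces export calls under load, while the timeout bounds how long telemetry can sit in memory when traffic is light.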

Kubernetes Integration

Auto-instrumentation Operator

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    # Python auto-instrumentation exports OTLP over HTTP, so target port 4318
    endpoint: http://otel-collector:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest

# Pod annotation
annotations:
  instrumentation.opentelemetry.io/inject-python: "true"

Annotated pods are auto-instrumented without code changes.

Best Practices

1. Resource Attributes

import socket
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "payment-service",
    "service.version": "1.2.3",
    "deployment.environment": "production",
    "service.instance.id": socket.gethostname(),
})

2. Semantic Conventions

# Use standard attribute names (stable HTTP conventions; older releases used
# http.method, http.url, and http.status_code)
span.set_attribute("http.request.method", "POST")
span.set_attribute("url.full", url)
span.set_attribute("http.response.status_code", 200)
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.statement", query)

3. Sampling

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # Sample 10% of traces
provider = TracerProvider(sampler=sampler, resource=resource)
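Ratio-based sampling is deterministic per trace: the decision is derived from the trace ID, so every service that sees the same ID reaches the same answer. The standard-library sketch below shows roughly how such a decision can be made, along with the parent-based variant used by `parentbased_traceidratio` in the Kubernetes example. The function names are hypothetical and this is an approximation of the idea, not the SDK's code.

```python
import random

TRACE_ID_LIMIT = 1 << 64  # the decision uses the low 64 bits of the trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Sample when the low 64 bits fall below ratio * 2**64."""
    bound = round(ratio * TRACE_ID_LIMIT)
    return (trace_id & (TRACE_ID_LIMIT - 1)) < bound


def parent_based_sampled(trace_id: int, ratio: float, parent_sampled=None) -> bool:
    """Follow the parent's decision if there is one; otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)


# Estimate the sampled fraction over random 128-bit trace IDs (seeded for repeatability).
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
fraction = sum(ratio_sampled(t, 0.1) for t in trace_ids) / len(trace_ids)
```

Here `fraction` lands close to 0.1. The parent-based wrapper matters in distributed systems: without it, a downstream service could drop spans from a trace the upstream service decided to keep.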

4. Error Handling

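from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

span = trace.get_current_span()
try:
    result = operation()
except Exception as e:
    # Record the exception as a span event, then mark the span as failed
    span.record_exception(e)
    span.set_status(Status(StatusCode.ERROR, str(e)))
    raise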

Vendor Export

Send to any backend:

exporters:
  jaeger: ...
  zipkin: ...
  datadog: ...
  newrelic: ...
  honeycomb: ...
  lightstep: ...

One instrumentation, many destinations.

Final Thoughts

OpenTelemetry ends vendor lock-in for observability. Instrument once with standard APIs, export anywhere.

The ecosystem is mature. It’s the right choice for new projects, and legacy instrumentation is worth migrating to it.

Observe everything, lock into nothing.