OpenTelemetry: The Standard for Observability


OpenTelemetry has become the de facto standard for observability: vendor-neutral instrumentation that works across languages, frameworks, and backends. Here’s how to use it effectively.

What Is OpenTelemetry

OpenTelemetry (OTel) provides vendor-neutral APIs, SDKs, and a Collector for generating, processing, and exporting telemetry.

The Three Pillars

Traces:  Request flow across services
Metrics: Aggregated measurements over time
Logs:    Discrete events with context

OTel unifies all three.

Python Setup

Installation

pip install opentelemetry-api \
            opentelemetry-sdk \
            opentelemetry-instrumentation-django \
            opentelemetry-instrumentation-requests \
            opentelemetry-exporter-otlp

Basic Configuration

# otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

# Create resource
resource = Resource.create({
    "service.name": "my-service",
    "service.version": "1.0.0",
    "deployment.environment": "production"
})

# Create tracer provider
provider = TracerProvider(resource=resource)

# Add exporter
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Set global tracer provider
trace.set_tracer_provider(provider)

Django Auto-instrumentation

# manage.py
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

from opentelemetry.instrumentation.django import DjangoInstrumentor
DjangoInstrumentor().instrument()

# Continue with Django setup

Manual Instrumentation

Creating Spans

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        
        # Child span
        with tracer.start_as_current_span("validate_order"):
            validate(order_id)
        
        with tracer.start_as_current_span("charge_payment"):
            charge(order_id)
            
        span.set_attribute("order.status", "completed")

Context Propagation

import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Inject context into outgoing request
headers = {}
inject(headers)
requests.get("http://other-service/api", headers=headers)

# Extract context from incoming request
context = extract(request.headers)
with tracer.start_as_current_span("handle_request", context=context):
    # Process with parent context
    pass
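With the default propagators, what `inject` writes into the headers is the W3C Trace Context `traceparent` header. The sketch below builds and parses that header by hand to show what travels between services. It handles only version `00` of the format, and `ParentContext`, `format_traceparent`, and `parse_traceparent` are hypothetical names for illustration; in real code, leave this to `inject` and `extract`.

```python
import re
from typing import NamedTuple, Optional

# traceparent (version 00): version - trace-id - parent span-id - flags
_TRACEPARENT_RE = re.compile(
    r"00-(?P<trace_id>[0-9a-f]{32})-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})"
)


class ParentContext(NamedTuple):
    trace_id: int
    span_id: int
    sampled: bool


def format_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Render a version-00 traceparent header value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value: str) -> Optional[ParentContext]:
    """Return the parent context, or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id = int(match["trace_id"], 16)
    span_id = int(match["span_id"], 16)
    # All-zero IDs are invalid in the W3C format.
    if trace_id == 0 or span_id == 0:
        return None
    sampled = bool(int(match["flags"], 16) & 0x01)
    return ParentContext(trace_id, span_id, sampled)


header = format_traceparent(
    0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, sampled=True
)
parsed = parse_traceparent(header)
```

The sampled flag in the last field is what lets a downstream service honor the upstream sampling decision.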

Metrics

import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Setup
exporter = OTLPMetricExporter(endpoint="http://localhost:4317")
reader = PeriodicExportingMetricReader(exporter)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Create meter
meter = metrics.get_meter(__name__)

# Counter
request_counter = meter.create_counter(
    "http_requests_total",
    description="Total HTTP requests"
)

# Histogram
request_duration = meter.create_histogram(
    "http_request_duration_seconds",
    description="HTTP request duration"
)

# Use
def handle_request(request):
    request_counter.add(1, {"method": request.method, "path": request.path})
    
    start = time.time()
    response = process(request)
    duration = time.time() - start
    
    request_duration.record(duration, {"method": request.method})
    return response
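One thing to watch with the histogram above: it records seconds, while the default explicit bucket boundaries in the OpenTelemetry metrics specification (0, 5, 10, 25, ...) fit millisecond-scale values better. The sketch below reproduces the bucketing with the standard library to show the effect. `SECONDS_BOUNDARIES` is an illustrative choice, not a recommended standard, and `bucket_counts` is a hypothetical helper.

```python
from bisect import bisect_left

# Default explicit bucket boundaries from the OpenTelemetry metrics SDK spec.
DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
                      1000, 2500, 5000, 7500, 10000]

# Hypothetical alternative sized for durations measured in seconds.
SECONDS_BOUNDARIES = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def bucket_counts(values, boundaries):
    """Count values per bucket; bucket i covers (boundaries[i-1], boundaries[i]]."""
    counts = [0] * (len(boundaries) + 1)
    for value in values:
        # bisect_left makes each upper boundary inclusive.
        counts[bisect_left(boundaries, value)] += 1
    return counts


durations = [0.012, 0.25, 1.8, 4.9]  # request durations in seconds

default_counts = bucket_counts(durations, DEFAULT_BOUNDARIES)
seconds_counts = bucket_counts(durations, SECONDS_BOUNDARIES)
```

With the default boundaries, all four durations fall into the single (0, 5] bucket, so percentiles computed from it carry little information. In the SDK, custom boundaries are typically supplied through a View with an explicit-bucket histogram aggregation; check the SDK documentation for the exact form in your version.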

Logs (Correlation)

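from opentelemetry import trace
import logging

class OTelLogFilter(logging.Filter):
    """Attach the current trace and span IDs to every log record."""
    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, '032x')
            record.span_id = format(ctx.span_id, '016x')
        else:
            # No active span: use placeholders so the formatter never fails
            record.trace_id = record.span_id = '-'
        return True

# Format logs with trace context
formatter = logging.Formatter(
    '%(asctime)s [%(trace_id)s:%(span_id)s] %(message)s'
)

# Attach the filter and formatter to a handler on the root logger
handler = logging.StreamHandler()
handler.addFilter(OTelLogFilter())
handler.setFormatter(formatter)
logging.getLogger().addHandler(handler)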
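If the logs end up in a backend that indexes fields, structured output is often easier to query than a formatted string. Here is a minimal sketch, assuming log records carry `trace_id` and `span_id` attributes as set above. `JsonTraceFormatter` is a hypothetical name, and the field names in the payload are an arbitrary choice rather than a standard.

```python
import json
import logging


class JsonTraceFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per line, with trace fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Absent when no span was active: emit null instead of failing.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


# Build a record by hand to show the output without configuring a logger.
record = logging.LogRecord(
    name="orders", level=logging.INFO, pathname="example.py", lineno=1,
    msg="charged order %s", args=("A-1001",), exc_info=None,
)
record.trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
record.span_id = "00f067aa0ba902b7"

line = JsonTraceFormatter().format(record)
```

To use it, pass an instance to `handler.setFormatter(...)` in place of the string formatter.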

The Collector

Docker Compose

version: '3'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP

Collector Config

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 500

exporters:
  # Jaeger accepts OTLP natively; the dedicated `jaeger` exporter
  # has been removed from recent Collector releases
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"

  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
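The `batch` processor settings above mean: send a batch once it holds `send_batch_size` items, or once `timeout` has passed, whichever comes first. The Python sketch below models that rule in simplified form. `Batcher` and its `tick` method are hypothetical, the clock is injectable so the example runs instantly, and the real processor's timer handling differs in detail.

```python
import time
from typing import Callable, List


class Batcher:
    """Simplified size-or-timeout batching (not the Collector's implementation)."""

    def __init__(self, send: Callable[[List[object]], None],
                 send_batch_size: int = 1000, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._size = send_batch_size
        self._timeout = timeout
        self._clock = clock
        self._items: List[object] = []
        self._first_item_at = 0.0

    def add(self, item: object) -> None:
        if not self._items:
            # Start timing when the first item of a new batch arrives.
            self._first_item_at = self._clock()
        self._items.append(item)
        if len(self._items) >= self._size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once the timeout elapses."""
        if self._items and self._clock() - self._first_item_at >= self._timeout:
            self.flush()

    def flush(self) -> None:
        if self._items:
            batch, self._items = self._items, []
            self._send(batch)


now = [0.0]   # fake clock, advanced by hand
sent = []     # batches "exported" so far
batcher = Batcher(sent.append, send_batch_size=3, timeout=5.0, clock=lambda: now[0])

for span in ["a", "b", "c", "d"]:
    batcher.add(span)   # "a", "b", "c" go out together when the size limit is hit

now[0] = 6.0
batcher.tick()          # "d" goes out alone once the timeout has passed
```

Larger batches reduce export overhead, while the timeout bounds how stale data can get under low traffic.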

Kubernetes Integration

Auto-instrumentation Operator

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
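  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    env:
      # Python auto-instrumentation exports OTLP over HTTP by default, so use port 4318
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector:4318

# Pod annotation
annotations:
  instrumentation.opentelemetry.io/inject-python: "true"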

Annotated pods are auto-instrumented without code changes.

Best Practices

1. Resource Attributes

import socket
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "payment-service",
    "service.version": "1.2.3",
    "deployment.environment": "production",
    "service.instance.id": socket.gethostname(),
})
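Hard-coding these values ties the code to one environment. The SDK already reads the standard `OTEL_SERVICE_NAME` and `OTEL_RESOURCE_ATTRIBUTES` environment variables when it builds a resource; the sketch below parses them by hand only to show the format and the precedence rule. `resource_attributes_from_env` is a hypothetical helper, and it takes the environment as an argument so the example is deterministic.

```python
import socket
from typing import Dict, Mapping
from urllib.parse import unquote


def resource_attributes_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Build resource attributes from OTEL_* variables (illustrative only)."""
    attributes = {"service.instance.id": socket.gethostname()}

    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs.
    raw = env.get("OTEL_RESOURCE_ATTRIBUTES", "")
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            # Values may be percent-encoded.
            attributes[key.strip()] = unquote(value.strip())

    # OTEL_SERVICE_NAME takes precedence over service.name in the list.
    service_name = env.get("OTEL_SERVICE_NAME")
    if service_name:
        attributes["service.name"] = service_name
    return attributes


example_env = {
    "OTEL_SERVICE_NAME": "payment-service",
    "OTEL_RESOURCE_ATTRIBUTES": (
        "service.name=ignored,service.version=1.2.3,"
        "deployment.environment=production"
    ),
}
attrs = resource_attributes_from_env(example_env)
```

In a deployment, set the variables in the container spec and pass `os.environ` in place of `example_env`.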

2. Semantic Conventions

# Use standard attribute names
span.set_attribute("http.method", "POST")
span.set_attribute("http.url", url)
span.set_attribute("http.status_code", 200)
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.statement", query)

3. Sampling

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(0.1)  # Sample 10% of traces
provider = TracerProvider(sampler=sampler, resource=resource)
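Ratio-based sampling decides from the trace ID rather than from a random draw per span, so every service that sees the same trace reaches the same verdict. The sketch below shows the idea with the standard library. It assumes the decision compares the low 64 bits of the trace ID against a threshold, which is believed to match the Python SDK's approach but is not a copy of its code; `sampling_bound` and `is_sampled` are hypothetical names.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the trace ID are used


def sampling_bound(ratio: float) -> int:
    """Threshold below which a trace ID's low 64 bits count as sampled."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    return round(ratio * (TRACE_ID_LIMIT + 1))


def is_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    return (trace_id & TRACE_ID_LIMIT) < sampling_bound(ratio)


# Simulate 100,000 random 128-bit trace IDs with a fixed seed.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]

kept_10 = [t for t in trace_ids if is_sampled(t, 0.10)]
kept_25 = [t for t in trace_ids if is_sampled(t, 0.25)]
```

Because the threshold only grows with the ratio, every trace kept at 10% is also kept at 25%, which keeps traces complete when services run with different ratios.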

4. Error Handling

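from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

span = trace.get_current_span()
try: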
    result = operation()
except Exception as e:
    span.set_status(Status(StatusCode.ERROR, str(e)))
    span.record_exception(e)
    raise

Vendor Export

Send to any backend:

exporters:
  jaeger: ...
  zipkin: ...
  datadog: ...
  newrelic: ...
  honeycomb: ...
  lightstep: ...

One instrumentation, many destinations.

Final Thoughts

OpenTelemetry ends vendor lock-in for observability. Instrument once with standard APIs, export anywhere.

The ecosystem is mature. It’s the right choice for new projects, and legacy instrumentation is worth migrating to it.


Observe everything, lock into nothing.
