OpenTelemetry: The Standard for Observability
OpenTelemetry has become the de facto standard for observability: vendor-neutral instrumentation that works with any compatible backend. Here’s how to use it effectively.
What Is OpenTelemetry
OpenTelemetry (OTel) provides:
- APIs: For instrumenting code
- SDKs: For collecting telemetry
- Collector: For processing and exporting
- Protocol (OTLP): Standard wire format
The Three Pillars
Traces: Request flow across services
Metrics: Aggregated measurements over time
Logs: Discrete events with context
OTel unifies all three.
Python Setup
Installation
```bash
pip install opentelemetry-api \
    opentelemetry-sdk \
    opentelemetry-instrumentation-django \
    opentelemetry-instrumentation-requests \
    opentelemetry-exporter-otlp
```
Basic Configuration
```python
# otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

# Create resource
resource = Resource.create({
    "service.name": "my-service",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})

# Create tracer provider
provider = TracerProvider(resource=resource)

# Add exporter
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Set global tracer provider
trace.set_tracer_provider(provider)
```
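BatchSpanProcessor buffers finished spans and hands them to the exporter in batches instead of making one network call per span. The stdlib-only sketch below models just that idea: flush when the batch is full or when the oldest buffered span has waited too long. `ToyBatcher` and its parameters are hypothetical; the defaults mirror the SDK's documented batch size (512) and delay (5 s), but the real processor also exports from a background thread and drops spans once its queue is full.

```python
import time
from typing import Callable, List


class ToyBatcher:
    """Buffer finished spans and export them in batches (hypothetical, not the SDK API)."""

    def __init__(self, export: Callable[[List[str]], None], max_batch_size: int = 512,
                 max_delay_s: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self._export = export
        self._max_batch_size = max_batch_size
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._buffer: List[str] = []
        self._oldest = 0.0

    def on_end(self, span_name: str) -> None:
        # Called when a span finishes: buffer it, flush if the batch is full
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(span_name)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def tick(self) -> None:
        # Called periodically: flush a partial batch that has waited long enough
        if self._buffer and self._clock() - self._oldest >= self._max_delay_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._export(batch)  # one export call per batch, not per span


now = [0.0]  # fake clock so the example is deterministic
exported: List[List[str]] = []
batcher = ToyBatcher(exported.append, max_batch_size=3, clock=lambda: now[0])
for name in ["a", "b", "c", "d"]:
    batcher.on_end(name)
now[0] = 6.0
batcher.tick()
print(exported)  # [['a', 'b', 'c'], ['d']]
```

The first three spans leave as a full batch; the fourth waits until the time limit passes. That trade-off (fewer, larger exports at the cost of a short delay) is why BatchSpanProcessor is preferred over exporting each span synchronously in production.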
Django Auto-instrumentation
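```python
# manage.py
import os
import sys
import otel_setup  # noqa: F401 (configures the global tracer provider)
from opentelemetry.instrumentation.django import DjangoInstrumentor

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
# Instrument before Django starts handling requests
DjangoInstrumentor().instrument()
# Continue with the usual Django setup
from django.core.management import execute_from_command_line
execute_from_command_line(sys.argv)
```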
Manual Instrumentation
Creating Spans
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # Child spans: process_order is current here, so it becomes their parent
        with tracer.start_as_current_span("validate_order"):
            validate(order_id)  # your application code
        with tracer.start_as_current_span("charge_payment"):
            charge(order_id)  # your application code
        span.set_attribute("order.status", "completed")
```
Context Propagation
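```python
import requests
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer(__name__)

def call_other_service():
    # Inject the current trace context into the outgoing request headers
    headers = {}
    inject(headers)
    return requests.get("http://other-service/api", headers=headers)

def handle_request(request):
    # Extract the caller's context from the incoming request headers
    context = extract(request.headers)
    with tracer.start_as_current_span("handle_request", context=context):
        pass  # process with the caller's span as parent
```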
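By default, Python's propagators include W3C Trace Context, which carries the span context in a `traceparent` header. The stdlib sketch below formats and parses that header to show roughly what `inject` and `extract` put on the wire. The function and type names are hypothetical, only the four-field layout is handled, and `tracestate` and baggage are ignored.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id - parent-id - trace-flags, all lowercase hex
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class ToySpanContext(NamedTuple):
    trace_id: int
    span_id: int
    sampled: bool


def format_traceparent(ctx: ToySpanContext) -> str:
    # Version 00, zero-padded hex IDs, "sampled" in bit 0 of the flags
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id:032x}-{ctx.span_id:016x}-{flags}"


def parse_traceparent(header: str) -> Optional[ToySpanContext]:
    # Returns None for anything malformed, so the receiver starts a new trace
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_hex, span_hex, flags_hex = match.groups()
    trace_id, span_id = int(trace_hex, 16), int(span_hex, 16)
    if version == "ff" or trace_id == 0 or span_id == 0:
        return None  # the forbidden version and all-zero IDs are invalid
    return ToySpanContext(trace_id, span_id, sampled=bool(int(flags_hex, 16) & 0x01))


ctx = ToySpanContext(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, True)
header = format_traceparent(ctx)
print(header)  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Only the trace ID, the caller's span ID, and the sampling flag cross the service boundary; everything else about the span stays with the service that created it.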
Metrics
```python
import time

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Setup
exporter = OTLPMetricExporter(endpoint="http://localhost:4317")
reader = PeriodicExportingMetricReader(exporter)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Create meter
meter = metrics.get_meter(__name__)

# Counter
request_counter = meter.create_counter(
    "http_requests_total",
    description="Total HTTP requests",
)

# Histogram
request_duration = meter.create_histogram(
    "http_request_duration_seconds",
    description="HTTP request duration",
)

# Use
def handle_request(request):
    # Keep attribute values low-cardinality: raw paths containing IDs create one series per value
    request_counter.add(1, {"method": request.method, "path": request.path})
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    response = process(request)  # your application code
    duration = time.perf_counter() - start
    request_duration.record(duration, {"method": request.method})
    return response
```
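A histogram instrument does not keep every recorded value. The SDK's default aggregation counts values into explicit buckets (plus a sum and a count) per attribute set, and each bucket includes its upper boundary. The Python SDK's default boundaries (0, 5, 10, 25, ... 10000) suit millisecond-scale values, so for durations recorded in seconds you would typically configure seconds-scale boundaries with a View and `ExplicitBucketHistogramAggregation`. The stdlib sketch below models the bucketing only; `ToyHistogram` and the example boundaries are assumptions, not SDK code.

```python
import bisect
from typing import Dict, Sequence, Tuple


class ToyHistogram:
    """Explicit-bucket histogram with one series per attribute set (hypothetical)."""

    def __init__(self, boundaries: Sequence[float]):
        self.boundaries = sorted(boundaries)
        self.series: Dict[Tuple, dict] = {}

    def record(self, value: float, attributes: Dict[str, str]) -> None:
        key = tuple(sorted(attributes.items()))  # same attributes -> same series
        point = self.series.setdefault(key, {
            "counts": [0] * (len(self.boundaries) + 1), "sum": 0.0, "count": 0})
        # bisect_left puts a value equal to a boundary into that boundary's bucket,
        # so buckets are (previous, upper]; the extra last bucket is (last, +inf)
        point["counts"][bisect.bisect_left(self.boundaries, value)] += 1
        point["sum"] += value
        point["count"] += 1


# Example seconds-scale boundaries (an assumption, not the SDK default)
hist = ToyHistogram([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0])
for seconds in (0.004, 0.01, 0.2, 3.0):
    hist.record(seconds, {"method": "GET"})
point = hist.series[(("method", "GET"),)]
print(point["counts"], point["count"])  # [1, 1, 0, 0, 0, 1, 0, 0, 1] 4
```

The per-attribute-set keying is also why high-cardinality attributes are costly: every distinct combination allocates another full set of buckets.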
Logs (Correlation)
```python
import logging

from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Adds the current trace and span IDs to every log record."""

    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        # IDs are all zeros when no span is active, so the format never fails
        record.trace_id = format(ctx.trace_id, "032x")
        record.span_id = format(ctx.span_id, "016x")
        return True

# Attach the filter and a trace-aware formatter to the same handler
handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(trace_id)s:%(span_id)s] %(message)s"
))
logging.getLogger().addHandler(handler)
```
The Collector
Docker Compose
```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
```
Collector Config
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 500

exporters:
  # Jaeger ingests OTLP natively, so the generic otlp exporter is used
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Loki 3.x accepts OTLP logs on its /otlp endpoint
  otlphttp/loki:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]
```
Kubernetes Integration
Auto-instrumentation Operator
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    # Python auto-instrumentation exports OTLP over HTTP, hence port 4318
    endpoint: http://otel-collector:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
```

```yaml
# Pod annotation (in a Deployment, set it under spec.template.metadata)
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-python: "true"
```
Pods carrying this annotation are auto-instrumented without code changes.
Best Practices
1. Resource Attributes
```python
import socket

from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "payment-service",
    "service.version": "1.2.3",
    "deployment.environment": "production",
    "service.instance.id": socket.gethostname(),
})
```
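Resource attributes do not all have to be hard-coded: the SDK also reads comma-separated `key=value` pairs from the `OTEL_RESOURCE_ATTRIBUTES` environment variable, which suits values such as `deployment.environment` that change per deployment. The stdlib sketch below parses that variable (percent-decoding values) and merges it with attributes defined in code. The helper names are hypothetical, and letting code-defined values win is this sketch's choice; check your SDK's documented precedence before relying on it.

```python
import os
import socket
from typing import Dict, Mapping
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, percent-decoding the values."""
    attributes: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # skip malformed entries instead of failing at startup
        attributes[key.strip()] = unquote(value.strip())
    return attributes


def build_attributes(code_defined: Mapping[str, str],
                     env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # Environment values first; code-defined values then override them
    merged = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    merged.update(code_defined)
    return merged


attrs = build_attributes(
    {"service.name": "payment-service", "service.instance.id": socket.gethostname()},
    env={"OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=staging,team=checkout%20squad"},
)
print(attrs["deployment.environment"], attrs["team"])  # staging checkout squad
```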
2. Semantic Conventions
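```python
from opentelemetry import trace

span = trace.get_current_span()

# Use standard attribute names; the HTTP names follow the stable HTTP conventions.
# url and query stand for values from your own code.
span.set_attribute("http.request.method", "POST")
span.set_attribute("url.full", url)
span.set_attribute("http.response.status_code", 200)
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.statement", query)
```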
3. Sampling
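```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample 10% of new traces; spans with a parent follow the parent's decision
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler, resource=resource)
```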
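A trace-ID ratio sampler decides from the trace ID alone, so services using the same SDK and ratio make consistent decisions for a trace's root. Parent-based sampling (`parentbased_traceidratio` in the Kubernetes example, `ParentBased` in the Python SDK) adds one rule: a span with a parent follows the parent's decision, so traces are kept or dropped whole. The stdlib sketch below models that combination. Comparing the low 64 bits of the trace ID against a threshold reflects the general approach of the Python SDK, but `should_sample` is hypothetical and ignores details such as remote versus local parents.

```python
import random
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float,
                  parent_sampled: Optional[bool] = None) -> bool:
    """Toy model of ParentBased(TraceIdRatioBased(ratio))."""
    if parent_sampled is not None:
        return parent_sampled  # child spans follow the parent's decision
    # Root span: deterministic threshold on the low 64 bits of the trace ID
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


rng = random.Random(42)  # seeded so the run is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
sampled = sum(should_sample(t, 0.1) for t in trace_ids)
print(sampled / len(trace_ids))  # roughly 0.1
```

Because the decision is a pure function of the trace ID, re-evaluating it for the same trace always gives the same answer, with no coordination between services.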
4. Error Handling
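```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)
# start_as_current_span also records exceptions that escape it by default
with tracer.start_as_current_span("operation") as span:
    try:
        result = operation()  # your application code
    except Exception as e:
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
        raise
```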
Vendor Export
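Send to any backend. Many vendors now ingest OTLP directly, so a named instance of the generic otlp exporter is often all you need:
```yaml
exporters:
  otlp/jaeger: ...
  zipkin: ...
  datadog: ...
  otlp/newrelic: ...
  otlp/honeycomb: ...
  otlp/lightstep: ...
```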
One instrumentation, many destinations.
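The Collector is what makes many destinations cheap: within a pipeline, processors run in the order listed, and every exporter in the pipeline receives the processed data. The stdlib model below illustrates only that ordering and fan-out. The function names are hypothetical, the size-based limiter is a stand-in for `memory_limiter` (which actually watches process memory), and the real Collector is a Go service configured by YAML, not a Python library.

```python
from typing import Callable, Dict, List

Batch = List[dict]
Processor = Callable[[Batch], Batch]
Exporter = Callable[[Batch], None]


def run_pipeline(batch: Batch, processors: List[Processor],
                 exporters: List[Exporter]) -> None:
    """Toy model of one collector pipeline: ordered processors, then fan-out."""
    for process in processors:
        batch = process(batch)
        if not batch:
            return  # a processor refused the data; nothing is exported
    for export in exporters:
        export(list(batch))  # each exporter gets its own copy of the batch


def toy_limiter(max_items: int) -> Processor:
    # Hypothetical stand-in for memory_limiter: refuse oversized batches
    return lambda batch: batch if len(batch) <= max_items else []


backends: Dict[str, Batch] = {"jaeger": [], "zipkin": [], "datadog": []}
exporters = [backend.extend for backend in backends.values()]

run_pipeline([{"name": "GET /orders"}], [toy_limiter(100)], exporters)
run_pipeline([{"name": f"span-{i}"} for i in range(500)], [toy_limiter(100)], exporters)
print({name: len(spans) for name, spans in backends.items()})
# {'jaeger': 1, 'zipkin': 1, 'datadog': 1}
```

Adding a backend means adding one exporter to the list; the instrumented services do not change.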
Final Thoughts
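OpenTelemetry largely ends vendor lock-in for observability: instrument once with standard APIs, then export to whichever backend you choose.
The ecosystem is mature: tracing and metrics are stable in most language SDKs, though logging support is still maturing in some. It’s the right choice for new projects, and legacy instrumentation is worth migrating.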
Observe everything, lock into nothing.