Linux Kernel Tuning for High Performance
linux devops performance
Linux defaults are conservative. They work for most workloads. But high-performance servers—handling thousands of connections, high throughput—need tuning.
Understanding the Parameters
sysctl
Runtime kernel parameters:
# View current value
sysctl net.core.somaxconn
# Set temporarily
sysctl -w net.core.somaxconn=65535
# Set permanently in /etc/sysctl.conf
net.core.somaxconn=65535
# Apply
sysctl -p
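Every sysctl key maps onto a file under /proc/sys, which is handy for scripting checks (a minimal sketch; the dots in the key become slashes in the path):

```shell
# sysctl names map directly to /proc/sys paths:
#   net.core.somaxconn  ->  /proc/sys/net/core/somaxconn
cat /proc/sys/net/core/somaxconn
```

Reading the file is equivalent to `sysctl net.core.somaxconn`, and writing to it (as root) is equivalent to `sysctl -w`.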
ulimits
Per-process resource limits:
# View current limits
ulimit -a
# Set in shell
ulimit -n 65535 # Open files
# Set permanently in /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
Network Tuning
Connection Queue
# Maximum connection backlog
net.core.somaxconn = 65535
# SYN queue size (pending connections)
net.ipv4.tcp_max_syn_backlog = 65535
# Per-CPU queue for packets arriving faster than the kernel can process them
net.core.netdev_max_backlog = 65535
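Note that the kernel silently clamps the backlog an application passes to listen(2) to net.core.somaxconn, so both sides must be raised together. A quick sketch of the effective value (app_backlog here is a hypothetical application-side setting):

```shell
# The accept queue size is min(listen backlog, somaxconn),
# so raising only the application-side backlog has no effect.
somaxconn=$(cat /proc/sys/net/core/somaxconn)
app_backlog=65535   # hypothetical backlog the app passes to listen()
effective=$(( app_backlog < somaxconn ? app_backlog : somaxconn ))
echo "effective accept queue: $effective"
```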
Memory for Connections
# TCP read/write buffer sizes (min, default, max in bytes)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Total memory for TCP: low, pressure, max (in pages, not bytes)
net.ipv4.tcp_mem = 786432 1048576 1572864
# Core network buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 262144
net.core.wmem_default = 262144
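A common trap: tcp_rmem and tcp_wmem are in bytes, but tcp_mem is in pages. A sketch converting the "pressure" threshold above to an absolute size, assuming 4 KiB pages:

```shell
# tcp_mem is measured in PAGES; convert the "pressure" threshold
# above (1048576 pages) into MiB, assuming a 4096-byte page
page_size=4096
pressure_pages=1048576
echo "$(( pressure_pages * page_size / 1024 / 1024 )) MiB"   # 4096 MiB
```

Run `getconf PAGESIZE` to confirm the page size on your host before doing this math for real.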
Connection Reuse
# Reuse TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# FIN-WAIT-2 timeout (often mistaken for a TIME_WAIT knob)
net.ipv4.tcp_fin_timeout = 30
# Local port range
net.ipv4.ip_local_port_range = 1024 65535
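The port range matters because each outbound connection to a single (destination IP, destination port) pair needs its own local port, which bounds concurrency per destination:

```shell
# Ephemeral ports available per (destination IP, destination port) pair
low=1024
high=65535
echo "max concurrent outbound to one destination: $(( high - low + 1 ))"  # 64512
```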
Keep-Alive
# When to start keepalive probes (seconds)
net.ipv4.tcp_keepalive_time = 600
# Interval between probes
net.ipv4.tcp_keepalive_intvl = 60
# Number of probes before dropping
net.ipv4.tcp_keepalive_probes = 3
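These three values combine: a dead peer is declared after keepalive_time plus keepalive_intvl times keepalive_probes seconds. With the settings above:

```shell
# Worst-case time to detect a dead peer with the values above
keepalive_time=600
keepalive_intvl=60
keepalive_probes=3
echo "detected after $(( keepalive_time + keepalive_intvl * keepalive_probes ))s"  # 780s
```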
File Descriptors
Each connection uses a file descriptor, and the default limits are often too low for busy servers.
# System-wide maximum
fs.file-max = 2097152
# Per-process (in limits.conf)
* soft nofile 1048576
* hard nofile 1048576
# For systemd services
LimitNOFILE=1048576
Check usage:
# System-wide
cat /proc/sys/fs/file-nr
# Output: allocated free-but-allocated maximum
# Per-process
ls /proc/$(pgrep nginx)/fd | wc -l
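The three file-nr fields can be turned into an in-use count directly (a minimal sketch for Linux):

```shell
# /proc/sys/fs/file-nr holds: allocated, unused-but-allocated, system max
read -r allocated unused max < /proc/sys/fs/file-nr
echo "file handles in use: $(( allocated - unused )) of $max"
```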
Memory Tuning
Swappiness
# How aggressively to swap (0-100)
vm.swappiness = 10 # Prefer RAM, minimize swapping
For database servers, go even lower: 1 swaps only under severe memory pressure, while 0 avoids swap almost entirely.
Dirty Pages
# Writers block and flush synchronously once dirty pages reach 10% of RAM
vm.dirty_ratio = 10
# Background writeback kicks in at 5%
vm.dirty_background_ratio = 5
# Max age of dirty pages (centiseconds)
vm.dirty_expire_centisecs = 1500
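Since both ratios are percentages of RAM, it is worth translating them into absolute sizes for your hardware. A sketch for a hypothetical 16 GiB host:

```shell
# dirty_ratio as an absolute size on a hypothetical 16 GiB host
ram_mib=16384
dirty_ratio=10
echo "writers block at roughly $(( ram_mib * dirty_ratio / 100 )) MiB dirty"  # 1638 MiB
```

On large-memory machines that absolute number grows huge, which is exactly why these ratios get lowered from their defaults.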
Overcommit
# 0: heuristic overcommit
# 1: always allow
# 2: strict (check available)
vm.overcommit_memory = 0
# Ratio for mode 2
vm.overcommit_ratio = 80
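In strict mode the kernel enforces CommitLimit = swap + RAM * overcommit_ratio / 100. A sketch with hypothetical sizes:

```shell
# CommitLimit in strict mode (vm.overcommit_memory = 2):
#   swap + RAM * overcommit_ratio / 100
ram_kib=16777216    # hypothetical 16 GiB of RAM
swap_kib=2097152    # hypothetical 2 GiB of swap
ratio=80
echo "CommitLimit: $(( swap_kib + ram_kib * ratio / 100 )) KiB"  # 15518924 KiB
```

Compare against the CommitLimit line in /proc/meminfo on a real host.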
Process Limits
Maximum User Processes
# /etc/security/limits.conf
* soft nproc 65535
* hard nproc 65535
Maximum Threads
# Kernel limit
kernel.pid_max = 4194304
kernel.threads-max = 4194304
Application-Specific
For Nginx
# nginx.conf
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
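The theoretical client ceiling is worker_processes times worker_connections, and worker_rlimit_nofile must stay above worker_connections since each connection holds at least one descriptor (two when proxying). A sketch assuming a hypothetical four workers:

```shell
# Theoretical max simultaneous clients for nginx
worker_processes=4        # hypothetical; often set to the CPU count
worker_connections=16384
echo "max clients: $(( worker_processes * worker_connections ))"  # 65536
```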
For Redis
# /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 65535
For PostgreSQL
# Shared memory
kernel.shmmax = 68719476736 # 64GB
kernel.shmall = 16777216
# Semaphores
kernel.sem = 1000 32000 32 1000
# HugePages (if using)
vm.nr_hugepages = 1000
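kernel.shmall is counted in pages, so it should be sized to cover shmmax. The two values above are consistent for 4 KiB pages:

```shell
# shmall (pages) needed to cover shmmax (bytes) with 4096-byte pages
shmmax=68719476736   # 64 GiB
page_size=4096
echo "shmall pages needed: $(( shmmax / page_size ))"  # 16777216
```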
Practical Example
High-traffic web server:
# /etc/sysctl.conf
# Network
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
# TCP buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# File descriptors
fs.file-max = 2097152
# Memory
vm.swappiness = 10
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
# /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 65535
* hard nproc 65535
Verifying Changes
# Check sysctl
sysctl -a | grep somaxconn
# Check limits for running process
cat /proc/$(pgrep -f "your-app")/limits
# Check network buffers
ss -m
# Monitor file descriptors
lsof -p $(pgrep nginx) | wc -l
Monitoring
Watch for symptoms:
# Connection tracking overflow
dmesg | grep "nf_conntrack: table full"
# SYN floods
netstat -s | grep -i syn
# Drop statistics
netstat -s | grep -i drop
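For SYN-queue pressure specifically, counting sockets stuck in SYN_RECV is a quick signal (a sketch parsing /proc/net/tcp, where hex state 03 is SYN_RECV):

```shell
# Count IPv4 sockets in SYN_RECV (hex state 03 in /proc/net/tcp);
# a persistently high count suggests the SYN backlog is saturating
awk 'NR > 1 && $4 == "03"' /proc/net/tcp | wc -l
```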
Common Mistakes
- Forgetting to reload: run sysctl -p (or restart the service) after editing config files
- Setting without measuring: tune based on metrics, not guesses
- Over-tuning: Default values work for most workloads
- Ignoring the application: Kernel tuning can’t fix app inefficiency
Final Thoughts
Start with conservative changes. Measure impact. Tune incrementally.
Most applications never need kernel tuning. But when you hit limits—connection errors, latency spikes, throughput caps—these parameters matter.
Measure twice, tune once.