FinOps: Managing Cloud Costs
Cloud costs are out of control. What started as “pay for what you use” has become “pay for what you forgot to shut down.” FinOps brings financial accountability to cloud operations. Here’s how.
The Problem
Month 1: $5,000 (development phase)
Month 6: $25,000 (growing usage)
Month 12: $75,000 (what happened?!)
Cloud spending grows fast because:
- No upfront costs = no friction
- Easy to provision = easy to forget
- Variable pricing = hard to predict
What is FinOps?
FinOps (a portmanteau of “Finance” and “DevOps”) is a cultural practice that combines:
- Finance (cost management)
- Technology (engineering)
- Business (value optimization)
Traditional: "Here's your cloud bill"
FinOps: "Here's what you spent, why, and how to optimize"
The FinOps Framework
Inform
Visibility into spending:
- Who is spending?
- What are they spending on?
- When does spending spike?
- Why is this resource needed?
Optimize
Reduce waste and improve efficiency:
- Right-size instances
- Eliminate unused resources
- Leverage discounts
- Architecture improvements
Operate
Continuous improvement:
- Set budgets and alerts
- Charge back to teams
- Automate policies
- Regular reviews
Cost Visibility
Tagging Strategy
Essential tags for every resource:
# AWS resource tags
Tags:
  Environment: production
  Team: platform
  Owner: alice@company.com
  Project: user-service
  CostCenter: engineering
Enforce tagging:
# Terraform example
resource "aws_instance" "web" {
  # ... configuration ...
  tags = {
    Environment = var.environment
    Team        = var.team
    Owner       = var.owner
    Project     = var.project
    CostCenter  = var.cost_center
  }
  lifecycle {
    # Fail if the Team tag is empty (preconditions require Terraform 1.2+)
    precondition {
      condition     = length(var.team) > 0
      error_message = "Team tag is required."
    }
  }
}
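A Terraform check only covers resources created through that configuration, so a periodic audit helps catch everything else. Here is a minimal sketch in plain Python, assuming the required keys are the five tags listed above. `find_missing_tags`, `audit`, and the sample inventory are hypothetical names for illustration; in practice the inventory would come from your cloud provider's API.

```python
REQUIRED_TAGS = ("Environment", "Team", "Owner", "Project", "CostCenter")


def find_missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank in `tags`."""
    return [key for key in required if not str(tags.get(key, "")).strip()]


def audit(inventory):
    """Map resource ID -> missing tag keys, skipping fully tagged resources."""
    report = {}
    for resource_id, tags in inventory.items():
        missing = find_missing_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


# Hypothetical inventory for illustration only.
inventory = {
    "i-0001": {"Environment": "production", "Team": "platform",
               "Owner": "alice@company.com", "Project": "user-service",
               "CostCenter": "engineering"},
    "i-0002": {"Environment": "development", "Team": ""},
}

for resource_id, missing in audit(inventory).items():
    print(f"{resource_id}: missing {', '.join(missing)}")
```

A blank value counts as missing here, since an empty `Team` tag is no more useful for cost allocation than no tag at all.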
Cost Allocation
Break down spending by:
| Dimension | Questions Answered |
|---|---|
| Service | What AWS services cost most? |
| Team | Which team spends most? |
| Environment | Prod vs. dev vs. staging? |
| Application | Which app is expensive? |
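Each row in this table is a group-by over billing line items. A minimal sketch, assuming every line item is a dict with a `cost` field plus one key per dimension. The `allocate` function and the sample records are hypothetical; real data would come from your billing export.

```python
from collections import defaultdict


def allocate(records, dimension):
    """Sum cost per value of `dimension`; untagged records go to 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(dimension) or "untagged"] += record["cost"]
    # Largest spender first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical line items for illustration only.
records = [
    {"service": "EC2", "team": "platform", "environment": "production", "cost": 1200.0},
    {"service": "S3", "team": "data", "environment": "production", "cost": 300.0},
    {"service": "EC2", "team": "data", "environment": "development", "cost": 450.0},
    {"service": "RDS", "team": None, "environment": "staging", "cost": 250.0},
]

for dimension in ("service", "team", "environment"):
    print(dimension, allocate(records, dimension))
```

Keeping an explicit "untagged" bucket is a deliberate choice: the size of that bucket tells you how much of the bill nobody currently owns.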
Dashboards
Real-time cost visibility:
┌─────────────────────────────────────────┐
│ Monthly Cloud Spend │
│ ████████████████████░░░ $68,420 / $80K │
├─────────────────────────────────────────┤
│ By Team: │
│ Platform ████████ $35,000 │
│ Data ████ $18,000 │
│ Frontend ███ $10,000 │
│ Other ██ $5,420 │
└─────────────────────────────────────────┘
Common Waste Patterns
Unused Resources
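# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query "Volumes[*].{ID:VolumeId,Size:Size}"
# Find idle load balancers: list them, then check each one's traffic in CloudWatch
# (for example, an ALB whose RequestCount stays at zero is a removal candidate)
aws elbv2 describe-load-balancers \
  --query "LoadBalancers[*].{Name:LoadBalancerName,ARN:LoadBalancerArn}"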
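To decide whether the cleanup is worth the effort, it helps to put a price on what those volumes cost. A rough sketch, assuming the input has the `ID`/`Size` shape produced by the describe-volumes query above and that every volume is billed at the gp3 rate quoted in the storage section of this post. `monthly_waste` and the sample volumes are hypothetical.

```python
# Assumption: gp3 list price per GB-month, as quoted in the storage section.
GP3_PRICE_PER_GB_MONTH = 0.08


def monthly_waste(volumes, price_per_gb_month=GP3_PRICE_PER_GB_MONTH):
    """Estimate the monthly cost of volumes that are not attached to anything."""
    total_gb = sum(volume["Size"] for volume in volumes)
    return round(total_gb * price_per_gb_month, 2)


# Hypothetical output of the describe-volumes query.
volumes = [
    {"ID": "vol-0a", "Size": 100},
    {"ID": "vol-0b", "Size": 500},
    {"ID": "vol-0c", "Size": 50},
]

print(f"Unattached volumes cost about ${monthly_waste(volumes):.2f}/month")
```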
Oversized Instances
Analysis:
  Instance: m5.4xlarge (16 vCPU, 64 GB)
  CPU Avg: 8%
  Memory Avg: 15%
Recommendation:
  Downsize to: m5.xlarge (4 vCPU, 16 GB)
  Monthly savings: ~$420 (on-demand; m5.4xlarge ≈ $560, m5.xlarge ≈ $140)
Dev/Test Running 24/7
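# Lambda to stop dev instances at night
import boto3

ec2 = boto3.client('ec2')

def stop_dev_instances(event, context):
    # Find running instances tagged as dev; paginate so large fleets are covered
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['development']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )
    instance_ids = [
        instance['InstanceId']
        for page in pages
        for reservation in page['Reservations']
        for instance in reservation['Instances']
    ]
    # Stop everything in one call, and skip the call when nothing matched
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {'stopped': instance_ids}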
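Schedule: Stop at 7 PM, start at 7 AM = 50% fewer instance-hours. Attached EBS volumes are still billed while an instance is stopped, so the saving on the total bill is somewhat lower.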
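The 50% figure is plain arithmetic on instance-hours, and the same arithmetic shows what a weekday-only schedule would add. A quick sketch (`schedule_savings` is an illustrative name, and it counts instance-hours only, not storage):

```python
HOURS_PER_WEEK = 7 * 24


def schedule_savings(hours_on_per_day, days_on_per_week=7):
    """Fraction of always-on instance-hours saved by a start/stop schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


print(f"{schedule_savings(12):.0%}")     # nights off, 7 days a week: 50%
print(f"{schedule_savings(12, 5):.0%}")  # nights and weekends off: 64%
```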
Optimization Strategies
Reserved Instances / Savings Plans
| Purchase Option | Typical Discount vs. On-Demand | Commitment |
|---|---|---|
| On-Demand | 0% | None |
| 1-Year Reserved | 30-40% | 1 year |
| 3-Year Reserved | 50-60% | 3 years |
| Spot Instances | 60-90% | None (can be interrupted) |
Strategy:
├── Baseline load: Reserved Instances
├── Variable load: On-Demand
└── Fault-tolerant: Spot Instances
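To see what this mix does to a bill, here is a small sketch that applies a discount to each slice of spend. The discounts are assumptions: the midpoints of the ranges in the table above, not quoted prices. The `blended_cost` function and the $10,000 workload split are hypothetical.

```python
# Assumed discounts: midpoints of the ranges in the table above.
DISCOUNTS = {
    "on_demand": 0.0,
    "reserved_1y": 0.35,
    "reserved_3y": 0.55,
    "spot": 0.75,
}


def blended_cost(on_demand_cost_by_option):
    """Apply each option's discount to what that slice would cost on-demand."""
    total = sum(
        cost * (1 - DISCOUNTS[option])
        for option, cost in on_demand_cost_by_option.items()
    )
    return round(total, 2)


# Hypothetical workload: $10,000/month at on-demand rates, split by load type.
mix = {"reserved_1y": 6000, "on_demand": 2500, "spot": 1500}

before = sum(mix.values())
after = blended_cost(mix)
print(f"${before:,} -> ${after:,.2f} ({1 - after / before:.1%} saved)")
```

The saving depends on how much of the load really is baseline. Reserving capacity that later goes unused turns the discount into waste.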
Right-Sizing
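# Analyze CloudWatch metrics
import statistics
from datetime import datetime, timedelta, timezone

import boto3

def analyze_instance_utilization(instance_id):
    cloudwatch = boto3.client('cloudwatch')
    now = datetime.now(timezone.utc)
    # Get hourly CPU utilization for the past 14 days
    # (memory is not a default EC2 metric; it requires the CloudWatch agent)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=now - timedelta(days=14),
        EndTime=now,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )
    datapoints = response['Datapoints']
    if not datapoints:
        return "No data"  # e.g. the instance was stopped for the whole window
    avg_cpu = statistics.mean(d['Average'] for d in datapoints)
    if avg_cpu < 20:
        return "Consider downsizing"
    elif avg_cpu > 80:
        return "Consider upsizing"
    return "Size appropriate"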
Storage Optimization
Hot data:  SSD (gp3)       $0.08/GB-month
Warm data: HDD (st1)       $0.045/GB-month
Cold data: S3 Standard-IA  $0.0125/GB-month
Archive:   S3 Glacier      $0.004/GB-month
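The gap between tiers is easier to appreciate with a concrete volume of data. A short sketch using the per-GB monthly prices listed above, which are treated here as assumptions since prices vary by region and change over time. `monthly_cost` and the tier keys are illustrative names, and the calculation ignores request and retrieval fees.

```python
# Per-GB monthly prices from the list above (assumed; check current pricing).
PRICE_PER_GB_MONTH = {
    "gp3": 0.08,
    "st1": 0.045,
    "s3_infrequent_access": 0.0125,
    "s3_glacier": 0.004,
}


def monthly_cost(size_gb, tier):
    """Storage-only monthly cost of `size_gb` in the given tier."""
    return round(size_gb * PRICE_PER_GB_MONTH[tier], 2)


for tier in PRICE_PER_GB_MONTH:
    print(f"1 TB on {tier}: ${monthly_cost(1000, tier):.2f}/month")
```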
Implement lifecycle policies:
# S3 lifecycle policy
Rules:
  - ID: "ArchiveOldData"
    Status: Enabled
    Filter:
      Prefix: ""  # apply to every object in the bucket
    Transitions:
      - Days: 30
        StorageClass: STANDARD_IA
      - Days: 90
        StorageClass: GLACIER
    Expiration:
      Days: 365
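Read as a timeline, this rule moves an object through three storage classes and then deletes it. The sketch below models that timeline in plain Python so the boundaries are explicit. `storage_class_at` is a hypothetical helper, not an S3 API, and it assumes objects start in STANDARD; S3 applies lifecycle actions asynchronously, so real transitions can lag the configured day.

```python
# Thresholds copied from the lifecycle policy above.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_at(age_days):
    """Return the storage class an object of this age should be in, or None once expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"  # assumption: objects are uploaded to STANDARD
    for days, storage_class in TRANSITIONS:
        if age_days >= days:
            current = storage_class
    return current


for age in (0, 30, 90, 365):
    print(age, storage_class_at(age))
```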
Alerting and Budgets
AWS Budgets
# Terraform AWS Budget
resource "aws_budgets_budget" "monthly" {
  name         = "monthly-cloud-budget"
  budget_type  = "COST"
  limit_amount = "80000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["cloud-costs@company.com"]
  }
}
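The notification in this budget fires once actual spend passes 80% of the $80,000 limit, which is $64,000. The sketch below reproduces that comparison in plain Python as an illustration of the rule, not of how AWS Budgets is implemented. `budget_alert` is a hypothetical name, and the $68,420 figure is the month-to-date spend from the dashboard earlier in this post.

```python
def budget_alert(actual_spend, limit, threshold_pct=80):
    """True when actual spend is strictly greater than threshold_pct of the limit."""
    return actual_spend > limit * threshold_pct / 100


print(budget_alert(68_420, 80_000))  # True: $68,420 is above the $64,000 threshold
print(budget_alert(60_000, 80_000))  # False: still under the threshold
```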
Anomaly Detection
Alert on unexpected spending:
Normal daily spend: $2,500
Actual today: $12,000
Alert: 380% above baseline
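The percentage in that alert is the gap between today's spend and the baseline, divided by the baseline. Here is a minimal sketch of the calculation and a naive detector built on it. The function names and the 50% threshold are assumptions for illustration; managed anomaly detection services use more sophisticated models than a simple average.

```python
from statistics import mean


def percent_above_baseline(actual, baseline):
    """How far `actual` is above `baseline`, as a percentage of the baseline."""
    return (actual - baseline) * 100 / baseline


def is_anomaly(history, actual, max_percent_above=50):
    """Flag `actual` when it exceeds the average of `history` by more than the threshold."""
    baseline = mean(history)
    return percent_above_baseline(actual, baseline) > max_percent_above


print(f"{percent_above_baseline(12_000, 2_500):.0f}% above baseline")  # 380% above baseline
print(is_anomaly([2_400, 2_500, 2_600], 12_000))                       # True
```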
FinOps Culture
Team Accountability
## Monthly Cost Review
- Each team presents their cloud spend
- Explain increases/decreases
- Share optimization wins
- Plan next month's initiatives
Showback vs. Chargeback
| Approach | Description | Incentive |
|---|---|---|
| Showback | Show costs, don’t bill | Awareness |
| Chargeback | Bill teams directly | Strong incentive |
Engineering Practices
# Include cost in PR reviews
"""
Cost Impact Assessment:
- Current: 2x m5.large ($140/month)
- Proposed: 1x m5.xlarge ($140/month)
- Net change: $0 (but simplified operations)
"""
Tools
| Tool | Use Case |
|---|---|
| AWS Cost Explorer | Native AWS analysis |
| CloudHealth | Multi-cloud FinOps |
| Kubecost | Kubernetes cost |
| Infracost | Terraform cost estimation |
| Spot.io | Spot instance management |
Final Thoughts
Cloud costs are everyone’s responsibility. FinOps creates visibility, accountability, and optimization practices.
Start with:
- Tag everything
- Set budgets and alerts
- Review monthly
- Optimize continuously
Your CFO will thank you.
The cloud is pay-as-you-go. FinOps is pay-as-you-know.