FinOps: Managing Cloud Costs


Cloud costs are out of control. What started as “pay for what you use” has become “pay for what you forgot to shut down.” FinOps brings financial accountability to cloud operations. Here’s how.

The Problem

Month 1: $5,000 (development phase)
Month 6: $25,000 (growing usage)
Month 12: $75,000 (what happened?!)

Cloud spending grows fast because:

What is FinOps?

FinOps (Financial Operations) is a cultural practice that combines:

Traditional: "Here's your cloud bill"
FinOps:      "Here's what you spent, why, and how to optimize"

The FinOps Framework

Inform

Visibility into spending:

Who is spending?
What are they spending on?
When does spending spike?
Why is this resource needed?

Optimize

Reduce waste and improve efficiency:

Right-size instances
Eliminate unused resources
Leverage discounts
Architecture improvements

Operate

Continuous improvement:

Set budgets and alerts
Charge back to teams
Automate policies
Regular reviews

Cost Visibility

Tagging Strategy

Essential tags for every resource:

# AWS resource tags
Tags:
  Environment: production
  Team: platform
  Owner: alice@company.com
  Project: user-service
  CostCenter: engineering

Enforce tagging:

# Terraform example
resource "aws_instance" "web" {
  # ... configuration ...
  
  tags = {
    Environment = var.environment
    Team        = var.team
    Owner       = var.owner
    Project     = var.project
    CostCenter  = var.cost_center
  }
  
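  lifecycle {
    # Fail the plan if any required tag value is empty
    # (resource preconditions need Terraform 1.2 or later)
    precondition {
      condition = alltrue([
        for v in [var.environment, var.team, var.owner, var.project, var.cost_center] :
        length(v) > 0
      ])
      error_message = "All required tags (Environment, Team, Owner, Project, CostCenter) must be set."
    }
  }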
}
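Terraform only guards resources it creates, so existing resources still need a periodic audit. A minimal sketch, assuming tags have already been fetched into the response shape of the Resource Groups Tagging API (for example via `aws resourcegroupstaggingapi get-resources`); `REQUIRED_TAGS`, `find_missing_tags`, and the example ARNs are hypothetical:

```python
# Required tags, matching the tagging strategy above (hypothetical constant)
REQUIRED_TAGS = {"Environment", "Team", "Owner", "Project", "CostCenter"}


def find_missing_tags(resources):
    """Map each resource ARN to the required tags it lacks or leaves empty."""
    report = {}
    for resource in resources:
        # Only tags with a non-empty value count as present
        present = {t["Key"] for t in resource.get("Tags", []) if t.get("Value")}
        missing = REQUIRED_TAGS - present
        if missing:
            report[resource["ResourceARN"]] = sorted(missing)
    return report


# Example input in the Tagging API response shape (made-up ARNs)
resources = [
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc",
     "Tags": [{"Key": "Environment", "Value": "production"},
              {"Key": "Team", "Value": "platform"},
              {"Key": "Owner", "Value": "alice@company.com"},
              {"Key": "Project", "Value": "user-service"},
              {"Key": "CostCenter", "Value": "engineering"}]},
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:volume/vol-0def",
     "Tags": [{"Key": "Team", "Value": ""}]},
]

print(find_missing_tags(resources))
```

Treating an empty value as missing mirrors the `length(v) > 0` rule in the Terraform check.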

Cost Allocation

Break down spending by:

| Dimension | Questions Answered |
|---|---|
| Service | What AWS services cost most? |
| Team | Which team spends most? |
| Environment | Prod vs. dev vs. staging? |
| Application | Which app is expensive? |
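Once resources are tagged, each of these breakdowns is a group-by over billing line items. A minimal sketch, assuming line items have already been exported (for example from a cost and usage report) into dictionaries with one key per dimension; the field names and figures are hypothetical:

```python
from collections import defaultdict


def allocate_costs(line_items, dimension):
    """Total cost per value of one dimension, largest first.

    Items missing the dimension are grouped under "(untagged)" so
    unallocated spend stays visible instead of disappearing.
    """
    totals = defaultdict(float)
    for item in line_items:
        totals[item.get(dimension, "(untagged)")] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical billing line items
line_items = [
    {"Service": "EC2", "Team": "platform", "Environment": "production", "cost": 1200.0},
    {"Service": "RDS", "Team": "data", "Environment": "production", "cost": 800.0},
    {"Service": "EC2", "Team": "platform", "Environment": "development", "cost": 300.0},
    {"Service": "S3", "cost": 150.0},  # no Team/Environment tags
]

print(allocate_costs(line_items, "Team"))
# {'platform': 1500.0, 'data': 800.0, '(untagged)': 150.0}
print(allocate_costs(line_items, "Service"))
# {'EC2': 1500.0, 'RDS': 800.0, 'S3': 150.0}
```

A growing "(untagged)" bucket is itself a useful signal that tag enforcement has gaps.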

Dashboards

Real-time cost visibility:

┌─────────────────────────────────────────┐
│  Monthly Cloud Spend                    │
│  ████████████████████░░░ $68,420 / $80K │
├─────────────────────────────────────────┤
│  By Team:                               │
│  Platform    ████████  $35,000          │
│  Data        ████      $18,000          │
│  Frontend    ███       $10,000          │
│  Other       ██        $5,420           │
└─────────────────────────────────────────┘

Common Waste Patterns

Unused Resources

# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query "Volumes[*].{ID:VolumeId,Size:Size}"

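# Find target groups with no registered targets
# (a load balancer whose target groups are all empty is likely idle)
for tg in $(aws elbv2 describe-target-groups \
    --query "TargetGroups[].TargetGroupArn" --output text); do
  count=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
    --query "length(TargetHealthDescriptions)" --output text)
  [ "$count" -eq 0 ] && echo "No registered targets: $tg"
done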
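To put a dollar figure on the unattached volumes found by the first query, multiply their total size by a per-GB-month price. A minimal sketch, assuming the volumes are gp3 at the $0.08/GB-month figure from the storage section of this post (other volume types and regions are priced differently); `unattached_volume_waste` and the volume IDs are made up:

```python
GP3_PRICE_PER_GB_MONTH = 0.08  # assumption: gp3 list price, varies by region


def unattached_volume_waste(volumes, price_per_gb_month=GP3_PRICE_PER_GB_MONTH):
    """Monthly cost of volumes shaped like the describe-volumes query output."""
    total_gb = sum(v["Size"] for v in volumes)
    return round(total_gb * price_per_gb_month, 2)


# Example output of the describe-volumes query (IDs are made up)
volumes = [
    {"ID": "vol-0a1b2c3d", "Size": 500},
    {"ID": "vol-0e4f5a6b", "Size": 100},
]

print(unattached_volume_waste(volumes))  # 48.0
```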

Oversized Instances

Analysis:
  Instance: m5.4xlarge (16 vCPU, 64 GB)
  CPU Avg: 8%
  Memory Avg: 15%
  
Recommendation:
  Downsize to: m5.xlarge (4 vCPU, 16 GB)
  Monthly savings: ~$420 (~$560 → ~$140 on-demand)
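The recommendation rests on two pieces of arithmetic: whether the observed load still fits the smaller instance with headroom, and how much the price drops. A sketch, assuming load scales linearly with vCPU and memory and using the on-demand rates implied by the PR-review example later in this post ($140/month for m5.xlarge; m5.4xlarge has four times the resources at four times the price). The helper names and the 70% headroom limit are illustrative choices, not AWS guidance:

```python
def projected_utilization(current_pct, current_capacity, new_capacity):
    """Utilization (%) the same absolute load would produce on new capacity."""
    return current_pct * current_capacity / new_capacity


def fits_after_resize(cpu_pct, mem_pct, current, target, limit_pct=70.0):
    """True if projected CPU and memory both stay at or under limit_pct."""
    cpu = projected_utilization(cpu_pct, current["vcpu"], target["vcpu"])
    mem = projected_utilization(mem_pct, current["mem_gb"], target["mem_gb"])
    return cpu <= limit_pct and mem <= limit_pct


# Monthly prices are assumptions (on-demand; m5 pricing scales with size)
m5_4xlarge = {"vcpu": 16, "mem_gb": 64, "monthly_usd": 560}
m5_xlarge = {"vcpu": 4, "mem_gb": 16, "monthly_usd": 140}

print(projected_utilization(8, 16, 4))    # 32.0 -> CPU % on m5.xlarge
print(projected_utilization(15, 64, 16))  # 60.0 -> memory % on m5.xlarge
print(fits_after_resize(8, 15, m5_4xlarge, m5_xlarge))  # True
print(m5_4xlarge["monthly_usd"] - m5_xlarge["monthly_usd"])  # 420
```

Averages hide spikes, so check peak utilization as well before downsizing.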

Dev/Test Running 24/7

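# Lambda to stop dev instances at night
import boto3


def stop_dev_instances(event, context):
    ec2 = boto3.client('ec2')

    # Find running instances tagged Environment=development;
    # the paginator follows NextToken across result pages
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['development']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )

    instance_ids = [
        instance['InstanceId']
        for page in pages
        for reservation in page['Reservations']
        for instance in reservation['Instances']
    ]

    # Stop all matches in one API call
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

    return {'stopped': instance_ids}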

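Schedule: Stop at 7 PM and start again at 7 AM (with a matching start function). Running 12 of 24 hours cuts compute cost by about 50%; attached EBS volumes are still billed while instances are stopped.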
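The 50% figure is just the share of weekly hours the instances spend stopped, and keeping them off on weekends pushes it higher. A quick sketch of that arithmetic, counting compute hours only; `schedule_savings` is a hypothetical helper:

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings(hours_on_per_day, days_on_per_week=7):
    """Fraction of always-on compute hours avoided by a start/stop schedule."""
    running = hours_on_per_day * days_on_per_week
    return 1 - running / HOURS_PER_WEEK


print(schedule_savings(12))               # 0.5   (7 AM to 7 PM every day)
print(round(schedule_savings(12, 5), 3))  # 0.643 (weekdays only)
```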

Optimization Strategies

Reserved Instances / Savings Plans

| Purchase Option | Discount | Commitment |
|---|---|---|
| On-Demand | 0% | None |
| 1-Year Reserved | 30-40% | 1 year |
| 3-Year Reserved | 50-60% | 3 years |
| Spot Instances | 60-90% | None (can be interrupted) |

Strategy:
  ├── Baseline load: Reserved Instances
  ├── Variable load: On-Demand
  └── Fault-tolerant: Spot Instances
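To see what this strategy is worth, weight each slice of capacity by its discount. A sketch assuming a hypothetical workload split (60% baseline, 25% variable, 15% fault-tolerant) and discounts picked from within the ranges in the table above (35% for 1-year reserved, 70% for spot); real rates depend on instance type, region, and payment option:

```python
def blended_monthly_cost(on_demand_monthly, mix):
    """Monthly cost when capacity is split across purchase options.

    mix maps option name -> (share of capacity, discount vs. on-demand).
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("capacity shares must sum to 1")
    return sum(on_demand_monthly * share * (1 - discount)
               for share, discount in mix.values())


# Hypothetical split; discounts chosen from within the table's ranges
mix = {
    "reserved_1yr": (0.60, 0.35),  # baseline load
    "on_demand":    (0.25, 0.00),  # variable load
    "spot":         (0.15, 0.70),  # fault-tolerant work
}

cost = blended_monthly_cost(10_000, mix)
print(round(cost, 2))                             # 6850.0
print(f"{100 * (1 - cost / 10_000):.1f}% saved")  # 31.5% saved
```

The reserved slice carries most of the savings because it covers the largest share of capacity.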

Right-Sizing

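# Analyze CloudWatch metrics
import statistics
from datetime import datetime, timedelta, timezone

import boto3


def analyze_instance_utilization(instance_id):
    cloudwatch = boto3.client('cloudwatch')
    now = datetime.now(timezone.utc)

    # Hourly CPU utilization over the last 14 days
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=now - timedelta(days=14),
        EndTime=now,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )

    datapoints = response['Datapoints']
    if not datapoints:
        return "No data (instance stopped or metrics unavailable)"

    avg_cpu = statistics.mean(d['Average'] for d in datapoints)
    peak_cpu = max(d['Maximum'] for d in datapoints)
    summary = f"(avg {avg_cpu:.0f}%, peak {peak_cpu:.0f}%)"

    if avg_cpu < 20:
        return f"Consider downsizing {summary}"
    elif avg_cpu > 80:
        return f"Consider upsizing {summary}"
    return f"Size appropriate {summary}"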

Storage Optimization

Hot data:  EBS SSD (gp3)       $0.08/GB-month
Warm data: EBS HDD (st1)       $0.045/GB-month
Cold data: S3 Standard-IA      $0.0125/GB-month
Archive:   S3 Glacier          $0.004/GB-month

Implement lifecycle policies:

# S3 lifecycle policy (CloudFormation, AWS::S3::Bucket property)
LifecycleConfiguration:
  Rules:
    - Id: ArchiveOldData
      Status: Enabled
      Transitions:
        - TransitionInDays: 30
          StorageClass: STANDARD_IA
        - TransitionInDays: 90
          StorageClass: GLACIER
      ExpirationInDays: 365
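The payoff of a lifecycle rule is easy to estimate. A sketch comparing one year of storage for 1,000 GB with and without the ArchiveOldData rule, assuming an S3 Standard price of $0.023/GB-month for the first 30 days (a us-east-1 list price not given above), the Infrequent Access and Glacier prices listed above, and 30-day months. It ignores request, retrieval, and minimum-duration charges; `lifecycle_cost` is a hypothetical helper:

```python
# Prices in USD per GB-month (assumptions; see lead-in)
PRICES = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER": 0.004}


def lifecycle_cost(gb, schedule, days_kept):
    """Storage cost of gb kept for days_kept days while moving through tiers.

    schedule: list of (start_day, storage_class), sorted by start_day.
    """
    total = 0.0
    for i, (start, storage_class) in enumerate(schedule):
        end = schedule[i + 1][0] if i + 1 < len(schedule) else days_kept
        days_in_tier = max(0, min(end, days_kept) - start)
        total += gb * PRICES[storage_class] * days_in_tier / 30
    return round(total, 2)


# Mirrors ArchiveOldData: IA at day 30, Glacier at day 90, deleted at day 365
with_rule = lifecycle_cost(
    1000, [(0, "STANDARD"), (30, "STANDARD_IA"), (90, "GLACIER")], 365)
without_rule = lifecycle_cost(1000, [(0, "STANDARD")], 365)

print(with_rule)     # 84.67
print(without_rule)  # 279.83
```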

Alerting and Budgets

AWS Budgets

# Terraform AWS Budget
resource "aws_budgets_budget" "monthly" {
  name              = "monthly-cloud-budget"
  budget_type       = "COST"
  limit_amount      = "80000"
  limit_unit        = "USD"
  time_unit         = "MONTHLY"
  
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["cloud-costs@company.com"]
  }
}
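This notification fires once actual spend goes above 80% of the limit. Checked against the dashboard figures earlier in this post, $68,420 of an $80,000 budget is about 85.5%, so it would already have fired. A sketch of the same check across several thresholds; the 50/80/100 levels and `crossed_thresholds` are hypothetical choices:

```python
def crossed_thresholds(actual, limit, thresholds=(50, 80, 100)):
    """Thresholds (as % of limit) that actual spend is strictly above,
    matching a GREATER_THAN comparison."""
    pct = 100 * actual / limit
    return [t for t in thresholds if pct > t]


# Dashboard figures from earlier in this post
print(f"{100 * 68_420 / 80_000:.1f}%")    # 85.5%
print(crossed_thresholds(68_420, 80_000))  # [50, 80]
```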

Anomaly Detection

Alert on unexpected spending:

Normal daily spend: $2,500
Actual today: $12,000
Alert: 380% above baseline
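The 380% figure is (12,000 - 2,500) / 2,500. One simple way to define the baseline is the mean of recent daily spend. A minimal sketch, assuming a trailing window of daily totals and a hypothetical alert threshold of 100% above baseline; managed anomaly-detection services use more sophisticated models:

```python
import statistics


def check_spend(history, today, threshold_pct=100.0):
    """Compare today's spend with the mean of recent daily spend.

    Returns (percent above baseline, whether it exceeds threshold_pct).
    """
    baseline = statistics.mean(history)
    pct_above = 100 * (today - baseline) / baseline
    return round(pct_above, 1), pct_above > threshold_pct


# Hypothetical last 7 days of spend, averaging $2,500
history = [2400, 2600, 2500, 2450, 2550, 2500, 2500]
print(check_spend(history, 12_000))  # (380.0, True)
print(check_spend(history, 2_700))   # (8.0, False)
```

A plain mean is sensitive to weekly patterns, so comparing against the same weekday in prior weeks is a common refinement.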

FinOps Culture

Team Accountability

## Monthly Cost Review
- Each team presents their cloud spend
- Explain increases/decreases
- Share optimization wins
- Plan next month's initiatives

Showback vs. Chargeback

| Approach | Description | Incentive |
|---|---|---|
| Showback | Show costs, don’t bill | Awareness |
| Chargeback | Bill teams directly | Strong incentive |

Engineering Practices

# Include cost in PR reviews
"""
Cost Impact Assessment:
- Current: 2x m5.large ($140/month)
- Proposed: 1x m5.xlarge ($140/month)
- Net change: $0 (but simplified operations)
"""

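The figures in an assessment like this are hourly rates multiplied by roughly 730 hours per month. A sketch, assuming us-east-1 on-demand Linux rates ($0.096/hour for m5.large, $0.192/hour for m5.xlarge); `monthly_cost` and `cost_impact` are hypothetical helpers, and a tool such as Infracost can produce similar estimates from Terraform code:

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)

# Assumed on-demand Linux rates in us-east-1, USD per hour
HOURLY_RATE = {"m5.large": 0.096, "m5.xlarge": 0.192}


def monthly_cost(fleet):
    """fleet maps instance type -> instance count."""
    return sum(HOURLY_RATE[itype] * count * HOURS_PER_MONTH
               for itype, count in fleet.items())


def cost_impact(current, proposed):
    """Format a PR-ready cost summary for a fleet change."""
    before, after = monthly_cost(current), monthly_cost(proposed)
    delta = after - before
    sign = "-" if delta < 0 else "+"
    return (f"Current: ${before:,.0f}/month\n"
            f"Proposed: ${after:,.0f}/month\n"
            f"Net change: {sign}${abs(delta):,.0f}/month")


print(cost_impact({"m5.large": 2}, {"m5.xlarge": 1}))
# Current: $140/month
# Proposed: $140/month
# Net change: +$0/month
```
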
Tools

| Tool | Use Case |
|---|---|
| AWS Cost Explorer | Native AWS analysis |
| CloudHealth | Multi-cloud FinOps |
| Kubecost | Kubernetes cost |
| Infracost | Terraform cost estimation |
| Spot.io | Spot instance management |

Final Thoughts

Cloud costs are everyone’s responsibility. FinOps creates visibility, accountability, and optimization practices.

Start with:

  1. Tag everything
  2. Set budgets and alerts
  3. Review monthly
  4. Optimize continuously

Your CFO will thank you.


The cloud is pay-as-you-go. FinOps is pay-as-you-know.
