The Ethics of AI: Bias and Fairness


AI is making consequential decisions. Who gets a loan. Who gets an interview. Who gets parole. And increasingly, these systems are making biased decisions.

This isn’t a future problem. It’s happening now.

Real-World Examples

Amazon’s Hiring Algorithm

Amazon built a system to score resumes. It learned from 10 years of hiring data. The problem? Historical hiring was male-dominated in tech roles.

The model penalized resumes containing the word “women’s” (e.g., “women’s chess club captain”) and demoted graduates of all-women’s colleges.

Amazon scrapped the system.

COMPAS Recidivism Algorithm

Courts use COMPAS to predict recidivism (likelihood of re-offending). A 2016 ProPublica analysis found it was biased against Black defendants:

  - Black defendants were roughly twice as likely to be incorrectly flagged as high risk (false positives).
  - White defendants were more likely to be incorrectly labeled low risk (false negatives).
  - Overall accuracy was similar across groups.

Similar overall error rates, different distribution of harm.

Google Photos

In 2015, Google Photos auto-tagged photos of Black people as “gorillas.” The algorithm learned from training data with insufficient diversity.

How Bias Enters AI Systems

Data Collection Bias

If your training data under-represents a group, your model will perform worse for that group. Facial recognition systems trained mostly on lighter-skinned faces, for example, have shown markedly higher error rates on darker-skinned faces.

Labeling Bias

Human annotators bring their biases. What one labeler marks as “toxic” or “unprofessional” another may not, and those judgments can vary systematically with the dialect or demographics of the text’s author.

Feature Selection Bias

Choosing what to measure matters. Using arrest records as a proxy for crime encodes policing patterns, not offending rates; seemingly neutral features like ZIP code can act as proxies for race.

Algorithmic Bias

Even with representative data, algorithms can amplify small disparities, sacrifice minority-group accuracy to optimize overall accuracy, and learn proxy variables that reconstruct protected attributes you never gave them.

Defining Fairness

There’s no single definition. Common framings:

Demographic Parity

Positive outcomes should be equal across groups.

P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)

Problem: Ignores actual differences in base rates.
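As a minimal sketch with hypothetical predictions and group labels, the parity check is just a comparison of positive-prediction rates:

```python
import numpy as np

# Hypothetical model decisions and protected attribute A
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

rate_0 = y_pred[group == 0].mean()  # P(Yhat=1 | A=0)
rate_1 = y_pred[group == 1].mean()  # P(Yhat=1 | A=1)
print(rate_0, rate_1)  # 0.75 vs 0.25: demographic parity is violated
```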

Equal Opportunity

True positive rates should be equal across groups.

P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1)

Fair for qualified individuals.
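A comparable sketch for equal opportunity (hypothetical data again) restricts the comparison to the truly positive individuals:

```python
import numpy as np

# Hypothetical ground truth, decisions, and protected attribute A
y_true = np.array([1, 1, 0, 1, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def tpr(g):
    """True positive rate for group g: P(Yhat=1 | Y=1, A=g)."""
    mask = (group == g) & (y_true == 1)
    return y_pred[mask].mean()

print(tpr(0), tpr(1))  # ~0.67 vs ~0.33: qualified members of group 1 lose out
```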

Calibration

Predictions should mean the same thing across groups.

P(Y=1 | Ŷ=p, A=0) = P(Y=1 | Ŷ=p, A=1) = p

A 70% confidence should mean 70% for everyone.
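A sketch of the calibration check, using hypothetical individuals who all received the same score of 0.7: within that score bucket, the observed positive rate should be near 0.7 for every group.

```python
import numpy as np

# Hypothetical outcomes for individuals the model all scored at 0.7
y_true = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

obs_0 = y_true[group == 0].mean()  # observed positive rate, A=0
obs_1 = y_true[group == 1].mean()  # observed positive rate, A=1
print(obs_0, obs_1)  # 0.75 vs 0.5: a 0.7 score means different things per group
```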

The Impossibility Result

You can’t satisfy all of these fairness definitions simultaneously (unless base rates are equal across groups or predictions are perfect).

Trade-offs are unavoidable.
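One way to see the tension concretely is Chouldechova’s identity, which ties the false positive rate to precision (PPV), the false negative rate, and the base rate p. Hold PPV and FNR equal across two groups; if their base rates differ (the numbers below are hypothetical), their FPRs are forced to differ:

```python
# FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), derived from the rate definitions
def fpr(p, ppv, fnr):
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

# Same precision (0.8) and miss rate (0.2), different base rates:
print(fpr(0.5, 0.8, 0.2))  # group with base rate 0.5
print(fpr(0.3, 0.8, 0.2))  # group with base rate 0.3
# The FPRs differ, so calibrated precision and equal error rates can't coexist.
```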

Technical Mitigations

Pre-processing

Fix the data before training: collect more representative samples, resample or reweight under-represented groups, or remove features that act as proxies for protected attributes.
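One common pre-processing approach is reweighing (in the spirit of Kamiran and Calders): weight each (group, label) cell by its expected frequency under independence divided by its observed frequency, so group membership and outcome are decoupled in the weighted data. A minimal sketch with hypothetical data:

```python
import numpy as np

group = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # protected attribute
label = np.array([1, 1, 1, 1, 0, 0, 1, 0])  # training labels

weights = np.ones(len(group))
for g in (0, 1):
    for y in (0, 1):
        mask = (group == g) & (label == y)
        if mask.any():
            # expected frequency under independence / observed frequency
            weights[mask] = (group == g).mean() * (label == y).mean() / mask.mean()
# Pass `weights` as sample weights to the training procedure.
```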

In-processing

Add fairness constraints during training:

# Penalize demographic disparity alongside the task loss
loss = classification_loss + fairness_weight * fairness_penalty
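A runnable sketch of that idea, with hypothetical model outputs: the penalty here is the gap in mean predicted score between groups, and the weight trades accuracy against parity.

```python
import numpy as np

# Hypothetical predicted probabilities, labels, and protected attribute
probs  = np.array([0.9, 0.6, 0.8, 0.3, 0.2, 0.4])
labels = np.array([1,   1,   0,   0,   0,   1])
group  = np.array([0,   0,   0,   1,   1,   1])

eps = 1e-9  # numerical safety for the logs
classification_loss = -np.mean(
    labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps)
)

# Demographic-disparity penalty: gap in mean predicted score between groups
fairness_penalty = abs(probs[group == 0].mean() - probs[group == 1].mean())

fairness_weight = 0.5  # tuning knob: larger values push harder toward parity
loss = classification_loss + fairness_weight * fairness_penalty
```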

Post-processing

Adjust predictions after training, e.g., by choosing group-specific decision thresholds that equalize error rates across groups.
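A sketch of group-specific thresholds. The cutoffs here are hypothetical; in practice they would be chosen on a validation set, for instance to equalize true positive rates as in Hardt et al.’s equalized-odds post-processing.

```python
import numpy as np

# Hypothetical model scores and protected attribute
scores = np.array([0.81, 0.40, 0.66, 0.72, 0.55, 0.35, 0.62, 0.48])
group  = np.array([0,    0,    0,    0,    1,    1,    1,    1])

# Per-group cutoffs tuned (hypothetically) to balance error rates
thresholds = {0: 0.65, 1: 0.50}
decisions = np.array([s >= thresholds[g] for s, g in zip(scores, group)])
print(decisions)
```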

Available Tools

Several open-source toolkits support fairness auditing and mitigation:

  - Fairlearn (Microsoft): fairness metrics and mitigation algorithms for scikit-learn-style models
  - AI Fairness 360 (IBM): a broad collection of bias metrics and pre-, in-, and post-processing techniques
  - What-If Tool (Google): interactive model probing, including fairness analyses

Beyond Technical Fixes

Technical solutions aren’t enough:

Problem Selection

Should we automate this decision at all? Some domains may be inappropriate for algorithmic decision-making.

Stakeholder Involvement

Affected communities should shape system design. Technical teams alone can’t define fairness.

Transparency

If an algorithm affects someone’s life, they deserve to understand how.

Accountability

Who’s responsible when an AI system causes harm? Clear ownership matters.

Ongoing Monitoring

Bias can emerge over time as populations and contexts change. Continuous auditing is essential.

Practical Steps

For ML practitioners:

  1. Audit your data: Check demographic distributions
  2. Test across groups: Disaggregate performance metrics
  3. Document assumptions: What does fairness mean for this use case?
  4. Involve diverse perspectives: Technical skills aren’t enough
  5. Consider not building: Some applications shouldn’t exist
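Step 2 above in miniature, with hypothetical evaluation data: a single overall accuracy number can hide a large per-group gap.

```python
import numpy as np

# Hypothetical evaluation set: labels, predictions, protected attribute
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(f"overall: {(y_true == y_pred).mean():.2f}")
for g in (0, 1):
    mask = group == g
    print(f"group {g}: {(y_true[mask] == y_pred[mask]).mean():.2f}")
```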

The Bigger Picture

AI fairness isn’t just a technical challenge. It reflects society’s biases encoded in data and amplified by algorithms.

We’re not just building systems. We’re encoding values, setting defaults, and shaping futures.

The responsibility is significant. Take it seriously.


What we build reflects who we are.
