Bias and Fairness in Machine Learning

Abhay Abhay

A machine learning model has no opinions. It has gradients. So when a hiring model quietly downgrades every resume that mentions “women’s chess club,” it isn’t being malicious — it’s being an excellent student of a flawed textbook. That, in one sentence, is the whole problem with bias in ML: the model is doing exactly what you asked, learning the patterns in your data with ruthless fidelity, including the patterns you’d be embarrassed to say out loud.

Let’s talk about where that goes wrong, why satisfying every definition of “fair” at once is mathematically impossible, and what you can actually do about it.

Where bias sneaks in

Bias rarely arrives through a villain twirling a moustache. It seeps in through four ordinary doors.

Data. If your training data reflects a skewed world, the model learns the skew. Amazon famously scrapped an experimental resume-screening tool (Reuters broke the story in 2018) after it learned to penalise resumes containing words like “women’s”. It had been trained on a decade of resumes submitted mostly by men and concluded, reasonably, that maleness correlated with getting hired.

Labels. The “ground truth” you train on is often a human judgement in disguise. Label “was this loan a good decision?” using historical approvals, and you’ve baked in every historical loan officer’s instincts, good and bad.

Proxies. Drop the sensitive attribute and you’re safe, right? Adorable. ZIP code, first name, the device you applied from, even “distance from the office” can correlate tightly with race or income. Remove race from the features and the model cheerfully reconstructs it from the proxies you left behind.
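
One rough way to hunt for proxies is to ask how well a single leftover feature predicts the attribute you dropped. The sketch below uses invented records and a hypothetical helper, proxy_strength. It compares "guess the most common group for each feature value" against "always guess the overall most common group". Treat it as an illustration, not a full audit: it scores on the same rows it learns from and looks at one feature at a time.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Return (baseline_accuracy, proxy_accuracy) for guessing the protected attribute.

    baseline: always guess the most common protected value.
    proxy: for each feature value, guess the protected value most often seen with it.
    """
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n

    by_value = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_value[f][p] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / n

# Toy, made-up data: ZIP code stays in the features, the group column was "removed".
zip_codes = ["10001", "10001", "10001", "10001", "20002", "20002", "20002", "20002"]
groups = ["a", "a", "a", "b", "b", "b", "b", "a"]

baseline, with_proxy = proxy_strength(zip_codes, groups)
print(f"baseline accuracy: {baseline:.2f}, using ZIP alone: {with_proxy:.2f}")
```

A large jump over the baseline suggests the feature is carrying the protected attribute in through the back door.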

Feedback loops. This is the nasty one. A predictive-policing model sends more patrols to neighbourhoods it flagged; more patrols find more incidents; the new data confirms the model; rinse, repeat. The model isn’t predicting crime anymore — it’s predicting where it sent the police last week.
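
The loop is easy to reproduce in a toy simulation. Everything here is an assumption made for illustration: two neighbourhoods with the same true incident rate, a hypothetical policy that sends 80% of patrols to whichever area has more recorded incidents, and a one-incident head start for one of them.

```python
def simulate_patrols(rounds=10, patrols_per_round=100):
    """Toy feedback loop: recorded incidents steer patrols, patrols create records."""
    # Both areas have the SAME underlying rate: one incident found per 10 patrols.
    recorded = {"north": 11, "south": 10}  # tiny initial imbalance (invented)
    for _ in range(rounds):
        flagged = max(recorded, key=recorded.get)  # area the "model" flags
        for area in recorded:
            # Hypothetical policy: 80% of patrols go to the flagged area.
            share = 80 if area == flagged else 20
            patrols = patrols_per_round * share // 100
            recorded[area] += patrols // 10  # incidents found scale with patrols
    return recorded

result = simulate_patrols()
print(result)
```

After ten rounds the records show roughly three times as many incidents in the flagged area, even though the two areas were identical by construction.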

Fairness, and why you can’t have all of it

Here’s the part nobody tells you in the intro tutorial: there are multiple, reasonable, mutually incompatible definitions of “fair.”

  • Demographic parity: the model approves people at the same rate across groups. Independence between the prediction and the protected attribute.
  • Equalized odds: the model has the same true-positive and false-positive rates across groups. Equal error rates, given the actual outcome.
  • Predictive parity: a given score means the same thing regardless of group — equal precision.
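
All three definitions reduce to per-group arithmetic on a confusion matrix. The following is a minimal pure-Python sketch with made-up labels; the helper name group_rates is ours, and it assumes binary labels and that every group has at least one positive, one negative, and one selected example.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and precision for binary labels."""
    out = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        fn = sum(1 for t, p in rows if t == 1 and p == 0)
        tn = sum(1 for t, p in rows if t == 0 and p == 0)
        out[g] = {
            "selection_rate": (tp + fp) / len(rows),  # demographic parity compares this
            "tpr": tp / (tp + fn),                    # equalized odds compares this...
            "fpr": fp / (fp + tn),                    # ...and this
            "precision": tp / (tp + fp),              # predictive parity compares this
        }
    return out

# Invented toy data: six people in each group.
y_true = [1, 1, 1, 0, 0, 0,  1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0,  1, 0, 1, 0, 0, 0]
groups = ["a"] * 6 + ["b"] * 6

rates = group_rates(y_true, y_pred, groups)
for g, r in rates.items():
    print(g, {k: round(v, 2) for k, v in r.items()})
```

Each fairness definition asks for one of these columns (or a pair of them) to match across groups.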

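These sound like three flavours of the same ice cream. They are not. Impossibility results from Chouldechova (2017) and from Kleinberg, Mullainathan and Raghavan (2016) show that when base rates differ between groups, you cannot satisfy equalized odds and predictive parity at once, except in degenerate cases like a perfect classifier. Add demographic parity and the squeeze gets tighter: with unequal base rates it fits alongside equalized odds only when the predictions tell you nothing about the outcome. So unless the groups' base rates are identical, all three together are off the table.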
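
The conflict can be checked with plain arithmetic. For any binary classifier, precision (PPV), the true-positive rate, the false-positive rate and the base rate p are tied together by PPV = p·TPR / (p·TPR + (1 − p)·FPR). The sketch below solves that identity for FPR; the helper name implied_fpr and the numbers are invented for illustration.

```python
from fractions import Fraction

def implied_fpr(base_rate, tpr, ppv):
    """FPR forced by a given base rate, TPR and precision (PPV).

    Rearranged from PPV = p*TPR / (p*TPR + (1 - p)*FPR).
    """
    p = Fraction(base_rate)
    return p / (1 - p) * (1 - ppv) / ppv * tpr

# Hold TPR and precision equal across two groups...
tpr = Fraction(3, 5)
ppv = Fraction(3, 5)

# ...and let only the base rate differ.
fpr_high = implied_fpr(Fraction(1, 2), tpr, ppv)   # group with a 50% base rate
fpr_low = implied_fpr(Fraction(3, 10), tpr, ppv)   # group with a 30% base rate
print(fpr_high, fpr_low)
```

With equal precision and equal true-positive rates, the group with the 50% base rate is forced to a false-positive rate of 2/5 while the 30% group gets 6/35, about 17%. Matching the FPRs would require giving up equal precision or equal TPR.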

This isn’t abstract. The COMPAS recidivism tool is the canonical war story. ProPublica found Black defendants who didn’t reoffend were flagged high-risk at nearly twice the rate of white defendants (44.9% vs 23.5%) — a violation of equalized odds. Northpointe shot back that COMPAS satisfied predictive parity: a given risk score meant the same probability of reoffending regardless of race. Both were correct. They were optimising different definitions, and the math guaranteed they couldn’t both win. The “is it fair?” debate was really an undeclared fight over which fairness.

The takeaway: pick your fairness metric on purpose, before you train, based on the harm you most want to avoid. A false positive in criminal justice is not the same kind of wrong as a false negative in cancer screening.

Mitigation: three places to intervene

Once you’ve chosen a target, you can attack bias at three stages of the pipeline.

  • Pre-processing — fix the data. Reweighting samples, learning fair representations, or removing disparate impact before training.
  • In-processing — fix the model. Add a fairness penalty to the loss, or use adversarial debiasing where a second network tries to predict the protected attribute from your predictions and you train to defeat it.
  • Post-processing — fix the outputs. Adjust decision thresholds per group to equalise odds after the fact.
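
To make the pre-processing option concrete, here is a minimal sketch of one common reweighting scheme: weight each sample by P(group) · P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The function name reweighing_weights and the toy data are assumptions for illustration, not any toolkit's API.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Invented data: group "a" has 3 of 4 positive labels, group "b" has 1 of 4.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)

def weighted_positive_rate(group):
    """Share of a group's total weight that sits on positive labels."""
    pos = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    total = sum(w for w, g in zip(weights, groups) if g == group)
    return pos / total

print(weighted_positive_rate("a"), weighted_positive_rate("b"))
```

The rare combinations (a negative in group "a", a positive in group "b") get weight 2, the common ones get 2/3, and the weighted positive rate comes out equal for both groups. Most sklearn estimators would accept such weights through the sample_weight argument of fit.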

The two heavyweight open-source toolkits make this concrete. Fairlearn, which began at Microsoft, is sklearn-friendly and great for assessment plus mitigation through reductions or per-group thresholds. AIF360, which began at IBM, ships a larger zoo of algorithms across all three stages. A quick Fairlearn audit looks like this:

from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# y_true, y_pred from your trained model; sex is the sensitive feature
metrics = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)

print(metrics.by_group)              # per-group numbers
print(metrics.difference())          # max gap between groups

If selection_rate diverges sharply between groups, you have a demographic-parity problem staring back at you — in numbers, not vibes.

A practical checklist

Don’t be preachy about fairness; be operational. Next time you ship a model that touches people:

  1. Name the protected groups and the harm you’re guarding against.
  2. Choose one fairness metric that matches that harm — and write down what you’re trading away.
  3. Hunt for proxies, not just the obvious sensitive columns.
  4. Audit with real numbers (MetricFrame, AIF360’s metrics) on a held-out set, sliced by group.
  5. Mitigate at the right stage, then re-audit — mitigation can move the harm rather than remove it.
  6. Watch for feedback loops in production; a model that shapes its own future data needs ongoing monitoring, not a one-time blessing.
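
Steps 4 and 5 can be sketched in a few lines. The example below is a toy with invented scores and a hypothetical audit helper: it audits a single threshold, applies per-group thresholds chosen by hand to equalise selection rates, then audits again.

```python
def audit(y_true, scores, groups, thresholds):
    """Per-group selection rate and false-positive rate at the given thresholds."""
    report = {}
    for g in sorted(set(groups)):
        rows = [
            (t, s >= thresholds[g])
            for t, s, gg in zip(y_true, scores, groups)
            if gg == g
        ]
        selected = sum(1 for _, pred in rows if pred)
        negatives = [pred for t, pred in rows if t == 0]  # predictions for true negatives
        report[g] = {
            "selection_rate": selected / len(rows),
            "fpr": sum(negatives) / len(negatives),
        }
    return report

# Invented held-out data: four people per group.
y_true = [1, 1, 0, 0,  1, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.4,  0.7, 0.45, 0.3, 0.2]
groups = ["a"] * 4 + ["b"] * 4

before = audit(y_true, scores, groups, {"a": 0.5, "b": 0.5})  # one threshold for all
after = audit(y_true, scores, groups, {"a": 0.7, "b": 0.4})   # hand-picked per-group thresholds
print("before:", before)
print("after: ", after)
```

The per-group thresholds close the selection-rate gap (0.75 vs 0.25 becomes 0.5 vs 0.5), but the false positive moves from group "a" to group "b". That is why the re-audit in step 5 is not optional.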

Fairness isn’t a checkbox you tick before launch. It’s a decision you make on purpose, defend with metrics, and keep re-checking — because the model will never stop being an honest mirror of the data you feed it.
