Evaluation & Safety

AI Bias

Systematic unfairness in AI outputs caused by skewed training data, flawed labelling, or model design choices that reflect and amplify existing societal inequities.

How Bias Enters AI Systems

Bias doesn't appear from nowhere — it has concrete entry points. Training data bias is the most obvious: if your dataset over-represents certain demographics or viewpoints, the model learns those skews as ground truth. Measurement and labelling bias creeps in when the humans annotating data bring their own assumptions — what counts as "toxic" or "professional" varies across cultures. Model optimisation bias emerges when algorithms optimise for aggregate accuracy, performing well on majority groups while failing minorities. Feedback loops compound everything: a biased model's outputs influence future training data, reinforcing the original skew over successive iterations.
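The feedback-loop mechanism can be illustrated with a minimal simulation. This is a hypothetical sketch, not drawn from any real system: the "model" surfaces items in proportion to their share of the training data, surfaced items are logged as the next round's training data, and a small assumed popularity boost for the already-dominant group stands in for ranking effects. The function name and parameters are invented for illustration.

```python
import random

random.seed(0)

def simulate_feedback_loop(majority_share=0.7, rounds=5, batch=1000):
    """Toy feedback loop: each round, the 'model' serves the majority
    group in proportion to its current share of the training data,
    plus a small relative boost (standing in for ranking/popularity
    effects). The served items then become the next round's data."""
    shares = [majority_share]
    share = majority_share
    for _ in range(rounds):
        # Hypothetical 5% relative over-serving of the dominant group.
        boosted = min(1.0, share * 1.05)
        majority_count = sum(1 for _ in range(batch)
                             if random.random() < boosted)
        # Served items are logged as new training data, so the skew
        # in outputs becomes the skew in the next round's inputs.
        share = majority_count / batch
        shares.append(share)
    return shares

shares = simulate_feedback_loop()
```

Even with a modest per-round boost, the majority's share ratchets upward each iteration: the original 70/30 skew drifts well past 80/20 within a handful of rounds, which is the compounding the paragraph above describes.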

Real-World Impacts

These aren't theoretical concerns. Amazon scrapped an internal hiring tool after discovering it systematically downranked resumes containing the word "women's" — the model had learned from a decade of male-dominated hiring patterns. Lending algorithms have been shown to charge higher interest rates to minority applicants with identical credit profiles. Healthcare allocation systems have deprioritised Black patients by using cost as a proxy for need, ignoring that systemic barriers meant lower historical spending didn't reflect lower medical need. Each of these cases involved well-intentioned teams who didn't anticipate how historical inequities would propagate through their models.

Mitigation Strategies

Addressing bias requires intervention at every stage of the ML lifecycle. Diverse, representative datasets are necessary but not sufficient — you also need diverse teams evaluating what "representative" means. Adversarial debiasing techniques train models to be unable to predict protected attributes from their internal representations. Regular audits using disaggregated metrics — breaking performance down by demographic group rather than reporting a single accuracy number — surface disparities that aggregate stats hide. Governance frameworks like impact assessments and bias bounty programmes create institutional accountability beyond individual good intentions. Regulation and standards are pushing these from best practices towards obligations: the EU AI Act imposes legal requirements on high-risk applications, while ISO/IEC 42001 provides a certifiable AI management-system standard that auditors and procurement teams increasingly expect.
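The disaggregated-metrics audit described above is straightforward to sketch. This is a minimal illustration with invented toy data — the function name, the group labels "A" and "B", and the numbers are all hypothetical — showing how a respectable aggregate accuracy can conceal a large per-group gap.

```python
from collections import defaultdict

def disaggregated_accuracy(labels, preds, groups):
    """Break accuracy down by demographic group instead of
    reporting one aggregate number."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical toy data: aggregate accuracy is 80%, which looks
# acceptable — but it hides that group "B" fares far worse than "A".
labels = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
preds  = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

per_group = disaggregated_accuracy(labels, preds, groups)
# per_group: {"A": 1.0, "B": 0.6} — a 40-point gap invisible
# in the single 0.8 aggregate figure.
```

In practice you would compute this over real evaluation sets with properly collected group attributes, and extend it beyond accuracy to error-rate and calibration metrics per group; the point is simply that the disaggregation itself is cheap, and it is what surfaces the disparity.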