Failed Machine Learning (FML)
Overview
Failed Machine Learning (FML) is a project that curates high-profile examples of machine learning (ML) applications falling short of expectations. While machine learning holds vast potential to transform industries, these case studies offer valuable insight into the pitfalls that commonly arise during implementation. The intent is to learn from these failures in order to build better ML applications.
Categories of Failure
FML organizes these examples into several domains, each with its own characteristic failure modes:
Classic Machine Learning
- Amazon AI Recruitment System: An automated résumé-screening tool that Amazon scrapped after it was found to penalize female candidates.
- Genderify: A tool that inferred gender from names and email addresses; it was shut down within days of launch after users exposed glaring biases and inaccuracies.
- Scientific Paper Reviews: Princeton researchers found widespread errors, notably data leakage, in scientific papers that apply ML, highlighting a reproducibility crisis.
- COVID-19 Models: Systematic reviews found that predictive models for COVID-19 diagnosis and prognosis were poorly validated, at high risk of bias, and unfit for clinical use.
- Bias in Algorithms: Examples include COMPAS, whose recidivism risk scores were found to be racially biased, and child-welfare screening algorithms that disproportionately flagged Black families (a minimal bias-audit sketch follows this list).
- Racial Bias in Healthcare: An algorithm used to predict healthcare needs systematically underestimated the illness severity of Black patients because it used healthcare costs as a proxy for health.
- Apple Card Gender Discrimination: Regulators investigated the card after reports that women received substantially lower credit limits than men with comparable finances.
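Several of the failures above might have been caught by a routine pre-deployment fairness audit. As a minimal sketch, assuming synthetic decisions and the common "four-fifths" disparate-impact threshold (everything here is illustrative, not data from any case above), such a check can be a few lines of Python:

```python
# Minimal fairness audit: compare a classifier's selection rates across groups.
# All data here is synthetic and illustrative; a real audit needs real
# decisions, larger samples, and domain-appropriate fairness metrics.
from collections import defaultdict

def selection_rates(predictions, groups):
    """Fraction of positive decisions (e.g., 'shortlist') per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical screening decisions (1 = shortlisted) and applicant groups.
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

rates = selection_rates(preds, groups)
print(rates)  # {'A': 0.666..., 'B': 0.166...}

# Disparate-impact ratio; the "four-fifths rule" flags ratios below 0.8.
ratio = min(rates.values()) / max(rates.values())
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("WARNING: selection rates differ substantially across groups")
```

A check like this would not prove a system fair, but it makes gross disparities of the kind alleged in these cases visible before launch rather than after.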
Computer Vision
- Football Camera Confusion: An automated tracking camera repeatedly mistook a linesman's bald head for the ball during a live-streamed football match.
- Amazon Rekognition: Facial recognition errors, including a test in which the system falsely matched 28 members of the US Congress against mugshot photos, with misidentifications skewed toward people of color (see the threshold sketch after this list).
- Traffic Camera Error: A jaywalking-detection system publicly shamed a businesswoman after its facial recognition matched her face on a passing bus advertisement.
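A recurring detail in audits of systems like Rekognition is the operating confidence threshold: a system run at a lax default threshold produces far more false matches than one tuned to an explicit false-positive budget. The sketch below is a generic illustration, with made-up similarity scores and an assumed 10% budget, of deriving a threshold from held-out non-match scores instead of accepting a default:

```python
# Pick a face-match threshold from held-out scores instead of a vendor default.
# The scores and the false-positive budget below are illustrative assumptions.
def threshold_for_fpr(non_match_scores, target_fpr):
    """Return a threshold whose false-positive rate on non-matches <= target."""
    scores = sorted(non_match_scores, reverse=True)
    allowed = int(len(scores) * target_fpr)  # negatives allowed at/above threshold
    if allowed == 0:
        return scores[0] + 1e-9  # stricter than every observed non-match
    return scores[allowed - 1]

# Similarity scores for known NON-matching face pairs (synthetic).
non_matches = [0.42, 0.55, 0.61, 0.64, 0.70, 0.73, 0.78, 0.81, 0.84, 0.97]

t = threshold_for_fpr(non_matches, target_fpr=0.10)
print(f"threshold for <=10% FPR: {t:.2f}")  # 0.97

# At a lax default threshold of 0.80, count wrongly accepted non-matches.
false_positives = sum(s >= 0.80 for s in non_matches)
print(f"false positives at default 0.80: {false_positives}/{len(non_matches)}")
```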
Forecasting
- Google Flu Trends: A model that estimated flu prevalence from search queries; its accuracy drifted badly over time, at one point overestimating flu levels by roughly a factor of two, and it was eventually shut down (see the backtest sketch after this list).
- Zillow's Pricing Algorithms: Zillow's home-buying arm relied on valuation models that systematically overpaid for properties, contributing to losses of hundreds of millions of dollars and the unit's closure.
- AI Hedge Funds: Automated trading systems whose strategies underperformed or suffered heavy losses.
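These forecasting failures share a pattern: models that scored well on historical data degraded once behavior shifted (search habits changed, the housing market turned). A standard guard is a walk-forward backtest, in which the model is only ever scored on points it has not yet seen, so a regime shift shows up directly in the error curve. The following is a minimal sketch with a synthetic series and a deliberately naive trend model, not a reconstruction of any system above:

```python
# Walk-forward backtest: score a forecaster only on data it has not yet seen.
# The series and model are synthetic and naive; they stand in for any forecaster.
import random

random.seed(0)

# Synthetic weekly series with a regime shift at t = 52 (trend reverses),
# loosely mimicking how search behavior or a housing market can change.
series = [100 + 0.5 * t + random.gauss(0, 2) for t in range(52)]
series += [140 - 1.0 * t + random.gauss(0, 2) for t in range(52)]

def forecast_next(history):
    """Naive trend model: extrapolate the mean step of the last 8 points."""
    window = history[-8:]
    steps = [b - a for a, b in zip(window, window[1:])]
    return history[-1] + sum(steps) / len(steps)

# At each step, "train" on the past only and predict one step ahead.
abs_errors = {}
for t in range(12, len(series)):
    abs_errors[t] = abs(forecast_next(series[:t]) - series[t])

pre = [e for t, e in abs_errors.items() if t < 52]
post = [e for t, e in abs_errors.items() if t >= 52]
print(f"mean abs error before shift: {sum(pre) / len(pre):.2f}")
print(f"mean abs error after shift:  {sum(post) / len(post):.2f}")
```

Evaluating with a random train/test split instead would shuffle post-shift points into training and hide exactly the failure mode that sank these systems.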
Image Generation
- Bias in AI Image Generation: Generative and upsampling models altered subjects' appearance along racial and gender lines, most famously a face-upsampling model that reconstructed a pixelated photo of Barack Obama as a white man.
Natural Language Processing
- Microsoft Tay: A Twitter chatbot taken offline within 24 hours after users coaxed it into posting offensive content.
- Medical Advice Failures: Chatbots that dispensed harmful medical advice, in one experimental trial even encouraging a simulated patient to harm themselves, or that could not distinguish fact from fiction.
Recommendation Systems
- IBM Watson Health: Watson for Oncology produced unsafe and incorrect cancer-treatment recommendations, raising serious patient-safety concerns.
- Netflix Challenge: Netflix never deployed the $1M prize-winning ensemble; despite its improved recommendations, the accuracy gain did not justify the engineering cost of running it in production (see the baseline sketch below).
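One practical takeaway from the Netflix Prize is that simple, cheap baselines capture a surprising share of recommendation accuracy and are far easier to operate than a large ensemble. The sketch below shows a standard damped bias baseline (global mean plus per-item and per-user offsets); the ratings are made up and the damping value is an assumed hyperparameter, not Netflix's production system:

```python
# Bias baseline for ratings: global mean + per-item and per-user offsets.
# Ratings are synthetic; the damping term is an assumed regularizer.
from collections import defaultdict

ratings = [  # (user, item, rating), illustrative only
    ("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
    ("u2", "i3", 2), ("u3", "i2", 4), ("u3", "i3", 1),
]

mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating

def offsets(keyed_deviations, damping=5.0):
    """Damped average deviation from the global mean, per key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, dev in keyed_deviations:
        sums[key] += dev
        counts[key] += 1
    return {k: sums[k] / (counts[k] + damping) for k in sums}

# Item offsets first, then user offsets relative to the item-adjusted mean.
item_bias = offsets((i, r - mu) for _, i, r in ratings)
user_bias = offsets((u, r - mu - item_bias[i]) for u, i, r in ratings)

def predict(user, item):
    return mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)

print(f"global mean: {mu:.2f}")
print(f"predicted rating u1 -> i3: {predict('u1', 'i3'):.2f}")
```

A baseline like this is trivial to retrain and serve; any candidate replacement should beat it by a margin that covers its added operational cost.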
Lessons Learned
FML is a crucial reminder of the complexity and the ethical stakes of building and deploying machine learning systems. Recurring themes include algorithmic bias, data privacy problems, lack of transparency, and the outsized impact that seemingly minor errors can have once systems operate in real-world settings.
Conclusion
By understanding these examples of failure, practitioners and researchers can anticipate potential pitfalls more effectively, fostering the development of robust, fair, and efficient machine learning applications. Through detailed examination of these case studies, FML not only documents the failures but also champions continuous learning and improvement, echoing the words often attributed to Winston Churchill: "Success is not final, failure is not fatal: it is the courage to continue that counts."