Introduction to Detoxify
Detoxify is a project focused on classifying toxic comments using modern technologies such as PyTorch Lightning and Transformers. It offers easy-to-use trained models and code for predicting toxic comments across the challenges in the Jigsaw competition series. Developed by Laura Hanu at Unitary, the project seeks to combat harmful online content efficiently by interpreting such content within its context.
News & Updates
22-10-2021: Enhanced Multilingual Model
Detoxify has introduced updated multilingual model weights trained using translated data from two Jigsaw challenges to minimize biases. This has resulted in a consistent improvement in model performance with a best AUC score of 92.11 on the test set.
03-09-2021: Unbiased Model Improvement
The unbiased model's performance was improved with new training data from previous Jigsaw challenges, achieving a superior test set score.
15-02-2021: Recognition by Scientific American
The project gained recognition with an opinion piece titled "Can AI identify toxic online content?" featured in Scientific American.
14-01-2021: Introduction of Lightweight Models
Smaller models trained with ALBERT were added, improving accessibility without significantly compromising performance.
Project Description
The core of Detoxify is a set of trained models for predicting toxic comments in three main areas: toxic comment classification, unintended bias in toxic comments, and multilingual toxic comment classification. The models build on 🤗 Transformers and ⚡ PyTorch Lightning, with data sourced from Kaggle's Jigsaw competitions.
Challenges & Models
- Toxic Comment Classification (2018): Aims to build a model capable of detecting various toxicity types, such as threats and obscenities, using Wikipedia Comments. Model name: original.
- Jigsaw Unintended Bias in Toxicity Classification (2019): Focuses on recognizing and minimizing unintended bias concerning identity mentions. Data is sourced from Civil Comments. Model name: unbiased.
- Jigsaw Multilingual Toxic Comment Classification (2020): Seeks to develop effective multilingual models using data from multiple sources, including Wikipedia Comments and Civil Comments. Model name: multilingual. Each of these model names can be passed directly to the library, as shown below.
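As a minimal illustration (assuming the detoxify package is installed), each challenge's model is selected by passing its name to the library:

```python
from detoxify import Detoxify

# One model per challenge, selected by name.
original_model = Detoxify('original')          # Toxic Comment Classification (2018)
unbiased_model = Detoxify('unbiased')          # Unintended Bias in Toxicity (2019)
multilingual_model = Detoxify('multilingual')  # Multilingual Toxic Comments (2020)

# All three expose the same predict() interface.
print(unbiased_model.predict('example text'))
```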
Multilingual Model Language Performance
The multilingual model supports multiple languages, with strong performance across language subgroups, including AUC scores upwards of 89% for languages such as Italian, French, and Russian.
Ethical Considerations and Limitations
Despite its capabilities, Detoxify can misclassify comments due to nuances of language: swearing or humor, for example, may be harmless or harmful depending on context. The intended use of this library is research and assisting content moderators by quickly flagging potentially harmful content.
Quick Start Guide
For quick predictions, Detoxify provides a straightforward usage method:
- Installation: pip install detoxify
- Example prediction code (shown below).
- Language support: currently supports English, French, Spanish, Italian, Portuguese, Turkish, and Russian.

```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')
```
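predict() also accepts a list of texts. A common pattern, sketched below under the assumption that pandas is installed, is to tabulate the per-label scores with one row per input:

```python
import pandas as pd
from detoxify import Detoxify

texts = ['example text 1', 'example text 2']

# On a list input, predict() returns a dict mapping each label to a list of
# scores, one score per input text.
results = Detoxify('original').predict(texts)

# Tabulate: one row per input text, one column per label.
print(pd.DataFrame(results, index=texts).round(5))
```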
Labels
Detoxify categorizes comments using labels such as "Very Toxic", "Toxic", "Hard to Say", and "Not Toxic", with specific labels for each challenge including "toxic", "obscene", "identity_attack", and more.
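The exact label names returned depend on the model chosen; a quick way to see them is to inspect the keys of the prediction dictionary, as in this small sketch:

```python
from detoxify import Detoxify

# Each key is a label the chosen model predicts; each value is a score
# between 0 and 1. The label set differs between the original, unbiased,
# and multilingual models.
results = Detoxify('original').predict('example text')

for label, score in results.items():
    print(f'{label}: {score:.4f}')
```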
How to Run
To run Detoxify:
- Clone the project repository.
- Set up a virtual environment and install the necessary dependencies.
- Train models or run predictions using the provided scripts and model checkpoints (see the sketch below for a prediction example using only the library API).
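The repository provides its own scripts for these steps. As an illustrative alternative that uses only the library API (the file name and threshold below are hypothetical), comments can also be batch-scored directly from Python:

```python
from detoxify import Detoxify

# Hypothetical input file with one comment per line.
with open('comments.txt', encoding='utf-8') as f:
    comments = [line.strip() for line in f if line.strip()]

# Score every comment in one call; model weights are downloaded on first use.
results = Detoxify('original').predict(comments)

# Flag comments whose highest label score exceeds a chosen threshold.
THRESHOLD = 0.5
for i, comment in enumerate(comments):
    label, score = max(((lbl, scores[i]) for lbl, scores in results.items()),
                       key=lambda kv: kv[1])
    if score > THRESHOLD:
        print(f'{score:.3f}  {label:<16}  {comment}')
```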
Training and Evaluation
The training module relies on data downloaded from the Kaggle competitions, with comprehensive guidelines for setting up and starting training. Model evaluation is supported by dedicated scripts that compute metrics such as AUC scores and bias metrics for toxicity detection.
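The project's evaluation scripts handle this end to end. As a general sketch of the headline metric (not the project's own code), an AUC score can be computed from binary ground-truth labels and predicted scores with scikit-learn:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground-truth labels (1 = toxic, 0 = not toxic) and model scores.
y_true = [1, 0, 0, 1, 0, 1]
y_scores = [0.91, 0.08, 0.35, 0.67, 0.12, 0.88]

# Area under the ROC curve: 1.0 means perfect ranking, 0.5 means random.
print(f'AUC: {roc_auc_score(y_true, y_scores):.4f}')
```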
Conclusion
Detoxify is a robust and user-friendly tool for detecting toxic comments, leveraging cutting-edge machine learning methodologies. With its continuous updates and focus on ethical considerations, Detoxify stands as a significant resource in the landscape of content moderation and online community management. For further exploration and utilization of Detoxify, users are encouraged to dive into the detailed documentation provided.