Introduction to Detoxify
Detoxify is a project focused on classifying toxic comments using modern technologies such as PyTorch Lightning and Transformers. It offers easy-to-use trained models and code for predicting toxic comments across the challenges in the Jigsaw competition series. Developed by Laura Hanu at Unitary, the project seeks to combat harmful online content efficiently by interpreting such content within its context.
News & Updates
22-10-2021: Enhanced Multilingual Model
Detoxify has introduced updated multilingual model weights trained using translated data from two Jigsaw challenges to minimize biases. This has resulted in a consistent improvement in model performance with a best AUC score of 92.11 on the test set.
03-09-2021: Unbiased Model Improvement
The unbiased model's performance was improved with new training data from previous Jigsaw challenges, achieving a superior test set score.
15-02-2021: Recognition by Scientific American
The project gained recognition with an opinion piece titled "Can AI identify toxic online content?" featured in Scientific American.
14-01-2021: Introduction of Lightweight Models
Smaller models trained with ALBERT were added, improving accessibility without significantly compromising performance.
Project Description
The core of Detoxify is a set of trained models for predicting toxic comments in three main areas: toxic comment classification, unintended bias in toxic comments, and multilingual toxic comment classification. The models build on 🤗 Transformers and ⚡ PyTorch Lightning, with data sourced from Kaggle's Jigsaw competitions.
Challenges & Models
- Toxic Comment Classification (2018): Aims to build a model capable of detecting various toxicity types, such as threats and obscenities, using Wikipedia Comments. Model name: original.
- Jigsaw Unintended Bias in Toxicity Classification (2019): Focuses on recognizing and minimizing unintended bias concerning identity mentions. Data is sourced from Civil Comments. Model name: unbiased.
- Jigsaw Multilingual Toxic Comment Classification (2020): Seeks to develop effective multilingual models using data from multiple sources, including Wikipedia Comments and Civil Comments. Model name: multilingual. Each of these model names can be passed directly to the library, as shown below.
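As a minimal illustration (assuming the detoxify package is installed), each challenge's model is selected by passing its name to the library:

```python
from detoxify import Detoxify

# One model per challenge, selected by name.
original_model = Detoxify('original')          # Toxic Comment Classification (2018)
unbiased_model = Detoxify('unbiased')          # Unintended Bias in Toxicity (2019)
multilingual_model = Detoxify('multilingual')  # Multilingual Toxic Comments (2020)

# All three expose the same predict() interface.
print(unbiased_model.predict('example text'))
```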
Multilingual Model Language Performance
The multilingual model supports multiple languages, with strong performance across language subgroups, including AUC scores upwards of 89% for languages such as Italian, French, and Russian.
Ethical Considerations and Limitations
Despite its capabilities, Detoxify can misclassify comments due to nuances of language: swearing or humor, for example, may be harmless or harmful depending on context. The intended use of this library is research and assisting content moderators by quickly flagging potentially harmful content.
Quick Start Guide
For quick predictions, Detoxify provides a straightforward usage method:
- Installation: pip install detoxify
- Example prediction code (shown below).
- Language support: currently supports English, French, Spanish, Italian, Portuguese, Turkish, and Russian.

```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')
```
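predict() also accepts a list of texts. A common pattern, sketched below under the assumption that pandas is installed, is to tabulate the per-label scores with one row per input:

```python
import pandas as pd
from detoxify import Detoxify

texts = ['example text 1', 'example text 2']

# On a list input, predict() returns a dict mapping each label to a list of
# scores, one score per input text.
results = Detoxify('original').predict(texts)

# Tabulate: one row per input text, one column per label.
print(pd.DataFrame(results, index=texts).round(5))
```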
Labels
Detoxify categorizes comments using labels such as "Very Toxic", "Toxic", "Hard to Say", and "Not Toxic", with specific labels for each challenge including "toxic", "obscene", "identity_attack", and more.
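The exact label names returned depend on the model chosen; a quick way to see them is to inspect the keys of the prediction dictionary, as in this small sketch:

```python
from detoxify import Detoxify

# Each key is a label the chosen model predicts; each value is a score
# between 0 and 1. The label set differs between the original, unbiased,
# and multilingual models.
results = Detoxify('original').predict('example text')

for label, score in results.items():
    print(f'{label}: {score:.4f}')
```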
How to Run
To run Detoxify:
- Clone the project repository.
- Set up a virtual environment and install the necessary dependencies.
- Train models or run predictions using the provided scripts and model checkpoints (see the sketch below for a prediction example using only the library API).
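The repository provides its own scripts for these steps. As an illustrative alternative that uses only the library API (the file name and threshold below are hypothetical), comments can also be batch-scored directly from Python:

```python
from detoxify import Detoxify

# Hypothetical input file with one comment per line.
with open('comments.txt', encoding='utf-8') as f:
    comments = [line.strip() for line in f if line.strip()]

# Score every comment in one call; model weights are downloaded on first use.
results = Detoxify('original').predict(comments)

# Flag comments whose highest label score exceeds a chosen threshold.
THRESHOLD = 0.5
for i, comment in enumerate(comments):
    label, score = max(((lbl, scores[i]) for lbl, scores in results.items()),
                       key=lambda kv: kv[1])
    if score > THRESHOLD:
        print(f'{score:.3f}  {label:<16}  {comment}')
```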
Training and Evaluation
The training module relies on data downloaded from the Kaggle competitions, with comprehensive guidelines for setting up and starting training. Model evaluation is supported by dedicated scripts that compute metrics such as AUC scores and bias metrics for toxicity detection.
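The project's evaluation scripts handle this end to end. As a general sketch of the headline metric (not the project's own code), an AUC score can be computed from binary ground-truth labels and predicted scores with scikit-learn:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground-truth labels (1 = toxic, 0 = not toxic) and model scores.
y_true = [1, 0, 0, 1, 0, 1]
y_scores = [0.91, 0.08, 0.35, 0.67, 0.12, 0.88]

# Area under the ROC curve: 1.0 means perfect ranking, 0.5 means random.
print(f'AUC: {roc_auc_score(y_true, y_scores):.4f}')
```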
Conclusion
Detoxify is a robust and user-friendly tool for detecting toxic comments, leveraging cutting-edge machine learning methodologies. With its continuous updates and focus on ethical considerations, Detoxify stands as a significant resource in the landscape of content moderation and online community management. For further exploration and utilization of Detoxify, users are encouraged to dive into the detailed documentation provided.