Text Classification Multi-Label: An In-Depth Introduction
Introduction
In text classification, two primary tasks arise: multi-class classification and multi-label classification. Multi-class classification assigns exactly one category from a set of possible categories to each input. For instance, predicting a person's gender as either "male" or "female" is a (binary) multi-class task, as is rating the sentiment of a text as "positive," "neutral," or "negative."
In contrast, multi-label classification allows multiple categories to be assigned to a single input. A news article, for instance, could be tagged as both "entertainment" and "sports," or it may belong only to the "entertainment" category, or to any other combination of categories.
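To make the difference concrete, a multi-class target is a single class index, while a multi-label target is usually encoded as a multi-hot vector. The snippet below is a minimal illustration with a made-up label set (not taken from the project's data):

```python
import numpy as np

# Hypothetical label set, for illustration only
labels = ["entertainment", "sports", "politics", "technology"]

# Multi-class: exactly one category per sample -> a single class index
multi_class_target = labels.index("sports")            # 1

# Multi-label: any subset of categories per sample -> a multi-hot vector
article_tags = {"entertainment", "sports"}
multi_label_target = np.array([1.0 if name in article_tags else 0.0 for name in labels])

print(multi_class_target, multi_label_target)           # 1 [1. 1. 0. 0.]
```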
Algorithms
The project presents four distinct approaches to multi-label text classification:
1. classifier_multi_label
   - Uses the BERT model's first-token ([CLS]) vector of shape (batch_size, hidden_size).
   - Employs the `tf.nn.sigmoid_cross_entropy_with_logits` loss function.
   - Uses the `tf.where` function to filter out label IDs whose probabilities are below 0.5, keeping the rest as predictions (see the first sketch after this list).
2. classifier_multi_label_textcnn
   - Uses BERT's full sequence output, a three-dimensional tensor of shape (batch_size, sequence_length, hidden_size), which is fed into a TextCNN layer.
   - Employs the `tf.nn.sigmoid_cross_entropy_with_logits` loss function.
   - Uses the `tf.where` function to filter out label IDs whose probabilities are below 0.5, as in approach 1.
3. classifier_multi_label_denses
   - Takes BERT's first-token ([CLS]) vector of shape (batch_size, hidden_size) and passes it through multiple binary classifiers (fully connected layers), one per label.
   - Uses the `tf.nn.softmax_cross_entropy_with_logits` loss function.
   - Uses the `tf.argmax` function to pick the higher-probability class for each binary classifier (see the second sketch after this list).
4. classifier_multi_label_seq2seq
   - Uses BERT's full sequence output, a three-dimensional tensor of shape (batch_size, sequence_length, hidden_size), which is processed by a seq2seq + attention layer.
   - Employs the `tf.nn.softmax_cross_entropy_with_logits` loss function.
   - Uses beam search to decode the predicted label sequence.
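Approaches 1 and 2 share the same loss and decoding step: per-label sigmoid cross-entropy followed by thresholding at 0.5. The sketch below shows that step in TensorFlow 2, assuming a pooled [CLS] representation `cls_output`, illustrative tensor sizes, and multi-hot targets `y_true`; it is a minimal sketch, not the repository's exact code.

```python
import tensorflow as tf

batch_size, hidden_size, num_labels = 8, 312, 10             # illustrative sizes

cls_output = tf.random.normal([batch_size, hidden_size])     # stand-in for BERT's [CLS] vector
y_true = tf.cast(tf.random.uniform([batch_size, num_labels]) > 0.7, tf.float32)

# One shared fully connected layer producing a logit per label
logits = tf.keras.layers.Dense(num_labels)(cls_output)

# Per-label sigmoid cross-entropy, as in approaches 1 and 2
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=logits)
)

# Decode: keep every label whose probability exceeds 0.5
probs = tf.sigmoid(logits)
predicted_ids = tf.where(probs > 0.5)   # (sample_index, label_index) pairs
print(loss.numpy(), predicted_ids.numpy())
```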
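Approach 3 instead trains an independent binary classifier per label and decodes each one with `tf.argmax`. A minimal sketch under the same assumptions (illustrative shapes and names, not the repository's exact code):

```python
import tensorflow as tf

batch_size, hidden_size, num_labels = 8, 312, 10              # illustrative sizes

cls_output = tf.random.normal([batch_size, hidden_size])      # stand-in for BERT's [CLS] vector
# One-hot targets per label: shape (batch_size, num_labels, 2) -> [absent, present]
y_true = tf.one_hot(
    tf.cast(tf.random.uniform([batch_size, num_labels]) > 0.7, tf.int32), depth=2
)

losses, predictions = [], []
for i in range(num_labels):
    # An independent binary classifier (fully connected layer) for label i
    label_logits = tf.keras.layers.Dense(2)(cls_output)        # shape (batch_size, 2)
    losses.append(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_true[:, i, :], logits=label_logits)
    )
    # tf.argmax picks the higher-probability class (0 = absent, 1 = present)
    predictions.append(tf.argmax(label_logits, axis=-1))

loss = tf.reduce_mean(tf.add_n(losses))
multi_hot_prediction = tf.stack(predictions, axis=1)           # shape (batch_size, num_labels)
print(loss.numpy(), multi_hot_prediction.numpy())
```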
Experiments
Training Process
The project demonstrates a comprehensive training process to optimize model performance.
Experiment Results
The experiments compare how the different models perform under various parameter settings and conditions.
Conclusions
- For applications where inference speed is not critical, the ALBERT+Seq2Seq_Attention framework yields the best multi-label classification results.
- For scenarios where both speed and model efficiency are crucial, the ALBERT+TextCNN framework emerges as an optimal choice.
References
The project builds on extensive research and previous work detailed in several articles focusing on multi-label text classification using ALBERT and various model structures. These references provide deeper insights and comparisons across different approaches.
- Introduction and Comparison of Multi-Label Text Classification
- Multi-Label Text Classification [ALBERT]
- Multi-Label Text Classification [ALBERT+TextCNN]
- Multi-Label Text Classification [ALBERT+Multi_Denses]
- Multi-Label Text Classification [ALBERT+Seq2Seq+Attention]
This overview summarizes the project's methodologies, experiments, results, and conclusions, and provides a starting point for further exploration and application in multi-label text classification.