Keras-TextClassification - Diverse Neural Network Architectures for Robust Text Classification

Overview of the Keras-TextClassification Project

Keras-TextClassification is an open-source project that implements a broad set of deep learning models for text classification. It is built on Keras, a popular deep learning framework whose high-level API keeps even complex neural network architectures short to implement. The models can be applied to a wide range of tasks, from sentiment analysis to topic classification.

Installation and Setup

Setting up Keras-TextClassification is straightforward. Users can install it via Python's package manager, pip:

pip install Keras-TextClassification

After installation, download the dataset archive 'data.rar' from the link given in the project's documentation and extract it into the directory the documentation specifies, inside the Anaconda environment where the package is installed.
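
Before extracting data.rar, it helps to know where pip actually installed the package. The sketch below uses the standard-library importlib to locate an installed package's directory. It assumes the importable name is keras_textclassification (the name used in the model paths), and the helper name installed_package_dir is hypothetical. It does not unpack the archive: Python's standard library cannot read .rar files, so an external tool such as unrar is needed, and the exact target subdirectory is the one given in the project's documentation.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def installed_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory a package is installed in, or None if it is not a findable package.

    For a package, find_spec() exposes its directory through
    submodule_search_locations; plain or built-in modules have no such list.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Assumption: the package imports as keras_textclassification.
    target = installed_package_dir("keras_textclassification")
    if target is None:
        print("keras_textclassification is not installed in this environment")
    else:
        print(f"Extract data.rar into the directory given in the docs under: {target}")
```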

Supported Models

Keras-TextClassification supports a wide range of text classification models, from simple architectures such as FastText to large pre-trained models such as Bert and XLNet:

Electra (in progress)
Albert
XLNet
Bert
FastText
TextCNN
charCNN
TextRNN
TextRCNN
TextDCNN
TextDPCNN
TextVDCNN
TextCRNN
DeepMoji
SelfAttention
HAN (Hierarchical Attention Networks)
CapsuleNet
Transformer-encode
SWEM (Simple Word-Embedding-based Models)
LEAM (Label Embedding Attentive Model)
TextGCN (in progress)
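
At the simple end of this range, FastText classifies a text by averaging the embeddings of its words and passing that average through a linear layer followed by a softmax. The pure-Python sketch below shows only that forward pass with hand-picked toy weights. It is not the project's Keras implementation, it omits training and the n-gram features of the original FastText, and the function name fasttext_style_forward is hypothetical.

```python
import math
from typing import Dict, List


def fasttext_style_forward(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    weights: List[List[float]],
    biases: List[float],
) -> List[float]:
    """Average the token embeddings, apply a linear layer, and return softmax probabilities."""
    # Skip out-of-vocabulary tokens; an empty text falls back to a zero vector.
    known = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(weights[0])
    if known:
        avg = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    else:
        avg = [0.0] * dim
    # One logit per class: dot product of that class's weight row with the averaged vector.
    logits = [sum(w * x for w, x in zip(row, avg)) + b for row, b in zip(weights, biases)]
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With toy embeddings such as {"good": [1.0, 0.0], "movie": [0.5, 0.5]} and weights [[2.0, -2.0], [-2.0, 2.0]], the tokens ["good", "movie"] average to [0.75, 0.25] and score higher for the first class. Averaging is what keeps FastText fast: the cost per text is linear in its length, with no convolution or recurrence.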

Running the Models

To run a specific model, users navigate to the corresponding directory, such as keras_textclassification/m01_FastText for the FastText model. Training and prediction scripts are available and can be executed using Python commands:

For training, use:

python train.py

For making predictions, use:

python predict.py
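
The two commands can also be driven from Python, for example to train and then predict with several models in one go. The sketch below assumes, as described above, that each model directory contains train.py and predict.py and that they are run from inside that directory. The helper name run_model_scripts is hypothetical, and it only uses the standard-library subprocess module.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_model_scripts(model_dir: str, scripts: Sequence[str] = ("train.py", "predict.py")) -> None:
    """Run each script in order from inside model_dir, stopping at the first failure."""
    workdir = Path(model_dir)
    for script in scripts:
        if not (workdir / script).is_file():
            raise FileNotFoundError(f"{script} not found in {workdir}")
        # cwd=workdir mirrors changing into the model directory before running
        # "python <script>"; sys.executable reuses the interpreter (for example the
        # Anaconda environment) that is running this helper.
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
```

For example, run_model_scripts("keras_textclassification/m01_FastText") trains FastText and then runs its prediction script, assuming the path is relative to where the repository is checked out.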

Data and Pre-trained Models

The project works with several datasets and pre-trained models. The datasets include Baidu QA, a corpus of questions and answers, and Byte Multi News, used for multi-label news categorization. The project also incorporates pre-trained embeddings from models such as Bert, Albert, and XLNet.
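
A multi-label task such as Byte Multi News differs from single-label classification because one news item can carry several categories at once. Its targets are therefore usually encoded as multi-hot vectors rather than single class IDs, and the model outputs an independent probability per label (typically through sigmoid units). The sketch below shows that encoding and a thresholded decode in plain Python. The label names, the 0.5 threshold, and the helper names are illustrative assumptions, not the project's own preprocessing code.

```python
from typing import Dict, Iterable, List, Sequence


def build_label_index(label_sets: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign each distinct label a column index, in sorted order for reproducibility."""
    labels = sorted({label for labels in label_sets for label in labels})
    return {label: i for i, label in enumerate(labels)}


def multi_hot(labels: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Encode one example's label set as a 0/1 vector with a 1 for every label present."""
    vec = [0] * len(index)
    for label in labels:
        vec[index[label]] = 1
    return vec


def decode(scores: Sequence[float], index: Dict[str, int], threshold: float = 0.5) -> List[str]:
    """Turn per-label probabilities (e.g. sigmoid outputs) back into a label set."""
    names = sorted(index, key=index.get)  # label names in column order
    return [name for name, s in zip(names, scores) if s >= threshold]
```

For hypothetical label sets [["sports"], ["finance", "technology"], ["technology"]], the index is {"finance": 0, "sports": 1, "technology": 2}, so an article tagged finance and technology is encoded as [1, 0, 1]. Because each label is thresholded independently, the model can predict any number of labels per article, including none.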

Project Structure and Components

Keras-TextClassification is organized into the following components:

Base Classes: base classes for network graphs and embeddings, on which the individual model architectures are built.
Layers: The keras_layers directory contains custom layers shared across different models.
Configuration and Data Management: conf sets the model and data paths, and data handles corpus processing and model storage.
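
The base-class design can be illustrated with a small plain-Python analogue: a shared base class merges default and user hyper-parameters, then calls a build hook that each architecture overrides. The class names, the hyper-parameter keys, and the use of a dictionary as a stand-in for a Keras model are all hypothetical; the project's real base classes build Keras graphs and embedding layers.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseGraph(ABC):
    """Hypothetical base class: stores shared hyper-parameters and drives model construction."""

    DEFAULTS: Dict[str, Any] = {"embed_size": 300, "len_max": 50, "num_classes": 2, "dropout": 0.5}

    def __init__(self, hyper_parameters: Optional[Dict[str, Any]] = None):
        # Merge user settings over the class defaults so every subclass sees a complete config.
        self.hp = {**self.DEFAULTS, **(hyper_parameters or {})}
        self.model = self.create_model()

    @abstractmethod
    def create_model(self) -> Dict[str, Any]:
        """Each architecture builds its own graph; a dict stands in for a Keras model here."""


class FastTextGraph(BaseGraph):
    def create_model(self) -> Dict[str, Any]:
        # Embedding -> average over tokens -> softmax classifier.
        return {
            "layers": ["embedding", "global_average_pooling", "dense_softmax"],
            "input_length": self.hp["len_max"],
            "outputs": self.hp["num_classes"],
        }


class TextCNNGraph(BaseGraph):
    # Subclasses extend the shared defaults with their own hyper-parameters.
    DEFAULTS = {**BaseGraph.DEFAULTS, "filters": [3, 4, 5]}

    def create_model(self) -> Dict[str, Any]:
        # Embedding -> parallel convolutions with several kernel sizes -> max pooling -> softmax.
        convs = [f"conv1d_k{k}" for k in self.hp["filters"]]
        return {
            "layers": ["embedding"] + convs + ["max_pool", "dense_softmax"],
            "input_length": self.hp["len_max"],
            "outputs": self.hp["num_classes"],
        }
```

Keeping configuration handling in the base class means a new architecture only has to implement create_model, which is the kind of reuse the base classes are meant to provide.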

References to Research Papers

For the theory behind each model, the project's documentation links the corresponding research paper, from FastText to XLNet, so each implementation can be read alongside the work that introduced it.

Community and Contribution

The project acknowledges various community contributors who have inspired or directly contributed to its development. It also references similar projects to encourage collaboration and learning.

Keras-TextClassification is a rich resource for exploring text classification with deep learning, offering tools and models suited to both beginners and advanced users.