#text classification
spaCy
spaCy is an open-source NLP library, released under the MIT license, that supports over 70 languages with neural network models. It provides pretrained pipelines for tasks such as tokenization, named entity recognition, and text classification, and supports multi-task learning with pretrained transformers like BERT. Custom models can be written in frameworks like PyTorch or TensorFlow, and built-in visualizers display syntax and NER output. The library is designed for production use, with an emphasis on easy deployment, accuracy, and extensibility.
underthesea
Underthesea is an open-source toolkit offering accessible NLP tools for Vietnamese text, including word segmentation, POS tagging, sentiment analysis, and more. Its integration with prompt-based models and comprehensive datasets supports robust text processing in Python. The project invites contributions and collaboration from its community.
fastText
fastText is a library for efficient learning of word representations and text classification. It provides pretrained models, including English word vectors and multilingual word vectors trained on Wikipedia, along with models for language identification and supervised tasks. It can be built with Make or CMake, offers Python bindings, and runs on macOS and Linux. Model compression reduces the memory footprint of trained classifiers.
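As a minimal sketch of preparing data for fastText's supervised mode: the library reads one example per line, with each label marked by a prefix (`__label__` by default). The helper name `to_fasttext_line` and the sample texts below are hypothetical and only illustrate the format.

```python
def to_fasttext_line(labels, text, prefix="__label__"):
    """Format one training example as a single fastText-style line.

    Hypothetical helper: labels are prefixed and placed before the text,
    and whitespace (including newlines) is collapsed so the example
    stays on one line.
    """
    clean_text = " ".join(text.split())
    tags = " ".join(f"{prefix}{label}" for label in labels)
    return f"{tags} {clean_text}"


# Made-up examples, for illustration only.
examples = [
    (["sports"], "The match\nended 2-1"),
    (["business", "tech"], "Chip maker  reports quarterly earnings"),
]

lines = [to_fasttext_line(labels, text) for labels, text in examples]
for line in lines:
    print(line)
```

Written to a file, lines in this shape could then be passed to the Python bindings, for example through `fasttext.train_supervised(input=...)`, assuming the `fasttext` package is installed.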
Introduction-NLP
Explore Chinese NLP fundamentals through detailed explanations of key techniques including segmentation, POS tagging, and named entity recognition. Authored by HanLP creator Han He, this project explains complex models in accessible terms and includes personal study notes. Additional resources include mind maps and links to related projects.
ktrain
ktrain is a lightweight wrapper library that simplifies machine learning with TensorFlow Keras. It helps train and deploy models for text, vision, graph, and tabular data using preconfigured models such as BERT and ResNet, and it requires little code, which makes it suitable for both beginners and experts. The library covers text classification, sequence labeling, and image classification, and includes tools for finding a good learning rate and applying learning rate schedules. Trained models can be deployed directly or exported to ONNX and TensorFlow Lite. In recent updates, generative question-answering has moved to the OnPrem.LLM package.
llama-classification
A text classification framework built on LLaMA, with approaches such as direct, channel, and pure generation. The repository provides NVIDIA GPU setup details and runs experiments on the ag_news dataset, using conditional probabilities and calibration methods to improve prediction accuracy. Contributions are welcome through issues or pull requests, and the project suits researchers and developers looking for a practical LLaMA classification setup.
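The difference between the direct and channel approaches can be sketched with toy numbers. This is an illustration under stated assumptions, not the repository's code: the log-probabilities are made up rather than produced by LLaMA, the function names are hypothetical, and the calibration shown (subtracting the label's log-probability under a content-free prompt) is one common scheme that may differ from what the repository implements.

```python
import math

# Made-up scores standing in for language-model outputs (assumption).
# log P(label | text): what the direct approach compares.
DIRECT_LOGP = {"sports": math.log(0.45), "business": math.log(0.55)}
# log P(text | label): what the channel approach compares.
CHANNEL_LOGP = {"sports": -12.0, "business": -15.5}
# log P(label | content-free prompt): the model's bias toward each label.
PRIOR_LOGP = {"sports": math.log(0.30), "business": math.log(0.70)}


def classify_direct(direct_logp):
    """Pick the label with the highest P(label | text)."""
    return max(direct_logp, key=direct_logp.get)


def classify_channel(channel_logp):
    """Pick the label under which the text itself is most likely."""
    return max(channel_logp, key=channel_logp.get)


def classify_calibrated(direct_logp, prior_logp):
    """Direct approach after subtracting the model's label bias."""
    scores = {label: direct_logp[label] - prior_logp[label] for label in direct_logp}
    return max(scores, key=scores.get)


print(classify_direct(DIRECT_LOGP))
print(classify_channel(CHANNEL_LOGP))
print(classify_calibrated(DIRECT_LOGP, PRIOR_LOGP))
```

With these toy numbers the direct approach follows the model's bias toward the "business" label, while the channel and calibrated variants both select "sports", which is the kind of disagreement calibration is meant to address.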
pytextclassifier
PyTextClassifier is an open-source Python library for text classification and clustering. It includes classical algorithms such as Logistic Regression, Random Forest, and Decision Tree, along with deep learning models like BERT and FastText. It handles binary, multi-class, multi-label, and hierarchical classification, with applications such as sentiment analysis and risk classification. The library has a modular design and is simple to install and use for model training, evaluation, and deployment.
pyss3
PySS3 is a Python package for text classification built on SS3, a simple and interpretable model. It helps researchers and developers deploy interpretable machine learning solutions, and the model has performed strongly in CLEF's eRisk lab evaluations. Features include the primary SS3 model, t-SS3 for dynamic n-gram detection, and supporting tools such as Live Test and the Evaluation class that make models easier to inspect and evaluate. It is suited to projects that prioritize clarity and dependability in text classification.
prodigy-recipes
This repository contains customizable recipes for Prodigy, a scriptable annotation tool for text and images. A Prodigy license is required to use these scripts, some of which are simplified versions of the built-in recipes, written to be easier to understand. It includes installation guidance, usage instructions, and customization tips for tasks such as Named Entity Recognition and image annotation, along with community and tutorial recipes.