MindNLP: A Comprehensive NLP Library
MindNLP is an open-source library designed for natural language processing (NLP) tasks, built on the MindSpore framework. It offers solutions for a variety of NLP challenges, streamlining the model building and training processes for researchers and developers.
Key Features
-
Data Processing: MindNLP packages several classical NLP datasets like Multi30k, SQuAD, and CoNLL into user-friendly modules to ease data handling.
-
Model Customization: With a wide array of configurable components, users can easily customize models. MindNLP acts as a versatile toolset for NLP models.
-
Simplified Training: The Trainer and Evaluator interfaces make model training and evaluation straightforward.
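To illustrate the division of labor behind a Trainer/Evaluator pair, here is a toy sketch in plain Python. It is not MindNLP's API: `LinearModel`, `ToyTrainer`, and `ToyEvaluator` are hypothetical names invented for this example. They only show the general pattern of one object owning the optimization loop and another owning the metric computation.

```python
# Toy illustration of a Trainer/Evaluator split in plain Python.
# LinearModel, ToyTrainer and ToyEvaluator are hypothetical names,
# not MindNLP classes.
from dataclasses import dataclass
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (input, target) pairs


@dataclass
class LinearModel:
    weight: float = 0.0

    def predict(self, x: float) -> float:
        return self.weight * x


class ToyTrainer:
    """Owns the optimization loop so calling code does not have to."""

    def __init__(self, model: LinearModel, train_data: Dataset,
                 learning_rate: float = 0.01, epochs: int = 200) -> None:
        self.model = model
        self.train_data = train_data
        self.learning_rate = learning_rate
        self.epochs = epochs

    def run(self) -> None:
        for _ in range(self.epochs):
            for x, y in self.train_data:
                error = self.model.predict(x) - y
                # Gradient of 0.5 * error**2 with respect to weight is error * x.
                self.model.weight -= self.learning_rate * error * x


class ToyEvaluator:
    """Owns metric computation; here, mean squared error."""

    def __init__(self, model: LinearModel, eval_data: Dataset) -> None:
        self.model = model
        self.eval_data = eval_data

    def run(self) -> float:
        squared = [(self.model.predict(x) - y) ** 2 for x, y in self.eval_data]
        return sum(squared) / len(squared)


# Data generated by y = 2 * x, so the learned weight should approach 2.
train_data: Dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
model = LinearModel()
ToyTrainer(model, train_data).run()
mse = ToyEvaluator(model, train_data).run()
print(f"weight={model.weight:.3f}, mse={mse:.3e}")
```

With this split, the calling code only wires a model and a dataset into each object. The MindNLP documentation describes the real interfaces and their arguments.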
Latest Features and Updates
MindNLP has introduced a variety of advanced features:
-
Pretrained Models: More than 250 pretrained models are available through APIs similar to Hugging Face Transformers, which simplifies their use. For instance:
from mindnlp.transformers import AutoModel
model = AutoModel.from_pretrained('bert-base-cased')
-
Platform Compatibility: MindNLP supports several platforms including Ascend 910 series, Ascend 310B (Orange Pi), GPU, and CPU.
-
Distributed Inference: It offers multi-device and multi-process parallel inference for models with more than 10 billion parameters.
-
Quantization: Supports SmoothQuant on Orange Pi and bitsandbytes-like int8 quantization on the GPU.
-
Sentence Transformers: Facilitates efficient development of Retrieval-Augmented Generation (RAG).
-
Dynamic Graph Optimization: Achieves speeds comparable to PyTorch on GPU for dynamic graphs on Ascend hardware, with a measured 85 ms per token for Llama.
-
Static and Dynamic Graph Unification: Users can switch seamlessly to graph mode with mindspore.jit, enhancing performance while maintaining compatibility with Hugging Face code style.
-
LLM Applications: Encompasses numerous applications like text information extraction, chatbots, speech recognition, and more.
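As a slightly fuller sketch of the pretrained-model feature listed above, the function below pairs a tokenizer with the model and returns the final hidden states. Several details are assumptions not stated in this document: that `AutoTokenizer` is exported next to `AutoModel`, that `return_tensors="ms"` requests MindSpore tensors (mirroring `"pt"` in Hugging Face Transformers), and that the output exposes `last_hidden_state`. `encode_sentence` is a hypothetical helper name. Check the MindNLP documentation for your installed version before relying on any of this.

```python
def encode_sentence(text: str, model_name: str = "bert-base-cased"):
    """Return the final hidden states for `text` (sketch, see assumptions)."""
    # Imported lazily so this module can be loaded without MindNLP installed.
    # Assumption: AutoTokenizer is available alongside AutoModel.
    from mindnlp.transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # Assumption: "ms" asks the tokenizer for MindSpore tensors.
    inputs = tokenizer(text, return_tensors="ms")
    outputs = model(**inputs)

    # Assumption: the output object follows the Hugging Face convention.
    return outputs.last_hidden_state
```

Calling `encode_sentence("MindNLP runs on MindSpore.")` downloads the checkpoint on first use, so it needs network access and a working MindSpore installation.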
Installation
There are multiple ways to install MindNLP:
-
From PyPI: Install the official version using pip.
pip install mindnlp
-
Daily Build: Access the latest daily builds here.
-
From Source: Clone the repository and build from source.
git clone https://github.com/mindspore-lab/mindnlp.git
cd mindnlp
bash scripts/build_and_reinstall.sh
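Whichever route is used, a quick way to confirm that the package is visible to the current Python environment is to query its installed metadata. The sketch below uses only the standard library. `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `mindnlp`, as the pip command above suggests.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "mindnlp") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("mindnlp is not installed in this environment")
else:
    print(f"mindnlp {version} is installed")
```

This checks package metadata only. It does not verify that MindSpore itself is correctly configured for the target hardware.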
Supported Models
MindNLP supports an extensive range of models. For a detailed list, see the project website.
Contribution and Contact
MindNLP is an evolving project. The team encourages feedback and suggestions for new features. Users can reach out via GitHub Issues.
Acknowledgements
MindNLP is part of MindSpore’s open-source initiative. It invites contributions and feedback from the community, aiming to support research by providing a standardized toolkit for NLP.
Citation
Researchers using MindNLP are encouraged to cite the project:
@misc{mindnlp2022,
title={{MindNLP}: Easy-to-use and high-performance NLP and LLM framework based on MindSpore},
author={MindNLP Contributors},
howpublished = {\url{https://github.com/mindlab-ai/mindnlp}},
year={2022}
}
MindNLP aims to provide practical tooling for NLP research and applications, fostering advancements and innovation in the field.