AdaSeq: A Comprehensive Library for Sequence Understanding Models
Introduction
AdaSeq, developed by Alibaba DAMO Academy, is a versatile, easy-to-use library that helps researchers and developers build models for understanding sequence data. It is built on top of ModelScope, Alibaba's open model platform and hub, and supports a range of tasks such as part-of-speech tagging, named entity recognition, entity typing, and relation extraction.
Key Features
Plentiful Models: AdaSeq provides a wide range of cutting-edge models and training methods for sequence understanding tasks.
State-of-the-Art Performance: The library aims to provide carefully tuned implementations whose results match or exceed those of the same models in other open-source frameworks.
User-Friendly: Strong, ready-to-use models can be accessed with a single command, without writing complex configurations.
Extensible: AdaSeq is designed to be flexible. Users can register new modules or assemble predefined ones into custom sequence understanding models for new domains and tasks, as sketched below.
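Because AdaSeq is built on ModelScope, new components can in principle be added by registering a class with the underlying ModelScope registries. The following Python sketch only illustrates the idea: the class name, the registered name 'my-toy-tagger', and the choice of base class are hypothetical, and AdaSeq's own registry wrappers and base classes may differ between versions.

# Illustrative sketch only. 'MyToyTagger' and 'my-toy-tagger' are made-up names,
# and AdaSeq ships its own wrappers around these ModelScope registries.
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks

@MODELS.register_module(Tasks.named_entity_recognition, module_name='my-toy-tagger')
class MyToyTagger(TorchModel):
    # A placeholder model class; a real model would build an encoder,
    # a tagging head, and a forward() that returns predictions and losses.
    def __init__(self, model_dir=None, **kwargs):
        super().__init__(model_dir=model_dir, **kwargs)

Once registered, the new name can be referenced from a training configuration in the same way as the predefined modules.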
Important Updates
- AdaSeq's team received the Best Paper Award at SemEval 2023 for their U-RaNER paper.
- The team took first place in nine tracks of SemEval 2023 Task 2 (MultiCoNER II: Multilingual Complex Named Entity Recognition).
- The library includes several performance-enhancing models presented at EMNLP 2022, such as the Retrieval-augmented Multimodal Entity Understanding Model (MoRe) and the Unsupervised Boundary-Aware Language Model (BABERT).
Quick Experience
AdaSeq offers online demos, hosted on ModelScope, that let users try its models, including English and Chinese named entity recognition (NER) and Chinese word segmentation (CWS).
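The models behind these demos can also be called locally through ModelScope's pipeline API. A minimal sketch, assuming a Chinese NER model ID published on the ModelScope hub (substitute whichever model you want to try):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# The model ID below is only an example; any NER model on the ModelScope hub can be used.
ner = pipeline(Tasks.named_entity_recognition,
               'damo/nlp_raner_named-entity-recognition_chinese-base-news')
print(ner('阿里巴巴达摩院位于杭州。'))  # prints the recognized entity spans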
Model and Dataset Zoo
AdaSeq houses a range of supported models including:
- Transformer-based CRF
- Retrieval Augmented NER
- Biaffine NER
- Multi-label Entity Typing
The library also maintains a collection of datasets specifically gathered for sequence understanding tasks, providing a substantial foundation for model training and evaluation.
Installation
AdaSeq requires Python 3.7 or later, PyTorch 1.8 or higher, and ModelScope 1.4 or newer. It can be installed from PyPI via pip, or from source for those who want to work with the latest code.
Installation via pip:
pip install adaseq
Installation from source:
git clone https://github.com/modelscope/adaseq.git
cd adaseq
pip install -r requirements.txt -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
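Either way, the environment can be checked against the requirements above with a short Python snippet (a minimal sketch; importlib.metadata needs Python 3.8+, or the importlib-metadata backport on 3.7):

import sys
from importlib.metadata import PackageNotFoundError, version

import torch

print('Python    :', sys.version.split()[0])   # expected 3.7 or later
print('PyTorch   :', torch.__version__)        # expected 1.8 or later
print('ModelScope:', version('modelscope'))    # expected 1.4 or later
try:
    print('AdaSeq    :', version('adaseq'))    # present after a pip install
except PackageNotFoundError:
    print('AdaSeq    : running from a source checkout (not pip-installed)')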
Tutorials and Learning Resources
AdaSeq provides a series of tutorials for both beginners and advanced users, ranging from basic setup to topics such as hyperparameter optimization and multi-GPU training.
Contributing and License
The AdaSeq project welcomes community contributions that improve its features and usability. The project is open source and released under the Apache License, Version 2.0.