Introduction to the Small-Text Project
Small-Text is a Python library designed for efficient active learning in text classification. Active learning is a machine learning paradigm in which the algorithm interactively queries a human annotator to label selected data points. This project provides state-of-the-art tools to streamline that process, making text classification more accessible and adaptable to a wide range of applications.
Features of Small-Text
Small-Text is packed with features that make it an essential tool for anyone looking to optimize their text classification tasks:
- Unified Interfaces: The library provides a consistent interface for active learning across different machine learning frameworks such as scikit-learn, PyTorch, and Hugging Face's Transformers. This allows users to easily integrate different query strategies and classifiers into their workflows.
- Support for GPUs and Transformers: The library supports GPU-based models using PyTorch, which improves processing speed, and it integrates with Transformers, allowing advanced text classification models to be used alongside active learning strategies.
- Flexible Installation Options: For users with limited hardware resources, Small-Text can be installed with minimal dependencies, running effectively even without a GPU. More comprehensive installations are also available for users who wish to leverage the full power of the tool.
- Pre-Implemented Strategies: Small-Text comes with a suite of pre-implemented components, including query strategies, initialization strategies, and stopping criteria, all of which have been scientifically evaluated.
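To make the idea of a query strategy concrete, here is a toy, stdlib-only sketch of a breaking-ties strategy (this is illustrative code, not Small-Text's own implementation): it ranks unlabeled samples by the margin between their two highest predicted class probabilities and queries the samples with the smallest margins first.

```python
# Illustrative breaking-ties query strategy (not Small-Text's code):
# the smaller the gap between the two most likely classes, the more
# uncertain the model is, and the more valuable a human label becomes.

def breaking_ties(probabilities, n=2):
    """Return indices of the n samples with the smallest top-2 margin.

    probabilities: list of per-sample class probability lists.
    """
    margins = []
    for i, probs in enumerate(probabilities):
        top2 = sorted(probs, reverse=True)[:2]
        margins.append((top2[0] - top2[1], i))
    margins.sort()  # smallest margin = most uncertain
    return [i for _, i in margins[:n]]

probs = [
    [0.9, 0.05, 0.05],   # confident -> large margin, queried last
    [0.4, 0.39, 0.21],   # nearly tied -> queried first
    [0.5, 0.45, 0.05],   # uncertain -> queried second
]
print(breaking_ties(probs))  # -> [1, 2]
```

In practice, the probabilities would come from the current classifier's predictions over the unlabeled pool; strategies of this family are among those the library ships pre-implemented.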
Understanding Active Learning
Active learning is particularly valuable when labeled data is scarce. Rather than labeling examples at random, the model selects the examples whose labels it expects to be most informative, so each annotation contributes as much as possible to training the supervised model.
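The resulting query-annotate-retrain cycle can be sketched in a few lines of plain Python. This is a deliberately simplified toy (the "model" and uncertainty score are stand-ins, and none of this reflects the actual Small-Text API), but the shape of the loop is the same:

```python
# Toy pool-based active learning loop (illustrative only): repeatedly
# query the most uncertain unlabeled sample, ask an "oracle" (the human
# annotator) for its label, and retrain on the grown labeled set.

import random

def train(labeled):
    # Stand-in "model": a majority-class guesser over the labeled data.
    labels = [y for _, y in labeled]
    return max(set(labels), key=labels.count) if labels else None

def uncertainty(model, x):
    # Stand-in uncertainty score; a real strategy would use the model's
    # predicted class probabilities here (e.g. breaking ties).
    return random.random()

pool = [f"doc_{i}" for i in range(20)]           # unlabeled pool
oracle = {x: i % 2 for i, x in enumerate(pool)}  # annotator stand-in
labeled = []

for _ in range(5):                               # 5 query rounds
    model = train(labeled)
    query = max(pool, key=lambda x: uncertainty(model, x))
    labeled.append((query, oracle[query]))       # annotator labels it
    pool.remove(query)

print(len(labeled), len(pool))  # -> 5 15
```

Small-Text packages this loop behind unified interfaces, so the classifier, query strategy, and stopping criterion can each be swapped independently.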
Recent Updates
Small-Text is actively maintained and continuously improved. Some of the recent updates include:
- Version 1.4.1: Released in August 2024 as a bugfix release.
- Version 1.4.0: Released in June 2024, this version introduced a new query strategy, AnchorSubsampling (AnchorAL).
Additionally, the project was presented in a paper at EACL 2023, where it received the Best System Demonstration award.
Getting Started
Installing Small-Text is straightforward. You can install it via pip:
pip install small-text
To access additional features with transformers, you can use:
pip install "small-text[transformers]"
For conda users, a full installation includes:
conda install -c conda-forge "torch>=1.6.0" "torchtext>=0.7.0" transformers small-text
More detailed installation instructions and quick start guides are available in the project documentation.
Resources and Contributions
The project is open to contributions and welcomes community involvement. The comprehensive documentation provides in-depth guides and showcases, including tutorials on using Small-Text for various text classification scenarios.
For further exploration and to contribute to its development, the community can find guidelines in the CONTRIBUTING.md file.
Acknowledgments
Small-Text was developed by Christopher Schröder and the NLP group at Leipzig University, part of the Webis research network. The project has been supported by the Development Bank of Saxony.
Small-Text is licensed under the MIT License, and users are encouraged to cite its paper for academic and research purposes.
For detailed information and updates, users can refer to the documentation.