Semi-Supervised Learning: A Unified Benchmark
Introduction
USB (Unified Semi-supervised learning Benchmark) is an open-source, PyTorch-based Python package for the semi-supervised learning (SSL) community. It is designed to be user-friendly, cost-effective, and comprehensive, and it simplifies the development and evaluation of SSL algorithms across computer vision (CV), natural language processing (NLP), and audio classification. The package implements 14 SSL algorithms based on consistency regularization and provides 15 diverse tasks for evaluation.
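To make "consistency regularization" concrete, here is a minimal, framework-free sketch of the thresholded pseudo-labeling loss popularized by FixMatch (one of the algorithms USB credits). It is an illustration in plain Python, not USB's implementation, which operates on PyTorch tensors. The function names are hypothetical, and the 0.95 default threshold is an assumption chosen for illustration. The prediction on a weakly augmented view becomes a hard pseudo-label only when it is confident enough, and the model is then trained to predict that label on a strongly augmented view of the same input.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one sample's logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Log-probabilities computed with the log-sum-exp trick."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def fixmatch_unlabeled_loss(
    weak_logits: Sequence[Sequence[float]],
    strong_logits: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed default, for illustration only
) -> float:
    """Thresholded pseudo-label loss averaged over an unlabeled batch.

    weak_logits[i] and strong_logits[i] are the model's logits for a weakly
    and a strongly augmented view of the same unlabeled sample i.
    """
    if len(weak_logits) != len(strong_logits) or not weak_logits:
        raise ValueError("need equal-length, non-empty batches")
    losses = []
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        pseudo_label = probs.index(confidence)  # hard label from the weak view
        # Keep only confident pseudo-labels; the rest contribute zero loss.
        mask = 1.0 if confidence >= threshold else 0.0
        # Cross-entropy of the strong view's prediction against the pseudo-label.
        losses.append(-mask * log_softmax(strong)[pseudo_label])
    # Divide by the full unlabeled batch size, masked samples included.
    return sum(losses) / len(losses)
```

In a full training step, this unlabeled term is typically added to the ordinary supervised cross-entropy on labeled data, scaled by a weighting factor.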
News and Updates
USB is under active development. Recent updates include:
- In March 2024, three new algorithms, EPASS, SequenceMatch, and ReFixMatch, were added.
- In July 2023, the DeFixMatch algorithm was added, and several bugs were fixed.
- In June 2023, USB became part of the PyTorch ecosystem.
- In January 2023, USB was updated to semilearn==0.3.0, which added the FreeMatch and SoftMatch algorithms and several algorithms for imbalanced SSL.
Getting Started
To use USB, make sure PyTorch is installed, along with the related libraries torchvision, torchaudio, and transformers. Setup consists of creating a conda environment and then installing the required packages with pip.
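Before installing USB itself, it can help to confirm that these prerequisites are present. The sketch below uses only the standard library (`importlib.util.find_spec`) to report which modules are importable without actually importing them. The function name `check_prerequisites` is hypothetical and not part of USB; `torch` is the import name of PyTorch.

```python
import importlib.util
from typing import Dict, Iterable

# Prerequisites named in the setup instructions; "torch" is PyTorch's import name.
REQUIRED_MODULES = ("torch", "torchvision", "torchaudio", "transformers")


def check_prerequisites(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_prerequisites()
    for name, found in status.items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    missing = [name for name, found in status.items() if not found]
    if missing:
        # These pip package names match the import names for all four modules.
        print("Install the missing packages, e.g.: pip install " + " ".join(missing))
```

Run it inside the conda environment you created so that it checks that environment's packages.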
Installation
USB can be installed quickly as the Python package semilearn (pip install semilearn). Users who want to customize USB or contribute to it can clone the repository from its GitHub page. For large-scale production or deployment, USB can be containerized with Docker, which supports GPU use and flexible configuration changes.
Usage
USB simplifies both using and developing SSL algorithms. New users can start with the simple examples to learn how to evaluate existing algorithms or implement new ones. USB also supports Docker for setting up a consistent development environment.
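As a framework-agnostic illustration of the last step of evaluating a trained classifier, the sketch below computes top-1 accuracy and its complement, the error rate, from raw logits. It does not use USB's API; the function name `top1_accuracy_and_error` is hypothetical.

```python
from typing import Sequence, Tuple


def top1_accuracy_and_error(
    logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy, error rate) of argmax predictions against labels."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty logits and labels")
    correct = 0
    for row, label in zip(logits, labels):
        # Argmax over classes; ties resolve to the lowest class index.
        prediction = max(range(len(row)), key=row.__getitem__)
        correct += int(prediction == label)
    accuracy = correct / len(labels)
    return accuracy, 1.0 - accuracy
```

For example, `top1_accuracy_and_error([[2.0, 1.0], [0.0, 3.0], [5.0, 4.0]], [0, 1, 1])` predicts classes 0, 1, 0, so two of the three samples are correct.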
Benchmark Results
USB publishes benchmark results that report how each implemented SSL algorithm performs on its tasks, which makes the algorithms easier to compare.
Model Zoo and Development
USB aims to expand its collection of pre-trained models and offers a comprehensive guide for developers to integrate their own SSL algorithms into the framework.
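The sketch below shows one common way a framework can let developers plug in their own algorithms: a name-based registry plus a base class. USB's developer guide describes its own mechanism, which this sketch does not reproduce. Every name here (`register_algorithm`, `SSLAlgorithm`, `MyMatch`, `build_algorithm`, `lambda_u`) is hypothetical. To keep the example self-contained, losses are passed in as precomputed numbers rather than computed from data batches.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an algorithm name to its class.
ALGORITHM_REGISTRY: Dict[str, type] = {}


def register_algorithm(name: str) -> Callable[[type], type]:
    """Class decorator that records an algorithm under a unique name."""
    def decorator(cls: type) -> type:
        if name in ALGORITHM_REGISTRY:
            raise KeyError(f"algorithm {name!r} is already registered")
        ALGORITHM_REGISTRY[name] = cls
        return cls
    return decorator


class SSLAlgorithm:
    """Hypothetical base class; subclasses define how losses are combined."""

    def __init__(self, lambda_u: float = 1.0) -> None:
        self.lambda_u = lambda_u  # weight of the unsupervised loss term

    def compute_loss(self, sup_loss: float, unsup_loss: float) -> float:
        raise NotImplementedError


@register_algorithm("my_match")
class MyMatch(SSLAlgorithm):
    def compute_loss(self, sup_loss: float, unsup_loss: float) -> float:
        # Typical SSL objective: supervised loss plus weighted unsupervised loss.
        return sup_loss + self.lambda_u * unsup_loss


def build_algorithm(name: str, **kwargs) -> SSLAlgorithm:
    """Instantiate a registered algorithm by name, e.g. from a config file."""
    return ALGORITHM_REGISTRY[name](**kwargs)
```

With this design, a configuration value such as `"my_match"` selects an algorithm without the training code importing it by class name.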
Contributing and Community
The project is active and welcomes contributions. Those interested can fork the project on GitHub, develop features, and submit pull requests. The project adheres to Microsoft's Open Source Code of Conduct, providing a welcoming community for collaboration.
Acknowledgments
USB has drawn inspiration and resources from several existing projects including TorchSSL, FixMatch, CoMatch, and SimMatch, as well as platforms like HuggingFace and PyTorch Lightning.
In summary, USB represents a robust, community-driven effort to advance semi-supervised learning by providing tools that lower barriers to entry and promote innovation in AI and machine learning across multiple domains.