A Unified Visual Parameter-Efficient Transfer Learning Benchmark (V-PETL Bench)
Introduction
The Visual Parameter-Efficient Transfer Learning Benchmark (V-PETL Bench) is a comprehensive project designed to evaluate parameter-efficient transfer learning (PETL) methods in computer vision (CV). PETL methods adapt an existing pre-trained model to new tasks by updating only a small number of parameters, enabling more efficient use of computational resources and faster adaptation to downstream tasks.
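To make the idea concrete, here is a minimal sketch of one of the simplest PETL baselines, linear probing: every pre-trained weight is frozen and only a small task-specific head is trained. The backbone choice (a torchvision ResNet-50) and the 200-class setting are illustrative assumptions for this sketch, not the benchmark's own configuration.

```python
import torch.nn as nn
import torchvision.models as models

# Load a pre-trained backbone and freeze all of its weights.
backbone = models.resnet50(weights="IMAGENET1K_V2")
for p in backbone.parameters():
    p.requires_grad = False

# Attach a small trainable head for the new task (200 classes, e.g. CUB-200).
num_classes = 200
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Only the head's parameters will receive gradient updates.
trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
total = sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```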
In computer vision, numerous PETL algorithms have been proposed, yet using and comparing these methods has not been straightforward. To address this, V-PETL Bench offers a unified and systematic benchmark. It includes 30 diverse and challenging datasets spanning tasks such as image recognition, video action recognition, and dense prediction. To ensure a robust evaluation, the project assesses 25 predominant PETL algorithms and provides an open-source, modular, and extensible codebase.
Getting Started
Data Preparation
- Image Classification Datasets: Includes fine-grained visual classification (FGVC) datasets such as CUB-200, NABirds, Stanford Dogs, and Stanford Cars, as well as the Visual Task Adaptation Benchmark (VTAB). A hedged loading sketch follows this list.
- Video Action Recognition Datasets: Features datasets like Kinetics-400 and Something-Something V2, which test a model's ability to classify actions in videos.
- Dense Prediction Datasets: Includes well-known datasets such as MS-COCO, ADE20K, and PASCAL VOC, which are critical for tasks requiring pixel-level prediction.
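As a hedged illustration of data preparation, the sketch below loads an image-classification split with a standard torchvision pipeline. The directory layout (data/cub200/train) is an assumption made for this example; the benchmark's own data preparation instructions define the actual layout.

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing for a ViT/ResNet backbone.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical path; arrange class subfolders under it for ImageFolder.
train_set = datasets.ImageFolder("data/cub200/train", transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4)
```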
Pre-trained Model Preparation
The V-PETL Bench supports backbones pre-trained on image datasets such as ImageNet-21K and on video datasets like Kinetics-400. Models such as ViT (Vision Transformer) and Swin Transformer serve as backbone architectures. Pre-trained checkpoints can be downloaded and organized in a specified directory for use in experiments.
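As an illustration only, one common way to obtain such a checkpoint is through the timm library; the snippet below loads an ImageNet-21K pre-trained ViT-B/16. The timm model name is an assumption of this example and not a V-PETL Bench identifier; the benchmark itself expects checkpoints placed in its own directory layout.

```python
import timm

# timm model name for a ViT-B/16 with ImageNet-21K (AugReg) weights;
# illustrative only, not part of the benchmark's codebase.
vit = timm.create_model("vit_base_patch16_224.augreg_in21k", pretrained=True)
vit.eval()
```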
Structure of the V-PETL Bench
The project's structure includes key directories such as:
- ImageClassification/configs for experiment configurations
- dataloader for input data handling
- models containing backbones and tuning methods
- train for training scripts
Quick Start
To set up and use the V-PETL Bench locally, clone the repository and set up a Python environment with essential libraries such as PyTorch and torchvision. Training and evaluation are managed through the provided scripts, allowing for custom experiments and assessments; a minimal sketch of a training step follows.
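For orientation, here is a minimal sketch of one training epoch under the linear-probing setup from the Introduction. It reuses the frozen `backbone` and the `train_loader` from the earlier sketches and is not the benchmark's actual training script.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = backbone.to(device)  # frozen backbone + trainable head from above
criterion = nn.CrossEntropyLoss()
# The optimizer sees only the parameters left trainable (the small head).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

model.train()
for images, labels in train_loader:  # train_loader from the data sketch
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```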
Results and Checkpoints
Published results cover 13 PETL algorithms evaluated across several datasets with ViT-B/16 models pre-trained on ImageNet-21K. The benchmark reports both performance and efficiency, helping researchers select suitable PETL methods for their applications.
The project underscores the value of parameter efficiency in transfer learning, aiming to support advancements in machine learning applications by facilitating the evaluation and comparison of various PETL approaches. Researchers and developers can draw on V-PETL Bench's evolving repository of algorithms and datasets as it continues to grow.