HorNet - Explore Recursive Gated Convolutions for Enhanced Spatial Interactions

HorNet: An Overview

HorNet is a family of vision backbones built around Recursive Gated Convolution, an operator designed to perform high-order spatial interactions efficiently. It was developed by Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, and Jiwen Lu, and presented at NeurIPS 2022.
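For reference, the recursion behind Recursive Gated Convolution (written g^nConv, or gnConv, in the paper) can be summarized as follows. This is our summary of the paper's formulation, not a quotation: x is an input feature map with C channels over HW spatial positions, n is the interaction order, and the paper remains the authoritative definition.

```latex
\begin{aligned}
[\,p_0^{HW\times C_0},\; q_0^{HW\times C_0},\; \dots,\; q_{n-1}^{HW\times C_{n-1}}\,]
  &= \phi_{\mathrm{in}}(x) \in \mathbb{R}^{HW\times\left(C_0+\sum_{0\le k\le n-1} C_k\right)},\\
p_{k+1} &= f_k(q_k)\odot g_k(p_k)/\alpha, \qquad k = 0, 1, \dots, n-1,\\
y &= \phi_{\mathrm{out}}(p_n), \qquad C_k = \frac{C}{2^{\,n-k-1}}, \quad 0 \le k \le n-1.
\end{aligned}
```

Here each f_k is a depth-wise convolution, g_0 is the identity, each g_k with k ≥ 1 is a linear (1x1) layer mapping C_{k-1} to C_k channels, and α is a scaling constant for training stability. Since C_0 + Σ C_k = C/2^{n-1} + C(2 - 1/2^{n-1}) = 2C, the input projection φ_in simply doubles the channel count. Each gating step raises the interaction order by one, so p_n carries n-th order spatial interactions.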

Key Features of HorNet

HorNet models serve as general-purpose backbones for visual recognition. Their defining feature is explicit high-order spatial interaction, implemented with the Recursive Gated Convolution (gnConv) operator. The authors report that this design yields models that are both efficient and accurate; the Model Zoo below lists the resulting accuracy and cost figures.
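To make the operator concrete, here is a minimal PyTorch sketch of a recursive gated convolution. It is written from the paper's description, not copied from the HorNet repository, and the class name `GnConv` and its arguments (`order`, `kernel_size`, `scale`) are our own hypothetical choices. It covers only the spatial-mixing operator, not a full HorNet block (which also includes normalization and an MLP), and it uses a plain depth-wise convolution, so it corresponds to a 7x7-style variant rather than the GF variant.

```python
import torch
import torch.nn as nn


class GnConv(nn.Module):
    """Hypothetical sketch of a recursive gated convolution (gnConv).

    Input and output shape: (B, dim, H, W).
    """

    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7, scale: float = 1.0):
        super().__init__()
        if dim % 2 ** (order - 1) != 0:
            raise ValueError("dim must be divisible by 2 ** (order - 1)")
        self.order = order
        # Channel widths double at each step: C_k = dim / 2^(order - k - 1).
        self.dims = [dim // 2 ** k for k in range(order)][::-1]
        total = sum(self.dims)
        # phi_in: one 1x1 conv producing p_0 (dims[0] channels) plus every q_k.
        self.proj_in = nn.Conv2d(dim, self.dims[0] + total, kernel_size=1)
        # f_k: a single depth-wise conv applied to all q_k slices at once.
        self.dwconv = nn.Conv2d(total, total, kernel_size,
                                padding=kernel_size // 2, groups=total)
        # g_k for k >= 1: 1x1 convs widening p_k to the next slice's width.
        self.pws = nn.ModuleList(
            [nn.Conv2d(self.dims[k], self.dims[k + 1], kernel_size=1)
             for k in range(order - 1)]
        )
        # phi_out: final 1x1 projection back to dim channels.
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)
        self.scale = scale  # plays the role of 1 / alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p, q = torch.split(self.proj_in(x), (self.dims[0], sum(self.dims)), dim=1)
        q_list = torch.split(self.dwconv(q) * self.scale, self.dims, dim=1)
        p = p * q_list[0]  # g_0 is the identity
        for k in range(self.order - 1):
            # Each extra gating step raises the interaction order by one.
            p = self.pws[k](p) * q_list[k + 1]
        return self.proj_out(p)


if __name__ == "__main__":
    block = GnConv(dim=64, order=3)
    y = block(torch.randn(2, 64, 14, 14))
    print(y.shape)  # torch.Size([2, 64, 14, 14])
```

Because the channel widths double at every step and the last step must reach `dim`, the sketch requires `dim` to be divisible by `2 ** (order - 1)`. Running one depth-wise convolution over the concatenated slices and splitting afterwards is equivalent to applying a separate depth-wise convolution to each q_k, since depth-wise filters never mix channels.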


Model Zoo

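HorNet provides several models pre-trained on ImageNet-1K, listed below with their parameter count (Params), computational cost (FLOPs), and Top-1 accuracy. The "7x7" variants use 7x7 depth-wise convolutions inside the Recursive Gated Convolution, while the "GF" variants replace some of them with global filter layers: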

HorNet-Tiny
- 7x7: 22M Params, 4.0G FLOPs, 82.8% Top-1 accuracy
- GF: 23M Params, 3.9G FLOPs, 83.0% Top-1 accuracy
HorNet-Small
- 7x7: 50M Params, 8.8G FLOPs, 83.8% Top-1 accuracy
- GF: 50M Params, 8.7G FLOPs, 84.0% Top-1 accuracy
HorNet-Base
- 7x7: 87M Params, 15.6G FLOPs, 84.2% Top-1 accuracy
- GF: 88M Params, 15.5G FLOPs, 84.3% Top-1 accuracy
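When choosing among these models, a common question is which one is most accurate within a compute budget. The short sketch below encodes the ImageNet-1K figures listed above and picks the best entry under a FLOPs limit; `ZooEntry` and `best_under_budget` are hypothetical helpers for illustration, not part of the HorNet repository.

```python
from typing import List, NamedTuple, Optional


class ZooEntry(NamedTuple):
    name: str
    params_m: int   # parameters, in millions
    gflops: float   # FLOPs, in billions
    top1: float     # ImageNet-1K Top-1 accuracy, in percent


# Figures copied from the Model Zoo list.
MODEL_ZOO: List[ZooEntry] = [
    ZooEntry("HorNet-Tiny 7x7", 22, 4.0, 82.8),
    ZooEntry("HorNet-Tiny GF", 23, 3.9, 83.0),
    ZooEntry("HorNet-Small 7x7", 50, 8.8, 83.8),
    ZooEntry("HorNet-Small GF", 50, 8.7, 84.0),
    ZooEntry("HorNet-Base 7x7", 87, 15.6, 84.2),
    ZooEntry("HorNet-Base GF", 88, 15.5, 84.3),
]


def best_under_budget(max_gflops: float,
                      zoo: List[ZooEntry] = MODEL_ZOO) -> Optional[ZooEntry]:
    """Return the most accurate entry whose FLOPs fit the budget, or None."""
    candidates = [entry for entry in zoo if entry.gflops <= max_gflops]
    return max(candidates, key=lambda entry: entry.top1, default=None)


if __name__ == "__main__":
    print(best_under_budget(10.0).name)  # HorNet-Small GF
```

In this table the GF variant is slightly cheaper and slightly more accurate than the 7x7 variant at every size, so it wins whenever both fit the budget.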

Larger models pre-trained on ImageNet-22K are also available, including HorNet-Large variants at different input resolutions and configurations.

ImageNet Classification

HorNet models are designed for general image classification. The repository supports both training and evaluation on ImageNet, and its guide to setting up the environment and dataset is intended to let users reproduce the reported results.

Requirements

To work with HorNet, the following software versions are recommended:

- PyTorch 1.8.0
- torchvision 0.9.0
- timm 0.4.12

The tensorboardX and six packages are also needed, and submitit is used for multi-node training setups.
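Before running anything, it can help to confirm that the installed packages match the recommended versions. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); `check_versions` and `RECOMMENDED` are hypothetical names, and the choice to ignore local build tags such as `+cu111` when comparing is our assumption about what counts as a match.

```python
from importlib import metadata

# Versions recommended in the requirements list.
RECOMMENDED = {"torch": "1.8.0", "torchvision": "0.9.0", "timm": "0.4.12"}


def check_versions(recommended=None):
    """Map each package to (installed_version_or_None, matches_recommended)."""
    recommended = RECOMMENDED if recommended is None else recommended
    report = {}
    for package, wanted in recommended.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        # Drop local build tags such as "+cu111" before comparing.
        public = installed.split("+", 1)[0]
        report[package] = (installed, public == wanted)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_versions().items():
        status = "ok" if ok else "differs or missing"
        print(f"{package}: {installed} ({status})")
```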

Data Preparation

Properly organizing the ImageNet dataset is crucial for successful training and evaluation. Users must structure the data with separate directories for training and validation images.
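A quick sanity check of the dataset layout can catch mistakes before a long job starts. The sketch below assumes the standard class-per-subfolder layout used by torchvision's `datasets.ImageFolder` (for example `train/n01440764/*.JPEG` and `val/n01440764/*.JPEG`); the function name `check_imagenet_layout` is hypothetical, and the check only verifies that `train/` and `val/` exist and contain the same class folders.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return sorted class names if train/ and val/ hold matching class folders."""
    root = Path(root)
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        # Each sub-directory is treated as one class.
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    if classes["train"] != classes["val"]:
        raise ValueError("train/ and val/ contain different class folders")
    return classes["train"]


if __name__ == "__main__":
    import sys
    names = check_imagenet_layout(sys.argv[1])
    print(f"{len(names)} classes found")
```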

Evaluation and Training

The repository provides step-by-step commands for evaluating pre-trained models and for training them from scratch with HorNet's configurations. Both are run as distributed multi-GPU jobs to speed up the work.
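Evaluation reports Top-1 accuracy, the metric used in the Model Zoo. The plain-Python sketch below only illustrates its definition (the fraction of images whose highest-scoring class equals the ground-truth label); `top1_accuracy` is a hypothetical helper, and a real evaluation would compute the same quantity on batched GPU tensors.

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score; ties resolve to the first such index.
        prediction = max(range(len(scores)), key=scores.__getitem__)
        correct += int(prediction == label)
    return correct / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.2, 0.3, 0.5]]
    print(top1_accuracy(scores, [1, 0, 0]))  # 0.6666666666666666
```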

Downstream Tasks

Beyond standard image classification, HorNet also performs well on dense prediction tasks such as object detection and semantic segmentation. When its framework is applied to point cloud data, it is reported to achieve state-of-the-art performance in 3D object classification.

Acknowledgements and Licensing

HorNet builds on several existing codebases and resources, which are acknowledged on the project's page. It is released under the MIT License, which permits broad reuse in academic and commercial settings.

Getting Started with HorNet

Researchers and developers who want to use these models can find usage instructions and training commands in the project's resources. Citation details are also provided so that those who use this work in their research can attribute it properly.

Its applicability across image classification, dense prediction, and 3D point cloud tasks makes HorNet a versatile choice of backbone for visual recognition.