hub - Efficient Integration of Models into PyTorch Hub

Introduction to PyTorch Hub

PyTorch Hub is a community-driven platform designed for sharing and discovering machine learning models with ease. It enables researchers and developers to contribute their models, which can then be accessed and utilized by others within the PyTorch ecosystem. With PyTorch Hub, users can benefit from the collaborative nature of the PyTorch community, leveraging a wide array of pre-built models to kickstart their own projects.

Submitting Models to PyTorch Hub

To get a model listed on PyTorch Hub, contributors must go through a structured submission process. Here's how it works:

hubconf.py Addition: Contributors need to add a file named hubconf.py to the root of their repository. This file serves as the configuration entry point for their model. Detailed instructions for writing hubconf.py are available on the PyTorch documentation site. The contributor should test this setup locally by calling torch.hub.load(...) to ensure it functions correctly.
Pull Request (PR) Creation: Once hubconf.py is set up, the next step is to create a Pull Request in the pytorch/hub repository on GitHub. For each model added, contributors must create a markdown file from the provided template, named according to the convention <repo_owner>_<repo_name>_<title>.md. This file describes the model and serves as its showcase page on the Hub.
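
As a rough illustration of the first step, a hubconf.py might look like the sketch below. The entrypoint name tiny_mlp and its parameters are hypothetical and chosen only for this example; the dependencies list and the one-function-per-entrypoint layout follow the convention described in the torch.hub documentation.

```python
# hubconf.py (sketch): lives at the root of the contributor's repository.

# Package names that must be importable before any entrypoint is loaded.
dependencies = ["torch"]


def tiny_mlp(in_features=4, hidden=8, out_features=2):
    """Hypothetical entrypoint: returns a small, randomly initialized MLP."""
    # Imported lazily so this sketch can be imported without torch installed;
    # real hubconf.py files usually import at module level.
    import torch.nn as nn

    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_features),
    )
```

To try it locally before opening a PR, something like torch.hub.load("path/to/repo", "tiny_mlp", source="local") should load the entrypoint from a checkout on disk, where path/to/repo is a placeholder for the directory containing hubconf.py.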

Important Considerations

Pretrained Weights: At present, PyTorch Hub does not host pretrained weights directly. Contributors need to ensure these are hosted properly elsewhere if applicable.
Markdown Structure: Generally, it’s recommended to list one model per markdown file. However, models that share similar architectures, such as resnet18 and resnet50, can be grouped within the same file.
Images and Tags: Contributors can include images related to their model by placing them in the images/ directory and linking them correctly within the markdown file. Additionally, only a predetermined set of tags is supported, but contributors can submit requests to expand this list via PRs.
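
Because weights are hosted outside the Hub, an entrypoint typically downloads them itself. The following is a minimal sketch, assuming a hypothetical entrypoint and a placeholder URL (WEIGHTS_URL is not a real checkpoint); it relies on torch.hub.load_state_dict_from_url, which downloads and caches a checkpoint from a given URL.

```python
# Placeholder location; the contributor decides where the weights are hosted.
WEIGHTS_URL = "https://example.com/checkpoints/tiny_mlp.pth"


def tiny_mlp(pretrained=False):
    """Hypothetical entrypoint that optionally loads externally hosted weights."""
    # Lazy import keeps the sketch importable without torch installed.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 2),
    )
    if pretrained:
        # Downloads the checkpoint on first use and reuses the cached copy afterwards.
        state_dict = torch.hub.load_state_dict_from_url(WEIGHTS_URL, progress=True)
        model.load_state_dict(state_dict)
    return model
```

Keeping pretrained as a keyword argument that defaults to False lets users build the architecture without triggering a download.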

Testing and Previewing Submissions

To ensure everything is set up correctly, PyTorch Hub provides testing scripts. These scripts concatenate all Python code sections within a markdown file and execute them against the latest PyTorch release. If any dependencies are missing on the Continuous Integration (CI) machine, they can be added in the install.sh script.
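
The concatenate-and-run idea can be sketched as follows. This is not the repository's actual test script, only an illustration of the approach under the assumption that code sections are fenced blocks tagged python; the helper name concat_python_blocks and the sample text are made up for the example.

```python
import re

# Three backticks, built indirectly so the sample below nests cleanly.
FENCE = "`" * 3

# Matches fenced blocks tagged as python and captures their bodies.
PY_BLOCK = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def concat_python_blocks(markdown_text: str) -> str:
    """Join the bodies of all python-tagged fenced blocks, in document order."""
    return "\n".join(PY_BLOCK.findall(markdown_text))


# A tiny stand-in for a model's markdown page.
sample = "\n".join([
    "Intro text.",
    FENCE + "python",
    "x = 2",
    FENCE,
    "More prose.",
    FENCE + "python",
    "y = x * 21",
    FENCE,
])

script = concat_python_blocks(sample)
namespace = {}
exec(script, namespace)  # later blocks can use names defined by earlier ones
```

Since the blocks run as one script, each code section in a model page can build on variables defined in the sections before it, but a failure in any section fails the whole page.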

Moreover, contributors can preview how their model’s page will appear using a Netlify bot. This bot builds the PR against the latest version of the hub repository and posts a comment with a preview link. Each update to the PR triggers a new build, so the preview reflects the most recent changes.

Conclusion

PyTorch Hub promotes collaboration and resource sharing within the PyTorch community by streamlining the process of publishing and accessing machine learning models. Its structured approach ensures a reliable, accessible, and uniform experience for developers around the globe, fostering innovation and rapid development in the field of machine learning.