gluon-cv: Advanced Models for Computer Vision Research and Applications

GluonCV Toolkit

GluonCV is a deep learning toolkit for computer vision aimed at engineers, researchers, and students. It provides implementations of state-of-the-art (SOTA) models, so users can quickly prototype applications and validate research ideas.

Key Features

Reproducibility: GluonCV offers training scripts that allow users to reproduce SOTA results as reported in numerous research papers.
Framework Support: It supports two popular deep learning frameworks, PyTorch and MXNet, making it versatile and accessible for many users.
Pre-trained Models: Users can access a vast collection of pre-trained models, thus saving time on training and facilitating faster iteration.
Simplified APIs: Well-designed APIs reduce the effort needed to implement and use models, which makes the toolkit approachable even for beginners.
Community Support: An active community backs GluonCV, contributing updates, support, and new features.

Supported Applications

GluonCV covers a wide range of computer vision applications:

Image Classification: Assign a category label to an entire image using over 50 models, such as ResNet and VGG.
Object Detection: Locate multiple objects in an image with bounding boxes and class labels using models such as Faster R-CNN and YOLOv3.
Semantic Segmentation: Label each pixel in an image with a category using models such as DeepLab-v3 and ICNet.
Instance Segmentation: Like semantic segmentation, but distinguishes individual objects of the same category, using models such as Mask R-CNN.
Pose Estimation: Detect human poses in images using the Simple Pose model.
Video Action Recognition: Recognize human actions in videos, supported by models like TSN and I3D in both MXNet and PyTorch.
Depth Prediction: Predict depth maps from images using the Monodepth2 model.
Generative Adversarial Networks (GANs): Generate realistic-looking images with GAN models such as StyleGAN.
Person Re-identification: Match pedestrian images across different scenes with models designed for re-identification tasks.
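Detection and instance segmentation results are usually judged by how much a predicted box overlaps the ground truth. As a framework-independent illustration (not part of GluonCV's API), the sketch below computes intersection-over-union (IoU) for two axis-aligned boxes. It assumes the common (xmin, ymin, xmax, ymax) corner convention in continuous pixel coordinates; the name box_iou is hypothetical.

```python
from typing import Tuple

# Hypothetical illustration only; boxes are (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes.

    Assumes continuous coordinates where width = xmax - xmin
    (no "+1 pixel" convention).
    """
    # Corners of the overlapping rectangle
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes offset by 5 pixels horizontally:
    # intersection = 5 * 10 = 50, union = 100 + 100 - 50 = 150
    print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

In evaluation, a prediction is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold, for example 0.5 in the PASCAL VOC criterion.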

Installation Guide

GluonCV supports both MXNet and PyTorch; install whichever framework the models you plan to use require. GluonCV and both frameworks can be installed with pip, and nightly (pre-release) builds are available alongside stable releases for developers who want the latest features and fixes.

For MXNet:

pip install gluoncv --upgrade
pip install -U --pre mxnet -f https://dist.mxnet.io/python/mkl  # CPU only (MKL build)
pip install -U --pre mxnet -f https://dist.mxnet.io/python/cu102mkl  # GPU build for CUDA 10.2

For PyTorch:

pip install gluoncv --upgrade
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html  # CPU only
pip install torch==1.6.0 torchvision==0.7.0  # GPU build for CUDA 10.2
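After installation, one way to see which of these packages the active Python environment can find is to probe for them without importing them. The sketch below uses only the standard library (importlib.util.find_spec and importlib.metadata.version, Python 3.8+). It assumes the import names gluoncv, mxnet, torch, and torchvision; the function name detect_backends is hypothetical.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Iterable, Optional

# Import names of the packages installed by the commands above
PACKAGES = ("gluoncv", "mxnet", "torch", "torchvision")


def detect_backends(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each module name to its installed version string.

    None means the module cannot be found. "unknown" means the module
    exists but no distribution with the same name is registered, e.g. a
    CUDA-specific wheel published under a different distribution name.
    """
    found: Dict[str, Optional[str]] = {}
    for name in names:
        # find_spec locates a top-level module without executing it
        if importlib.util.find_spec(name) is None:
            found[name] = None
            continue
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = "unknown"
    return found


if __name__ == "__main__":
    for pkg, ver in detect_backends().items():
        print(f"{pkg:12s} {ver if ver else 'not installed'}")
```

A package that is found may still fail when actually imported (for example, a CUDA build on a machine without a matching driver), so running the import itself remains the definitive check.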

Documentation and Learning Resources

Documentation and tutorials on the GluonCV website walk users through the supported computer vision tasks. For those new to deep learning, the project recommends the open-source book Dive into Deep Learning, along with further material covering both introductory and advanced topics.

Community and Contributions

GluonCV has an active community of contributors who update and improve the toolkit. Users are encouraged to contribute code or share feedback.

The project lowers the barrier to adopting advanced computer vision techniques and encourages experimentation and learning, making it a useful tool for deep learning practitioners.