FCOS: Fully Convolutional One-Stage Object Detection
Overview
FCOS (Fully Convolutional One-Stage Object Detection) is an innovative approach to object detection developed by Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. Presented initially at the International Conference on Computer Vision (ICCV) in 2019, this project represents a significant leap in simplifying and enhancing object detection technology. FCOS is noteworthy for being a single-stage detector that is completely free of anchor boxes, which are often a source of complexity and inefficiency in traditional object detection methods.
Key Highlights
- Anchor-Free Design: FCOS does away with anchor boxes and associated hyper-parameters. This simplification addresses the inefficiencies and complications experienced with previous models that relied heavily on anchors for detecting bounding boxes in images.
- Superior Performance: Benchmarked against Faster R-CNN, FCOS offers superior performance. For instance, with a ResNet-50 backbone, FCOS achieves an Average Precision (AP) of 38.7, outperforming Faster R-CNN's 36.8.
- Efficiency in Training and Testing: FCOS cuts down on both training and inference times. Using the same hardware with a ResNet-50-FPN backbone, FCOS requires significantly less training time (6.5 hours versus 8.8 hours for Faster R-CNN) and decreases inference time per image by 12 milliseconds (44ms versus 56ms).
- State-of-the-Art Results: The top-performing FCOS model, powered by ResNeXt-64x4d-101 and deformable convolutions, achieves an impressive AP of 49.0% on the COCO test-dev dataset.
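The anchor-free design above can be sketched concretely: instead of matching anchors, each feature-map location (x, y) that falls inside a ground-truth box directly regresses the four distances (l, t, r, b) to the box's sides, and a "center-ness" score down-weights low-quality predictions far from the box center. The following plain-Python sketch illustrates those two targets; function names are illustrative, not the authors' code, though the center-ness formula follows the FCOS paper.

```python
import math

def regression_targets(x, y, box):
    """FCOS-style targets for one location: distances from (x, y) to the
    left, top, right, and bottom sides of a ground-truth box."""
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    if min(l, t, r, b) < 0:
        return None  # (x, y) lies outside the box: not a positive sample
    return l, t, r, b

def centerness(l, t, r, b):
    """Center-ness as defined in the FCOS paper:
    sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))).
    It equals 1 at the box center and decays toward the borders."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

# A location at the exact center of a 100x60 box:
targets = regression_targets(50, 30, (0, 0, 100, 60))
print(targets)               # (50, 30, 50, 30)
print(centerness(*targets))  # 1.0
```

At inference, the predicted center-ness is multiplied into the classification score, so off-center boxes are suppressed before NMS.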
Recent Updates
The project continually evolves with enhancements such as the integration of Fast And Diverse (FAD) neural architecture search, faster inference with new Non-Maximum Suppression (NMS) methods, and better performance models that achieve up to 49.0% AP using multi-scale testing.
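For reference alongside the NMS improvements mentioned above, this is the classic greedy NMS that such methods speed up or replace: keep the highest-scoring box, discard boxes that overlap it beyond an IoU threshold, and repeat. This plain-Python version is illustrative only, not the project's optimized implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x0, y0, x1, y1) form."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the best remaining box and drop
    any box whose IoU with it exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first too much
```

Faster variants typically vectorize this loop on the GPU or relax the hard suppression (e.g. decaying scores instead of discarding boxes), which is where the inference-time gains come from.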
System Requirements
Running FCOS efficiently requires high computational power. For optimal performance, the use of 8 Nvidia V100 GPUs is recommended, although training is feasible with four 1080Ti GPUs due to FCOS's memory-efficient nature.
Installation & Quick Demo
Installation can be performed via pip, and a testing-only installation option is available for those who simply want to use the FCOS object detector without setting up a development environment. A quick demo with pre-trained models lets users see FCOS detecting objects in images after a straightforward setup.
Available Models
FCOS offers a variety of models with differing configurations to cater to different needs. These include models based on ResNe(x)ts and MobileNets, each tuned to balance speed and accuracy. The choice of model depends on the scale of training and the precision required.
Training and Inference
Training with FCOS is streamlined with commands for synchronous distributed training on multiple GPUs. The training setup is flexible, allowing adjustments in GPU number, backbone configurations, and data handling to suit a user's specific dataset.
For inference, FCOS supports multi-GPU image analysis and includes optimizations that improve processing speed.
Contributions & Licensing
The project welcomes contributions and enhancements from the community to further improve its capabilities. For academic purposes, FCOS is available under the 2-clause BSD License. Commercial inquiries can be directed to the project authors for appropriate licensing.
Conclusion
FCOS represents a major shift towards a simpler, faster, and more efficient methodology for object detection. By harnessing the power of a fully convolutional design, it sets new standards in accuracy and performance in the field of computer vision.