MIMDet

Optimizing Object Detection through Vision Transformers and Masked Image Modeling

Product Description

MIMDet applies a vanilla ViT pre-trained with Masked Image Modeling to object detection and instance segmentation. A compact convolutional stem supplies multi-scale representations, and together the two form a hybrid ViT-ConvNet backbone. On COCO, the project reaches 51.7 box AP and 46.2 mask AP. Using different sample ratios for training and inference keeps training efficient while preserving inference accuracy.
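The role of a sample ratio can be illustrated with a minimal sketch. It assumes the ratio denotes the fraction of image patch tokens passed to the ViT encoder, with a lower ratio during training and the full set at inference. The function name `sample_patch_indices`, the 14 x 14 patch grid, and the 0.5 training ratio are hypothetical choices for illustration, not the project's actual code or settings.

```python
import random


def sample_patch_indices(num_patches, sample_ratio, seed=None):
    """Return a sorted random subset of patch indices to keep.

    Hypothetical helper: sample_ratio is assumed to be the fraction of
    patch tokens that the encoder processes.
    """
    if not 0.0 < sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least one patch so the encoder always has some input.
    num_keep = max(1, int(num_patches * sample_ratio))
    return sorted(rng.sample(range(num_patches), num_keep))


# Assumed 14 x 14 patch grid, i.e. 196 patch tokens per image.
train_idx = sample_patch_indices(196, 0.5, seed=0)  # partial input (training)
infer_idx = sample_patch_indices(196, 1.0)          # full input (inference)
print(len(train_idx), len(infer_idx))  # prints "98 196"
```

Under this assumption, halving the number of tokens shortens the sequence the encoder must process during training, while inference still sees every patch.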
Project Details