
# MIMDet

MIMDet uses a vanilla ViT pre-trained with Masked Image Modeling (MIM) to improve object detection and instance segmentation. A compact convolutional stem is added to produce multi-scale representations, forming a hybrid ConvNet-ViT backbone. On COCO it reaches 51.7 box AP and 46.2 mask AP. Different sample ratios can be used for training and inference, which keeps training efficient while preserving inference accuracy.
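
The idea of using one sample ratio for training and another for inference can be illustrated with a minimal sketch. It assumes that a sample ratio means the fraction of patch tokens passed to the ViT encoder and that tokens are chosen uniformly at random; neither detail is stated here. The function name `sample_patch_indices`, the 64x64 patch grid, and the ratios 0.5 and 1.0 are hypothetical and are not taken from the MIMDet code.

```python
import random


def sample_patch_indices(grid_h, grid_w, sample_ratio, seed=None):
    """Pick which patch tokens to keep (hypothetical helper, not MIMDet's API).

    Returns a sorted list of flat indices into a grid_h x grid_w patch grid.
    Assumes uniform random sampling without replacement.
    """
    if not 0.0 < sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in (0, 1]")
    num_patches = grid_h * grid_w
    # Keep at least one token so the encoder never receives an empty input.
    num_keep = max(1, round(num_patches * sample_ratio))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), num_keep))


# Assumed setup: a 1024x1024 image with 16x16 patches gives a 64x64 grid.
train_idx = sample_patch_indices(64, 64, sample_ratio=0.5, seed=0)
# At inference, a ratio of 1.0 keeps every patch token.
infer_idx = sample_patch_indices(64, 64, sample_ratio=1.0)

print(len(train_idx), len(infer_idx))
```

With these assumed values the encoder would process half of the tokens during training and all of them at inference, which is one way a lower training ratio could reduce training cost.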
