# Vision Transformers
## Awesome-MIM
This project provides an extensive, chronologically organized survey of Masked Image Modeling (MIM) and related techniques in self-supervised representation learning. It covers core topics such as MIM for Transformers and contrastive learning, and traces how self-supervised learning has spread across modalities and shaped fields like NLP and computer vision since 2018. Community contributions and revisions are welcome, and the repository includes curated paper lists and ready-to-use citation formats, making it a valuable resource for researchers and enthusiasts following developments and applications in MIM.
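For readers new to the core idea, the sketch below illustrates masked image modeling in PyTorch: random patches are masked out and the model is trained to reconstruct their pixels. The patch size, mask ratio, and the tiny encoder/decoder are illustrative assumptions, not the setup of any specific paper in the list (real MIM encoders such as MAE-style ViTs process only the visible patches; here masked patches are simply zeroed for brevity).

```python
import torch
import torch.nn as nn

PATCH = 16          # illustrative patch size
MASK_RATIO = 0.75   # illustrative mask ratio

def patchify(imgs: torch.Tensor, p: int = PATCH) -> torch.Tensor:
    """(B, C, H, W) -> (B, N, p*p*C) non-overlapping patches."""
    b, c, h, w = imgs.shape
    x = imgs.unfold(2, p, p).unfold(3, p, p)            # (B, C, H/p, W/p, p, p)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)

class TinyMIM(nn.Module):
    """Toy encoder/decoder standing in for a ViT backbone."""
    def __init__(self, dim: int = PATCH * PATCH * 3, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, imgs: torch.Tensor) -> torch.Tensor:
        patches = patchify(imgs)                         # (B, N, D)
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=imgs.device) < MASK_RATIO
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = self.decoder(self.encoder(visible))
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2)[mask].mean()

imgs = torch.randn(2, 3, 64, 64)
loss = TinyMIM()(imgs)
loss.backward()
```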
## RepViT
RepViT-SAM tackles the computational cost of promptable segmentation on mobile hardware by replacing SAM's heavyweight image encoder with a lightweight RepViT model, substantially improving segmentation speed and efficiency on devices such as iPhones. With this swap, RepViT-SAM retains strong zero-shot transfer performance while delivering up to ten times faster inference. By bringing architectural lessons from ViTs into efficient CNN designs, the RepViT family sets a new standard for lightweight models, exceeding 80% top-1 accuracy on ImageNet while maintaining low latency.
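The encoder-swap idea can be illustrated with a hedged sketch. All class names below (`RepViTStyleEncoder`, `SamStylePipeline`) are hypothetical placeholders for exposition, not the actual RepViT-SAM API; the point is only that the SAM-style pipeline stays fixed while the image encoder becomes a lightweight conv-based backbone.

```python
import torch
import torch.nn as nn

class RepViTStyleEncoder(nn.Module):
    """Hypothetical stand-in for a lightweight, conv-based RepViT image encoder."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stem(x)  # (B, embed_dim, H/4, W/4) image embedding

class SamStylePipeline(nn.Module):
    """Toy SAM-like wrapper: swappable image encoder + stubbed mask head."""
    def __init__(self, image_encoder: nn.Module, embed_dim: int = 256):
        super().__init__()
        self.image_encoder = image_encoder   # the component RepViT-SAM replaces
        self.mask_head = nn.Conv2d(embed_dim, 1, kernel_size=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.mask_head(self.image_encoder(image))

# Same pipeline, lighter backbone: this swap is the source of the speedup.
model = SamStylePipeline(RepViTStyleEncoder())
masks = model(torch.randn(1, 3, 1024, 1024))
```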
Feedback Email: [email protected]