XPretrain
The repository showcases recent advances in multi-modality learning from Microsoft's MSM group, with an emphasis on pre-training. It covers video-language pre-training datasets and models, including HD-VILA, LF-VILA, and CLIP-ViP, as well as image-language models such as Pixel-BERT and VisualParsing. CLIP-ViP was accepted at ICLR 2023, and LF-VILA at NeurIPS 2022. Community contributions are welcome under Microsoft's Open Source Code of Conduct.