#cross-modality

Explore a unified sequence-to-sequence model that handles cross-modality tasks such as image captioning and visual question answering (VQA), as well as text classification, using state-of-the-art pretraining and finetuning. The model offers multi-language support, interactive online demos, and resources for transformers integration, and it performs strongly on the COCO and VQA challenges.
