wit
The WIT (Wikipedia-based Image Text) dataset is a large collection of 37.6 million image-text examples drawn from Wikipedia in 108 languages, intended for pretraining multimodal machine learning models. Its strengths include broad multilingual coverage, rich per-example metadata, and a challenging real-world test set for evaluation. By using images as a language-agnostic medium that links text written in different languages, WIT supports research on models that relate and understand text across languages. The dataset is widely used in multilingual and multimodal research and is publicly available for download.
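WIT is downloaded as data files that carry metadata for each example. A common first step after downloading is to check language coverage and pull out the image-caption pairs for one language. The following is a minimal sketch only. It assumes each shard is a tab-separated file, optionally gzip-compressed, with a header row containing columns named `language`, `image_url` and `caption_reference_description`. Verify these file-format details and column names against the official release before relying on them. The helper functions are hypothetical and are not part of any WIT tooling.

```python
import csv
import gzip
from collections import Counter
from typing import IO, Iterable, Iterator, List, Tuple

# Assumption: a WIT shard is a TSV file with a header row that includes the
# hypothetical column names "language", "image_url" and
# "caption_reference_description". Check the official schema first.


def open_shard(path: str) -> IO[str]:
    """Open a plain or gzip-compressed TSV shard in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8", newline="")
    return open(path, encoding="utf-8", newline="")


def iter_examples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per TSV row, keyed by the header's column names."""
    yield from csv.DictReader(lines, delimiter="\t")


def count_by_language(lines: Iterable[str]) -> Counter:
    """Count image-text examples per language code."""
    return Counter(row["language"] for row in iter_examples(lines))


def filter_language(lines: Iterable[str], lang: str) -> List[Tuple[str, str]]:
    """Return (image_url, caption) pairs for one language,
    skipping rows whose reference caption is empty."""
    return [
        (row["image_url"], row["caption_reference_description"])
        for row in iter_examples(lines)
        if row["language"] == lang and row["caption_reference_description"]
    ]


# Tiny in-memory sample standing in for a real shard (invented data).
SAMPLE_LINES = (
    "language\timage_url\tcaption_reference_description\n"
    "en\thttps://example.org/a.jpg\tA red bridge\n"
    "fr\thttps://example.org/b.jpg\tUn pont rouge\n"
    "en\thttps://example.org/c.jpg\t\n"
).splitlines(keepends=True)

if __name__ == "__main__":
    print(count_by_language(SAMPLE_LINES))
    print(filter_language(SAMPLE_LINES, "en"))
```

For a real shard, pass the open file handle in place of the sample lines, for example `with open_shard(path) as f: counts = count_by_language(f)`. Streaming rows through `csv.DictReader` avoids loading a whole shard into memory. This matters because the full dataset contains tens of millions of rows.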