Youku-mPLUG
Youku-mPLUG is a dataset of 10 million Chinese video-text pairs. At the time of its release it was the largest publicly available dataset for Chinese video-language pre-training. Its content is filtered for quality, diversity, and safety, and is organized into 20 super categories and 45 categories. Alongside the pre-training data, it provides benchmarks for evaluating pre-trained models on three downstream multimodal tasks: video category prediction, video-text retrieval, and video captioning. The dataset is available for research use, together with setup instructions and benchmark results for pre-trained models, to support further model development and fine-tuning.
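Video-text retrieval benchmarks like this one are commonly scored with Recall@K, which is the fraction of queries whose ground-truth match ranks among the top K candidates. The sketch below is a generic illustration and not the official Youku-mPLUG evaluation code. It assumes that some model has already produced a similarity matrix in which the correct match for query `i` sits at column `i`. The function name `recall_at_k` and the example scores are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Generic Recall@K sketch (hypothetical helper, not the official metric code).

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the ground-truth candidate for query i is candidate i.
    """
    if not sim:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank of the ground truth = number of candidates scored strictly higher.
        # Ties are resolved in the query's favor.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap the query and candidate roles, e.g. text-to-video becomes video-to-text."""
    return [list(col) for col in zip(*sim)]


# Hypothetical scores: rows are captions (queries), columns are videos (candidates).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
print(recall_at_k(sim, 1))             # text-to-video, 2 of 3 captions hit at rank 1
print(recall_at_k(transpose(sim), 1))  # video-to-text, 1 of 3 videos hit at rank 1
```

Transposing the matrix switches between text-to-video and video-to-text retrieval. This sketch counts ties in the query's favor; a real evaluation script may break ties differently.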