WanJuan1.0
Intern · WanJuan 1.0 is a comprehensive, open-source multimodal corpus made up of text, image-text, and video datasets, with a total volume exceeding 2TB. It was created by Shanghai AI Lab using rigorous, fine-grained data processing aimed at high quality, easy integration, and alignment with Chinese values. The corpus covers a range of domains, including science and law, which is intended to strengthen the logical reasoning and generalization capabilities of AI models trained on it. Designed for usability and efficiency, the dataset supports training Multimodal Large Language Models for tasks such as semantic interpretation and visual analysis.