MNBVC
MNBVC is a large, continuously growing corpus of Chinese text, currently over 38,344 GB, that spans both mainstream and niche cultures. With a target size of 40 TB, it covers formats such as news, literature, and online discussions. The project applies data-cleaning tools to improve usability, making the corpus a resource for AI and NLP research. Community contributions are welcome to help expand and process the data. The structured data can be downloaded from the repository for linguistic and cultural studies.
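To put the two sizes quoted above in relation, the following is a minimal sketch that computes how much of the 40 TB target the current 38,344 GB represents. The description does not say whether it uses decimal (1 TB = 1000 GB) or binary (1 TB = 1024 GB) units, so both conventions are shown as assumptions; the `progress` helper is hypothetical and not part of any MNBVC tooling.

```python
# Sizes quoted in the corpus description; the unit convention is an assumption.
CURRENT_GB = 38_344
TARGET_TB = 40


def progress(current_gb: float, target_tb: float, gb_per_tb: int) -> float:
    """Return the fraction of the target size already collected."""
    return current_gb / (target_tb * gb_per_tb)


# Decimal convention: 1 TB = 1000 GB.
decimal = progress(CURRENT_GB, TARGET_TB, 1000)
# Binary convention: 1 TB = 1024 GB.
binary = progress(CURRENT_GB, TARGET_TB, 1024)

print(f"decimal (1 TB = 1000 GB): {decimal:.1%}")
print(f"binary  (1 TB = 1024 GB): {binary:.1%}")
```

Under either convention the corpus is already above 90% of its stated target, so the remaining growth is small compared with what has been collected.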