chinese_speech_pretrain
This project pretrains wav2vec 2.0 and HuBERT speech models with Fairseq on a large corpus of Chinese audio drawn from sources such as YouTube and podcasts. Each model is released in BASE and LARGE versions and is available on Hugging Face. The models target speech recognition and are evaluated on datasets such as Aishell and WenetSpeech, where they are reported to perform better across varied noise conditions and recording settings.
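Evaluation of Mandarin speech recognition on sets such as Aishell and WenetSpeech is commonly reported as character error rate (CER). The summary above does not name the metric or the scoring script the project uses, so the following is only a minimal sketch under that assumption. The function names `edit_distance` and `cer` are hypothetical and are not part of this project or of Fairseq.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level character error rate (assumed metric, see lead-in).

    Spaces are removed so that each Chinese character counts as one token.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_errors += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


if __name__ == "__main__":
    # Illustrative strings only, not project results.
    refs = ["今天天气很好", "你好世界"]
    hyps = ["今天天汽很好", "你好界"]
    print(f"CER: {cer(refs, hyps):.3f}")
```

Summing errors and reference lengths over the whole corpus, instead of averaging per-utterance rates, keeps short utterances from dominating the score. A published evaluation would normally also apply text normalization (punctuation, digits, casing) before scoring, which this sketch omits.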