SenseVoice
SenseVoice is a speech foundation model for automatic speech recognition (ASR), speech emotion recognition (SER), and audio event detection (AED) across more than 50 languages. It achieves high accuracy in multilingual recognition and outperforms many leading models. Its non-autoregressive architecture processes audio up to 15 times faster than comparable models. Flexible fine-tuning and multiple deployment options let the model fit a range of business and technical requirements. Recent additions include export to ONNX and libtorch, which simplifies integration.
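
Because one pass yields a transcript together with language, emotion, and audio event labels, downstream code usually needs to separate those pieces. The sketch below is illustrative only: it assumes the model output arrives as a single string with leading tags of the form <|lang|><|EMOTION|><|Event|><|flag|> followed by the text. That layout, the sample string, and the names RichTranscript and parse_tagged_output are assumptions made for this example and are not taken from the description above.

```python
import re
from dataclasses import dataclass

# Assumed tag layout (not confirmed by the description above):
#   <|lang|><|EMOTION|><|Event|><|flag|>transcribed text
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


@dataclass
class RichTranscript:
    """Hypothetical container for the three label types plus the text."""
    language: str
    emotion: str
    event: str
    text: str


def parse_tagged_output(raw: str) -> RichTranscript:
    """Split a tagged output string into labels and plain text."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    # Pad with "unknown" so strings with fewer than three tags still parse;
    # any tags beyond the first three are ignored.
    language, emotion, event = (tags + ["unknown"] * 3)[:3]
    return RichTranscript(language, emotion, event, text)


# Hypothetical sample string, written by hand for illustration.
sample = "<|en|><|HAPPY|><|Speech|><|woitn|>thanks for calling"
result = parse_tagged_output(sample)
print(result)
```

Keeping the labels in a small dataclass rather than a raw string makes it easier to route results, for example filtering by event type before storing the transcript.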