WeNet: An Overview
WeNet is a production-ready toolkit for end-to-end speech recognition, notable for its ease of use, accuracy, and lightweight design. Its core aim is to provide a full stack of speech recognition capabilities suited to real production deployment.
Key Features
- Production-First Approach: WeNet is crafted with production environments in mind, ensuring that it is not only efficient for personal use but also highly suitable for commercial and industrial applications.
- High Accuracy: The toolkit achieves state-of-the-art results on numerous public speech datasets, proving its reliability and effectiveness.
- User-Friendly: With a design focused on simplicity, WeNet is easy to set up and comes with extensive documentation for users to get started quickly.
Installation Guide
Python Package Installation
To begin using WeNet, install the Python package:
pip install git+https://github.com/wenet-e2e/wenet.git
For command-line usage, a simple command would be:
wenet --language chinese audio.wav
For those who prefer Python scripting, the integration is straightforward:
import wenet
model = wenet.load_model('chinese')
result = model.transcribe('audio.wav')
print(result['text'])
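To transcribe several files, the same API can be reused in a loop. The sketch below is illustrative rather than part of the official documentation: the file names are placeholders, and it assumes transcribe() returns a dictionary with a 'text' key, as in the example above.
import wenet

# Load the pretrained Chinese model once and reuse it for every file.
model = wenet.load_model('chinese')
for wav in ['meeting.wav', 'interview.wav']:  # placeholder file names
    result = model.transcribe(wav)
    print(wav, result['text'])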
Detailed instructions on Python usage can be found in the WeNet documentation.
Installing for Training and Deployment
To set up WeNet for training and deployment, follow these steps:
- Clone the Repository:
git clone https://github.com/wenet-e2e/wenet.git
- Set Up Conda Environment:
- Install Conda by following the official guide.
- Create and activate a Conda environment:
conda create -n wenet python=3.10
conda activate wenet
conda install conda-forge::sox
- Install CUDA and PyTorch:
- Install CUDA 12.1 by following NVIDIA's official installation instructions.
- Install specific versions of PyTorch and torchaudio:
pip install torch==2.2.2+cu121 torchaudio==2.2.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html
For those using Ascend NPU, detailed instructions are available, including the installation of CANN and torch-npu dependencies.
- Install Additional Packages:
pip install -r requirements.txt
pre-commit install
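After completing the steps above, a quick sanity check can confirm that the environment exposes the expected Python, sox, and PyTorch versions. This is only a suggested verification sketch, not part of the official setup; adjust the expected versions to match your installation.
import shutil
import sys

import torch
import torchaudio

print("Python:", sys.version.split()[0])             # expect 3.10.x from the Conda environment
print("sox on PATH:", shutil.which("sox"))           # installed via conda-forge::sox
print("torch:", torch.__version__)                   # expect 2.2.2+cu121
print("torchaudio:", torchaudio.__version__)         # expect 2.2.2+cu121
print("CUDA available:", torch.cuda.is_available())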
Deployment
To use the x86 runtime or a language model (LM), build the runtime as follows:
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .
Comprehensive documentation for building the runtime on various platforms is available in the WeNet documentation.
Community and Support
Engage with other WeNet users and developers through GitHub Issues. Chinese users can join the WeChat group for discussions and quicker support by scanning the provided QR codes on the WeNet page.
Acknowledgements
WeNet builds on the work of several other projects: it incorporates components from ESPnet for transformer-based modeling, borrows from Kaldi for WFST-based decoding, references EESEN for TLG-based graph building, and uses OpenTransformer for batch inference.
Citations
WeNet has been described in the following publications:
- Yao, Zhuoyuan et al. "WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit." Proc. Interspeech, 2021.
- Zhang, Binbin et al. "WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit." arXiv preprint arXiv:2203.15455, 2022.
These papers provide further background on the research behind WeNet.