# Introduction to espnet_onnx

## Overview
The espnet_onnx project provides a utility library that simplifies exporting, quantizing, and optimizing ESPnet models in the ONNX format. Once the necessary files have been exported, inference no longer requires PyTorch or ESPnet to be installed on the target machine.
## Demos Available in Google Colab

For those looking to try out espnet_onnx, demonstration notebooks are available on Google Colab.
## Installation

- **Pip installation**: install espnet_onnx with pip:

  ```shell
  pip install espnet_onnx
  ```

- **Exporting pretrained models**: exporting models requires the additional packages `torch` (>= 1.11.0), `espnet`, `espnet_model_zoo`, and `onnx`. Note that `onnx==1.12.0` may cause errors, and downgrading may be necessary.
## Developer Installation Guide

For developers looking to contribute to this project, set it up as follows:

- Clone the espnet_onnx repository:

  ```shell
  git clone [email protected]:espnet/espnet_onnx.git
  ```

- Set up a virtual environment:

  ```shell
  cd tools
  make venv export
  ```

- Activate the virtual environment and install torch if needed:

  ```shell
  . tools/venv/bin/activate
  pip install torch
  ```

- Clone and install the s3prl repository:

  ```shell
  git clone https://github.com/s3prl/s3prl
  cd s3prl
  pip install .
  ```

- If developing transducer models or optimizations, additionally install `warp-transducer` and `onnxruntime`.
## Usage

### Export Models

espnet_onnx can export pretrained models from espnet_model_zoo:

- Export directly from espnet_model_zoo.
- Export models from zipped files containing `meta.yaml`.
- Customize export configurations, such as the maximum sequence length.
### Inference

- Load exported ONNX models using `tag_name` or `model_dir`.
- Perform streaming ASR with `StreamingSpeech2Text`.
- Apply various optimization techniques to improve model performance.
### Text2Speech Inference

- Export TTS models in the same way as ASR models.
- Generate waveform output using the provided `Text2Speech` class.
### GPU Utilization

- Speed up inference on GPU with the `onnxruntime-gpu` library by configuring providers such as `CUDAExecutionProvider`.
## Differences from ESPnet

Several modifications differentiate espnet_onnx from the original ESPnet implementation, addressing cache issues and enabling ONNX conversion.
## Supported Architectures

espnet_onnx supports a range of architectures for both ASR and TTS, detailed in their respective documentation.
## Developer and Reference Documentation

For developers seeking a deeper understanding or wanting to contribute, espnet_onnx provides comprehensive guides along with references to the ESPnet toolkit and the ESPnet Model Zoo.
## License

espnet_onnx is released under the MIT License, permitting open collaboration and distribution.
## Contact

The project is led by Masao Someki, who can be reached at [email protected].
This introduction aims to offer a clear understanding of what espnet_onnx provides for users and developers alike. Interested individuals are encouraged to explore the project's capabilities through the provided demos and documentation.