# Introduction to espnet_onnx

## Overview
The espnet_onnx project provides a utility library that simplifies exporting, quantizing, and optimizing ESPnet models in the ONNX format. Once the necessary files have been exported, inference no longer requires PyTorch or ESPnet to be installed on the target machine.
## Demos Available in Google Colab

For those looking to try out espnet_onnx, demonstration notebooks are available on Google Colab.
## Installation

- **Pip installation**: install espnet_onnx with pip:

  ```shell
  pip install espnet_onnx
  ```

- **Exporting pretrained models**: exporting models requires the additional packages `torch` (>= 1.11.0), `espnet`, `espnet_model_zoo`, and `onnx`. Note that `onnx==1.12.0` may cause errors, and downgrading may be necessary.
## Developer Installation Guide

For developers looking to contribute to this project, set it up as follows:

- Clone the espnet_onnx repository:

  ```shell
  git clone [email protected]:espnet/espnet_onnx.git
  ```

- Set up a virtual environment:

  ```shell
  cd tools
  make venv export
  ```

- Activate the virtual environment and install torch if needed:

  ```shell
  . tools/venv/bin/activate
  pip install torch
  ```

- Clone and install the s3prl repository:

  ```shell
  git clone https://github.com/s3prl/s3prl
  cd s3prl
  pip install .
  ```

- If developing transducer models or optimizations, additionally install `warp-transducer` and `onnxruntime`.
## Usage

### Export Models

espnet_onnx can export pretrained models from espnet_model_zoo:

- Export directly from espnet_model_zoo.
- Export models from zipped files containing `meta.yaml`.
- Customize export configurations, such as the maximum sequence length.
### Inference

- Load exported ONNX models using `tag_name` or `model_dir`.
- Perform streaming ASR with `StreamingSpeech2Text`.
- Apply various optimization techniques to improve model performance.
### Text2Speech Inference

- Export TTS models in the same way as ASR models.
- Generate waveform output using the provided `Text2Speech` class.
### GPU Utilization

- Speed up inference on GPU with the `onnxruntime-gpu` library by configuring providers such as `CUDAExecutionProvider`.
## Differences from ESPnet

Several modifications differentiate espnet_onnx from the original ESPnet implementation, addressing cache issues and enabling ONNX conversion.
## Supported Architectures

espnet_onnx supports a range of architectures for both ASR and TTS, detailed in their respective documentation.
## Developer and Reference Documentation

For developers seeking a deeper understanding or wanting to contribute, espnet_onnx provides comprehensive guides along with references to the ESPnet toolkit and the ESPnet Model Zoo.
## License

espnet_onnx is released under the MIT License, permitting open collaboration and distribution.
## Contact

The project is led by Masao Someki, who can be reached at [email protected].
This introduction aims to offer a clear understanding of what espnet_onnx provides for users and developers alike. Interested individuals are encouraged to explore the project's capabilities through the provided demos and documentation.