Introduction to the april-asr Project
April-ASR is a library for offline streaming speech-to-text applications. Its goal is to let developers integrate speech recognition into their applications even when no internet connection is available, through a minimal and approachable API.
Current Status
April-ASR is still under active development: users may encounter unimplemented features, bugs, crashes, and breaking API changes. The project is not yet considered production-ready. Only one language model is currently available, for English, and its accuracy is limited.
Language Support
The library offers a C API and has bindings for C# and Python. These bindings are still early-stage and may not be fully stable.
Usage Example
April-ASR ships with runnable examples. For instance, the sample example.cpp shows how to perform speech recognition on a wave (.wav) file, or to stream recognition by reading from standard input (stdin). After building the library, it can be run with a command like:
./main /path/to/file.wav /path/to/model.april
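Before any recognition can happen, a program like the example has to decode the .wav file into raw 16-bit PCM samples. The Python sketch below illustrates that preprocessing step using only the standard library; the `wav_to_pcm16` helper is illustrative and is not part of the April-ASR API.

```python
import io
import struct
import wave

def wav_to_pcm16(data: bytes) -> list[int]:
    """Parse a WAV file and return its samples as signed 16-bit ints.

    Illustrative helper, not part of April-ASR: a real example program
    performs the equivalent decoding before feeding the recognizer.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        assert wav.getsampwidth() == 2, "expected 16-bit samples"
        assert wav.getnchannels() == 1, "expected mono audio"
        frames = wav.readframes(wav.getnframes())
    # Unpack little-endian s16 frames into Python ints.
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))

# Build a tiny 16 kHz mono WAV in memory (10 ms of silence) to demo parsing.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 160)  # 160 samples = 10 ms at 16 kHz

samples = wav_to_pcm16(buf.getvalue())
print(len(samples))  # 160
```

In the real example the decoded samples would then be fed to the recognizer in chunks rather than collected into a list.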
For streaming recognition, audio can be piped into April-ASR using PulseAudio's parec command-line tool:
parec --format=s16 --rate=16000 --channels=1 --latency-ms=100 | ./main - /path/to/model.april
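The parec flags pin the stream to the format the model expects: signed 16-bit samples (s16), 16 kHz, mono. A consumer reading that stream from stdin would process raw bytes in fixed-size chunks; the sketch below shows the arithmetic, with the 100 ms chunk size being an assumption based on the --latency-ms=100 flag rather than anything April-ASR mandates.

```python
import io

SAMPLE_RATE = 16000   # --rate=16000
BYTES_PER_SAMPLE = 2  # --format=s16
CHANNELS = 1          # --channels=1

# 100 ms of audio (matching --latency-ms=100) is 1600 samples = 3200 bytes.
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS // 10

def count_chunks(stream) -> int:
    """Read a raw s16le stream in 100 ms chunks and count them
    (a stand-in for handing each chunk to the recognizer)."""
    chunks = 0
    while True:
        chunk = stream.read(CHUNK_BYTES)
        if not chunk:
            break
        chunks += 1
    return chunks

# Simulate one second of silence arriving on stdin.
fake_stdin = io.BytesIO(b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE))
print(count_chunks(fake_stdin))  # 10
```

At these settings the pipe carries 32000 bytes per second, so buffering requirements stay modest even on small devices.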
Available Models
Currently, the English model is the only one available. It is based on an icefall model initially trained by csukuangfj and refined with additional data. For those who wish to create custom models, guidelines are provided in the extra/exporting-howto.md documentation.
Building on Linux
Building the April-ASR library on Linux requires ONNXRuntime v1.13.1. Users can either download the pre-built release binaries or build it from source. Installation involves running the provided scripts and, if necessary, setting environment variables; detailed instructions are in the project's documentation.
Building on Windows
For Windows users, the setup involves creating a 'lib' folder and downloading the ONNXRuntime binaries into it. After configuring the project with CMake and Visual Studio, they should ensure that the necessary DLLs, such as onnxruntime.dll, are available at runtime to avoid startup errors.
Applications
April-ASR is a versatile foundation for creating speech-based applications. One of the notable applications in progress is "Live Captions" – a Linux desktop app that delivers real-time captions for accessibility purposes.
Acknowledgements
April-ASR benefits tremendously from contributions within the open-source community. Specifically, thanks are extended to the developers behind k2-fsa/icefall for their recipes and model contributions. Furthermore, the project incorporates additional libraries such as pocketfft, the Sonic library, and tinycthread. The bindings are inspired by the Vosk API, another speech recognition project.
April-ASR is a promising tool for developers who want to add speech recognition to offline systems. It continues to evolve toward speech applications that work reliably without a network connection.