openWakeWord - Wakeword Library for Integrating Voice Features in Applications

openWakeWord Project Introduction

openWakeWord is an open-source wakeword library designed to enhance voice-enabled applications and user interfaces. The library offers pre-trained models that recognize common words and phrases effectively, even in real-world scenarios.

Project Updates

Recent Enhancements

February 11, 2024: Version 0.6.0 has been released with numerous new features and improvements.
November 9, 2023: Example scripts added to demonstrate the streaming of audio from a web application.
October 11, 2023: Training of new models has been improved, including an example Google Colab notebook that shows how to train a basic wake word model in under an hour.

Online Demo

An online demo of the pre-trained models is available via HuggingFace Spaces. For the most reliable testing, install the library locally, since real-time microphone detection in Spaces can be inconsistent.

Installation and Setup

Installing openWakeWord is straightforward:

pip install openwakeword

On Linux, the onnxruntime and tflite-runtime dependencies are installed automatically. On Windows, only onnxruntime is supported. On Linux, Speex noise suppression can optionally be installed for improved performance in noisy environments.
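
Because the available runtime differs by platform, an application may want to check which backend is importable before constructing a model. The sketch below is an illustration under assumptions: pick_inference_framework is a hypothetical helper, not part of openWakeWord, and the preference order (tflite first) is arbitrary. Passing its result to Model as an inference_framework argument assumes the installed version accepts that argument.

```python
import importlib.util

# Import names of the two runtimes mentioned above (the pip package
# "tflite-runtime" is imported as "tflite_runtime").
BACKENDS = (("tflite", "tflite_runtime"), ("onnx", "onnxruntime"))


def pick_inference_framework(find_spec=importlib.util.find_spec):
    """Return "tflite" or "onnx" for the first importable runtime, else None."""
    for name, module in BACKENDS:
        if find_spec(module) is not None:
            return name
    return None


if __name__ == "__main__":
    print("usable backend:", pick_inference_framework())
```

The find_spec parameter exists only so the lookup can be replaced in tests; normal callers would use the default.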

Using openWakeWord

For quick local testing, a sample script is included that facilitates streaming detection from a local microphone. To integrate openWakeWord into your own Python application, minimal code is needed:

import openwakeword
import openwakeword.utils
from openwakeword.model import Model

# Download models and initialize
openwakeword.utils.download_models()
model = Model()

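# Process audio data (the frame should be 16-bit PCM audio sampled at 16 kHz)
frame = my_function_to_get_audio_frame()  # placeholder: supply your own audio source
prediction = model.predict(frame)  # dictionary of scores, one per loaded model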
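
The frame passed to predict is expected to be 16-bit PCM audio sampled at 16 kHz, and the project documentation recommends frame lengths that are multiples of 80 ms (1280 samples). The sketch below shows one way to split a longer buffer into such frames and collect the per-model scores. iter_frames and run_detection are hypothetical helpers, and the code assumes only that the model object has a predict(frame) method returning a dictionary of scores.

```python
SAMPLE_RATE = 16_000                        # Hz, the rate the models expect
FRAME_MS = 80                               # recommended frame duration
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 1280 samples


def iter_frames(samples, frame_len=FRAME_LEN):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]


def run_detection(model, samples):
    """Feed each frame to model.predict and return the list of score dicts."""
    return [model.predict(frame) for frame in iter_frames(samples)]
```

With the real library, samples would typically be a NumPy int16 array (for example, read from a microphone stream) and model would be the Model() instance created above.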

openWakeWord also provides utility functions for analyzing audio files and bulk predictions.
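
Before running file-based prediction, it can help to confirm that a clip matches the format the models expect. The sketch below uses only the standard library wave module. wav_format_problems is a hypothetical helper, and the 16 kHz, 16-bit, mono requirement is an assumption based on the project's documented input format.

```python
import wave


def wav_format_problems(path, rate=16_000, sample_width=2, channels=1):
    """Return a list of format mismatches; an empty list means the clip looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected {sample_width}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
    return problems
```

A clip that passes this check could then be handed to the library's file-based helper (Model.predict_clip in recent releases; check the installed version).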

Recommendations for Optimal Use

Noise Suppression & VAD

Noise Suppression: Speex can be enabled on Linux for improved performance in environments with consistent background noise.
Voice Activity Detection: The included voice activity detection (VAD) model can be enabled to reduce false detections in noisy settings.
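
Both options are set when the model is constructed. The sketch below builds the keyword arguments conditionally, since Speex is only offered on Linux. The argument names enable_speex_noise_suppression and vad_threshold follow the project README at the time of writing and should be checked against the installed version; model_options is a hypothetical helper.

```python
import sys


def model_options(noisy_environment, platform=sys.platform, vad_threshold=0.5):
    """Build keyword arguments for Model(...) based on platform and environment."""
    options = {}
    if noisy_environment:
        # The VAD threshold (0 to 1) gates wake word scores on detected speech.
        options["vad_threshold"] = vad_threshold
        # Speex noise suppression is only offered on Linux.
        if platform.startswith("linux"):
            options["enable_speex_noise_suppression"] = True
    return options
```

A model could then be created with Model(**model_options(noisy_environment=True)), assuming those argument names match the installed release.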

Activation Thresholds

The default activation threshold is 0.5, but users should tune it for their specific environment to balance missed detections against false activations.
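
Since predict returns one score per loaded model, a threshold can be applied per model rather than globally. The sketch below is a minimal illustration; activated_models and the example model names and scores are hypothetical.

```python
DEFAULT_THRESHOLD = 0.5


def activated_models(prediction, thresholds=None, default=DEFAULT_THRESHOLD):
    """Return the sorted names of models whose score meets their threshold."""
    thresholds = thresholds or {}
    return sorted(
        name for name, score in prediction.items()
        if score >= thresholds.get(name, default)
    )


# Made-up scores for two models
scores = {"alexa": 0.62, "hey_jarvis": 0.41}
print(activated_models(scores))                       # ['alexa']
print(activated_models(scores, {"hey_jarvis": 0.4}))  # ['alexa', 'hey_jarvis']
```

Keeping thresholds in a dictionary makes it easy to loosen or tighten a single wake word after testing in the target environment.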

User-Specific Models

Custom verifier models can be trained for specific voices, reducing false activations at the cost of recognizing other speakers less reliably.
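
Conceptually, a verifier acts as a second stage that only runs when the base wake word model already reports a reasonably high score. The sketch below illustrates that gating logic in plain Python. It is not the library's implementation, and verified_score, the gate value of 0.3, and the stand-in verifier function are all assumptions made for illustration.

```python
def verified_score(base_score, verifier, features, gate=0.3):
    """Return the verifier's score when the base score passes the gate.

    Below the gate the base score is returned unchanged, so the
    speaker-specific verifier only runs on likely activations.
    """
    if base_score < gate:
        return base_score
    return verifier(features)


# Stand-in verifier: pretends the target speaker is present when the
# hypothetical feature value is positive.
def toy_verifier(features):
    return 0.9 if features > 0 else 0.05
```

In the library itself, verifiers are configured through Model arguments (custom_verifier_models and custom_verifier_threshold per the project README); confirm the names against the installed version.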

Objectives of openWakeWord

The project aims to:

Be efficient and user-friendly enough for real-world applications.
Offer sufficient accuracy for real-world use, with low false-accept and false-reject rates.
Employ a straightforward model architecture and inference process.
Minimize the need for manual data collection.

Pre-Trained Models

openWakeWord currently includes several English-language models for recognizing words like "alexa" and phrases such as "what's the weather". These models are robust across different accents and pronunciations.

Model Architecture

Models consist of three components: a pre-processing function, a shared feature extraction model, and a classification model. Together, these parts convert raw audio into a score indicating how likely it is that the wake word or phrase was spoken.

Training Custom Models

openWakeWord provides tools for training new models through a simplified procedure, which is also available as a Google Colab notebook. This lets users create custom wake words or phrases for specific applications.

Language Support

Currently, openWakeWord supports only English, because the pre-trained text-to-speech models used to generate training data are based on English datasets. Future extensions may include support for other languages as resources become available.

In summary, openWakeWord is a versatile and accessible tool for integrating effective wake word recognition into various voice-enabled platforms, with a focus on ease of use and extensibility.