WeSpeaker: A Comprehensive Speaker Embedding Learning Toolkit
WeSpeaker is a toolkit dedicated to speaker embedding learning, with a primary focus on speaker verification. It is designed to be user-friendly and adaptable to both research and production environments, and it supports either online feature extraction or pre-extracted features in the widely used Kaldi format.
Installation and Usage
Installing the Python Package
Installing the WeSpeaker toolkit is straightforward: install the package with pip using the following command:
pip install git+https://github.com/wenet-e2e/wespeaker.git
Using the Command-Line Interface (CLI)
WeSpeaker offers a versatile command-line interface for tasks such as embedding extraction, similarity computation, and diarization. Here is a quick guide:
- To extract embeddings from an audio file:
wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
- To compute the similarity between two audio files:
wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
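- To perform speaker diarization on a recording (this follows the same flag pattern as the commands above; confirm with wespeaker --help, since flags may vary across releases):
wespeaker --task diarization --audio_file audio.wav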
Using the Python Programming Interface
For those who prefer to work in Python, WeSpeaker provides an API for seamless integration into Python projects. For example:
import wespeaker

# Load a pre-trained model by language tag, e.g. 'chinese' or 'english'
model = wespeaker.load_model('chinese')
# Extract a speaker embedding from a single audio file
embedding = model.extract_embedding('audio.wav')
# Compute the similarity between the speakers of two audio files
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
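Recent releases also expose diarization through the same model object. A minimal sketch, assuming a model.diarize method is available in your installed version:

# Run speaker diarization on a multi-speaker recording; see the toolkit
# documentation for the exact format of the returned segments
diar_result = model.diarize('audio.wav')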
Installation for Development & Deployment
For users who want to contribute to the WeSpeaker project or deploy it in more customized environments, a dedicated Conda environment is recommended, with PyTorch 1.12.1 or later.
git clone https://github.com/wenet-e2e/wespeaker.git
cd wespeaker
conda create -n wespeaker python=3.9
conda activate wespeaker
pip install -r requirements.txt
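For active development, it is also common to install the package itself in editable mode so that local changes take effect without reinstallation; this is a standard pip pattern rather than a step documented here:

pip install -e .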
Latest Updates and Features
WeSpeaker is continuously evolving with regular updates. Notable recent features include:
- SimAM_ResNet support, with pre-trained models on VoxBlink2 for superior performance.
- Integration of a whisper_encoder-based frontend via the Whisper-PMFA framework.
- Enhanced diarization, with UMAP dimensionality reduction and HDBSCAN clustering (see the sketch after this list).
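To illustrate how this clustering stage works, here is a minimal, self-contained sketch of UMAP reduction followed by HDBSCAN clustering, using the third-party umap-learn and hdbscan packages on a toy matrix of per-segment speaker embeddings. It illustrates the technique only and is not WeSpeaker's internal implementation:

import numpy as np
import umap      # pip install umap-learn
import hdbscan   # pip install hdbscan

# Toy stand-in for per-segment speaker embeddings (n_segments x embedding_dim)
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(0.0, 0.05, size=(40, 256)),  # segments from one speaker
    rng.normal(1.0, 0.05, size=(40, 256)),  # segments from another speaker
])

# Step 1: reduce the high-dimensional embeddings to a low-dimensional space
reduced = umap.UMAP(n_components=8, metric='cosine', random_state=0).fit_transform(embeddings)

# Step 2: cluster the reduced points; HDBSCAN infers the number of speakers
# on its own and labels noisy segments as -1 (outliers)
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(reduced)

print(labels)  # one cluster index (speaker label) per segment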
Recipes
WeSpeaker offers various predefined recipes for processing popular datasets:
- VoxCeleb: advanced speaker verification techniques, including self-supervised learning and score calibration.
- CNCeleb: strong performance improvements with ResNet models and fine-tuning methods.
- NIST SRE16: scripted recipes that achieve competitive error rates on this benchmark.
Community and Contributions
WeSpeaker boasts a strong community. Chinese users can join the WeNet Community via WeChat for direct interaction. Furthermore, WeSpeaker invites contributions from developers interested in advancing speaker verification technologies.
Citations
If WeSpeaker has aided your research or production work, please acknowledge it with a citation; detailed entries for the relevant conference and journal papers are provided.
In summary, WeSpeaker offers an accessible and powerful suite of speaker embedding tools for both academia and industry. The project fosters an inclusive, collaborative community that continues to push the boundaries of speaker verification and beyond.