nix-tts - Optimize Text-to-Speech with Nix-TTS Using Modular Knowledge Distillation

Introduction to Nix-TTS

Overview

Nix-TTS is a lightweight, efficient text-to-speech (TTS) system. Developed by researchers Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, and Sakriani Sakti, it is built with knowledge distillation, a technique in which a small student model is trained to reproduce the behavior of a larger teacher model.

The Technology Behind Nix-TTS

At its core, Nix-TTS shrinks a conventional TTS model substantially while keeping output quality close to the original. It does this through module-wise distillation: components of the system, such as the encoder and the decoder, are distilled independently rather than all at once. This flexibility brings the model down to just 5.23 million parameters, a reduction of up to 89.34% relative to the teacher model it was distilled from.
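
The idea of distilling modules independently can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the teacher modules are toy linear functions rather than neural networks, the student is not actually smaller, the student decoder is fed the teacher's latents, and mean squared error stands in for whatever losses the project really uses.

```python
# Toy teacher modules (assumed for illustration; real modules are neural networks).
def teacher_encoder(x):
    return 2.0 * x + 1.0    # input feature -> latent

def teacher_decoder(z):
    return -0.5 * z + 3.0   # latent -> output sample

def distill_module(inputs, targets, lr=0.1, steps=500):
    """Fit a linear student y = w * x + b to one teacher module's outputs
    using full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        errors = [w * x + b - t for x, t in zip(inputs, targets)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
latents = [teacher_encoder(x) for x in xs]
outputs = [teacher_decoder(z) for z in latents]

# Each student module is distilled on its own, against its teacher counterpart.
enc_w, enc_b = distill_module(xs, latents)        # student encoder
dec_w, dec_b = distill_module(latents, outputs)   # student decoder, fed teacher latents

def student(x):
    """Compose the two independently distilled student modules."""
    return dec_w * (enc_w * x + enc_b) + dec_b
```

Because each module has its own target, one module can be retrained or resized without touching the other, which is the flexibility the paragraph above refers to.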

Despite being smaller, Nix-TTS remains end-to-end and non-autoregressive. End-to-end means it produces speech directly, without a separate vocoder stage that would add complexity and reduce efficiency. Non-autoregressive means the output is generated in parallel rather than one step at a time.

Performance and Efficiency

One of the standout features of Nix-TTS is its inference speed. On an Intel i7 CPU, it runs over three times faster than its teacher model. On a low-power device like a Raspberry Pi 3B, the speedup is over eight times, which makes it practical for a wide range of applications, including resource-constrained ones. This speed comes while retaining voice naturalness and intelligibility comparable to those of the larger teacher model.
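
Speed claims like these come down to ratios of wall-clock times. The following is a minimal sketch of two common ratios, speedup over a baseline model and real-time factor. The function names are hypothetical, and `synthesize` stands for any callable that turns text into audio; none of this is taken from the Nix-TTS code base.

```python
import time

def mean_latency(synthesize, text, runs=5):
    """Average wall-clock seconds that synthesize(text) takes over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        synthesize(text)
    return (time.perf_counter() - start) / runs

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline (above 1 is faster)."""
    return baseline_seconds / candidate_seconds

def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration (below 1 is faster than real time)."""
    return synthesis_seconds / audio_seconds
```

The two ratios answer different questions: speedup compares two models on the same input, while real-time factor compares one model against the length of the audio it produces.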

Getting Started with Nix-TTS

Getting started with Nix-TTS is straightforward: clone the repository from GitHub, install the necessary Python dependencies, and download a pre-trained model to begin experimenting. The implementation is accessible and covers the full path from installation to speech generation. The repository also provides an interactive demo and audio samples to showcase its capabilities.

Installation Steps

Clone the Repository

```
git clone https://github.com/rendchevi/nix-tts.git
cd nix-tts
```

Install Python Dependencies

```
pip install -r requirements.txt
```

Install Additional Tools

- espeak is required for text tokenization. On Debian or Ubuntu:
```
sudo apt-get install espeak
```
- If that command is not available on your system, alternative installation instructions are available for other platforms.

Download Pre-Trained Models

- Choose one of the available pre-trained models and download it to start using Nix-TTS.
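
Once the steps above are done, a short script can confirm that espeak is discoverable and that a downloaded model produces audio. This is a hedged sketch, not the project's documented procedure. `find_espeak` and `synthesize` are hypothetical helper names, and the `NixTTSInference` class, its module path, and its `tokenize` and `vocalize` methods are assumed from the repository's usage example, so check them against the repository before relying on them.

```python
import shutil

def find_espeak():
    """Return the path of an espeak executable on PATH, or None if none is found."""
    return shutil.which("espeak") or shutil.which("espeak-ng")

def synthesize(model_dir, text):
    """Generate raw speech for `text` with a downloaded pre-trained model.

    Assumption: the repository exposes NixTTSInference with tokenize() and
    vocalize(); the import is done lazily so this file loads without the repo.
    """
    if find_espeak() is None:
        raise RuntimeError("espeak was not found on PATH; install it first")
    from nix.models.TTS import NixTTSInference  # assumed module path

    nix = NixTTSInference(model_dir=model_dir)
    c, c_length, phoneme = nix.tokenize(text)   # text -> token ids and their lengths
    return nix.vocalize(c, c_length)            # token ids -> raw waveform
```

Run it from the repository root so that the `nix` package is importable, for example `synthesize("<path_to_the_downloaded_model>", "Hello from Nix-TTS.")`.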

Acknowledgments

The development of Nix-TTS is exclusively funded by Kata.ai, where the authors work as part of the Kata.ai Research Team. The project also borrows some methodology and implementation from existing open-source projects such as VITS and Comprehensive-Transformer-TTS.

Nix-TTS shows that module-wise distillation can make a text-to-speech model much smaller and faster while keeping speech quality close to that of a larger model.