voicesmith - Multispeaker Text-to-Speech Training without Coding Experience

Introducing VoiceSmith

VoiceSmith is a project that makes text-to-speech training accessible. Users can train and run inference with both single-speaker and multispeaker models without writing any code, which opens text-to-speech to a broader audience, including people without a technical background.

Technology Behind VoiceSmith

VoiceSmith fine-tunes on top of a text-to-speech pipeline built from modified versions of DelightfulTTS and UnivNet. These models were pretrained on a proprietary dataset of 5000 speakers, and starting from them is intended to improve the quality and emotional range of the speech VoiceSmith can produce. The project also includes tools for dataset preprocessing, such as automatic text normalization, which reduce the manual preparation users have to do.

For those curious about the results, a demonstration is available of a model trained on a highly emotional 60-speaker dataset with an earlier version of the software.

System Requirements

Hardware Requirements:

Operating System: Windows (currently CPU only) or any Linux-based operating system. macOS users would need to build from source to create an installer, though this setup hasn't been tested.
Graphics: An NVIDIA GPU with CUDA support is highly recommended for training, as training on a CPU can take significantly longer.
RAM: At least 8 GB of RAM is recommended; training may run into problems with less memory.

Software Requirements:

Docker: Required for running the environment. Download and install it before running VoiceSmith. Linux users are advised to install Docker Engine rather than Docker Desktop.
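
The requirements above can be checked from a terminal before installing. The following is a minimal sketch, not part of VoiceSmith: the helper names `have_cmd` and `report_cmd` are hypothetical, the presence of `nvidia-smi` is only a rough hint that an NVIDIA driver is installed, and the memory check assumes a Linux system that exposes `/proc/meminfo`.

```shell
#!/bin/sh
# Hypothetical pre-install check; these helpers are not shipped with VoiceSmith.

# Return 0 if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print "<name>: ok" or "<name>: missing" for a command.
report_cmd() {
    if have_cmd "$1"; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

report_cmd docker       # required
report_cmd nvidia-smi   # optional; suggests an NVIDIA driver is present

# Linux only: MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_kb=${mem_kb:-0}
    mem_gb=$((mem_kb / 1024 / 1024))
    echo "RAM: about ${mem_gb} GiB (8 GB or more recommended)"
fi
```

The script only reports what it finds and never aborts, so a "missing" line for `nvidia-smi` simply means training would fall back to the CPU.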

Installation Guide

Visit the releases page to download the latest installer.
Run the installer by double-clicking it to start the installation.

Development Process

For those interested in developing with VoiceSmith:

Ensure the latest version of Node.js is installed.

Clone the VoiceSmith repository with the command:

```
git clone https://github.com/dunky11/voicesmith
```

Navigate to the repository directory and install dependencies:
```
cd voicesmith
npm install
```
Download the necessary assets and place them in the repository's assets folder.
Start the project by executing:
```
npm start
```
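
Because `npm start` depends on the assets having been downloaded first, a small guard can catch a forgotten step. This is a sketch under assumptions: `assets_present` and `start_dev` are hypothetical helpers, and the check only confirms that the `assets` folder exists and is not empty, not that it holds the right files.

```shell
# Hypothetical helpers; run from the root of the cloned repository.

# Return 0 if the directory exists and contains at least one entry.
assets_present() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Start the project only when the assets folder looks populated.
start_dev() {
    if assets_present assets; then
        npm start
    else
        echo "assets/ is missing or empty; download the assets first" >&2
        return 1
    fi
}
```

After defining the functions in a shell session, calling `start_dev` replaces a bare `npm start`.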

Building from Source

If you intend to build VoiceSmith from scratch:

Complete the development steps above up to and including downloading the assets, that is, everything except the final `npm start`.
Execute the build process using the command:
```
npm run make
```
The build will produce a system-specific installer located in the out/make directory.
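
To find the generated installer, the files under `out/make` can be listed after the build finishes. A minimal sketch, with `list_build_outputs` as a hypothetical helper; the subfolder layout and file names depend on the platform and are not assumed here.

```shell
# Hypothetical helper: list every file under the build output directory.
# Defaults to out/make, relative to the repository root.
list_build_outputs() {
    dir=${1:-out/make}
    if [ -d "$dir" ]; then
        find "$dir" -type f | sort
    else
        echo "no build output at $dir; run the build first" >&2
        return 1
    fi
}
```

Calling `list_build_outputs` with no argument from the repository root prints one path per file, among which the installer for the current system should appear.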

Architectural Design

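VoiceSmith uses a two-stage architecture built from modified DelightfulTTS and UnivNet components. DelightfulTTS acts as the acoustic model, predicting mel-spectrograms from text, and UnivNet acts as the vocoder, converting those spectrograms into waveforms. The combination is aimed at diverse, high-quality speech synthesis.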

How to Contribute

Community involvement is highly encouraged. Support the project by starring it on GitHub and consider contributing via pull requests.

Licensing

VoiceSmith is released under the Apache-2.0 license. For more information, refer to the LICENSE.md file.

VoiceSmith is an exciting opportunity for anyone interested in leveraging cutting-edge text-to-speech technology, whether they're new to the field or experienced developers.