KAN-TTS - Create Custom Text-to-Speech Models with Multi-Language Support

Project Introduction: KAN-TTS

KAN-TTS is a toolkit for training your own Text-to-Speech (TTS) models from scratch. It aims to simplify the development of customized TTS systems that turn text input into high-quality speech.

Supported Models

KAN-TTS currently supports two models: sam-bert, an acoustic model, and HiFi-GAN, a neural vocoder. Together they produce clear, natural-sounding speech. The project is under active development, and more models are planned.

Supported Languages

KAN-TTS supports the following languages and dialects, each with a published model on ModelScope:

Mandarin
English
British English
Shanghainese
Sichuanese
Cantonese
Italian
Spanish
Russian
Korean

The platform continues to expand its language offerings, with more languages expected to be added soon.

Training Tutorial

For users who want to train their own TTS models, the KAN-TTS project Wiki provides a training tutorial. It walks through the training process step by step.

ModelScope Demo

To hear what KAN-TTS models can do, try the demo on ModelScope, which showcases the project's TTS output.
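For readers who would rather call a published model from code than use the web demo, the following is a minimal sketch using the ModelScope Python pipeline API. It is not taken from the KAN-TTS documentation: the model ID and voice name in `main` are assumptions based on public ModelScope model cards and should be checked against the model page you actually use, and `save_wav`, `synthesize`, and `main` are illustrative helper names. Running `main` requires the `modelscope` package and network access to download the model.

```python
from pathlib import Path


def save_wav(wav_bytes: bytes, path: str) -> int:
    """Write already-encoded WAV bytes to disk and return the file size in bytes."""
    out = Path(path)
    out.write_bytes(wav_bytes)
    return out.stat().st_size


def synthesize(text: str, model_id: str, voice: str) -> bytes:
    """Run a ModelScope text-to-speech pipeline and return its WAV output.

    Assumes `modelscope` is installed and the model can be downloaded.
    """
    # Imported lazily so this file loads even without modelscope installed.
    from modelscope.outputs import OutputKeys
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    tts = pipeline(task=Tasks.text_to_speech, model=model_id)
    output = tts(input=text, voice=voice)
    return output[OutputKeys.OUTPUT_WAV]


def main() -> None:
    # Assumed model ID and voice name; verify both on the ModelScope model card.
    model_id = "damo/speech_sambert-hifigan_tts_zh-cn_16k"
    wav = synthesize("今天天气怎么样", model_id, voice="zhitian_emo")
    save_wav(wav, "output.wav")
```

Calling `main()` would download the model on first use and write `output.wav` to the current directory. The import is kept inside `synthesize` so the file-writing helper can be used and tested on its own.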

Contribution and Contact

To contribute to the KAN-TTS project, first set up the development environment by installing the dependencies and then registering the pre-commit hooks:

pip install -r requirements.txt
pre-commit install

The project welcomes contributions from the community. For questions or further discussion, you can join the project's DingTalk group by scanning the QR code provided in its documentation.

By combining its supported models with a growing set of languages, KAN-TTS is a versatile and expanding tool for anyone exploring text-to-speech technology.