vall-e

Zero-Shot Text-to-Speech Synthesis with Neural Codec Models

Product Description

This unofficial PyTorch implementation of VALL-E uses neural codec language models for zero-shot text-to-speech synthesis. It can be trained on a single GPU, which keeps it accessible for development. Because the model can replicate a speaker's identity, safeguards are in place to prevent misuse. Detailed guides cover installation requirements and training on English and Chinese datasets. The project also includes advanced features such as NAR Decoder prefix modes for refining synthesis output, making it a useful resource for researchers and developers working on text-to-speech.
Project Details