UniCATS-CTX-vec2wav
CTX-vec2wav is the vocoder from the AAAI-2024 paper "UniCATS: A Unified Context-Aware Text-to-Speech Framework", which combines contextual VQ-diffusion with contextual vocoding. The project runs on Linux and targets Python 3.9, and it documents both inference and training so that it can be applied to different datasets and conditions. It generates high-fidelity audio at 16 kHz and 24 kHz, builds on ESPnet, Kaldi, and ParallelWaveGAN, and provides pre-trained models.