StableTTS
StableTTS is a flow-matching TTS model built around a DiT (diffusion transformer) that generates speech in Chinese, English, and Japanese. The 31M-parameter model offers improved audio quality, supports classifier-free guidance (CFG) and the FireflyGAN vocoder, and includes an improved Chinese text frontend. Version 1.1 introduces U-Net-inspired skip connections and a cosine timestep scheduler, and serves all three languages from a single multilingual checkpoint. Training is designed to be user-friendly, with simplified data preparation and finetuning, so the model can be adapted to a range of speech generation tasks.
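As a rough illustration of how a cosine timestep scheduler and CFG can fit together in a flow-matching sampler, here is a minimal pure-Python sketch. It is not StableTTS's actual code: the function names (`cosine_timesteps`, `cfg_velocity`, `euler_sample`, `toy_velocity`) are hypothetical, the schedule `t = 1 - cos(pi/2 * u)` is one common form of cosine schedule, the guidance formula is one common CFG variant, and a plain Euler loop with a toy constant velocity field stands in for the real ODE solver and DiT decoder.

```python
import math


def cosine_timesteps(n_steps):
    """Warp a uniform grid u in [0, 1] through t = 1 - cos(pi/2 * u).

    The warped grid takes smaller steps near t = 0 and larger ones near t = 1.
    (Assumed form of the cosine schedule, for illustration only.)
    """
    return [1.0 - math.cos(0.5 * math.pi * i / n_steps) for i in range(n_steps + 1)]


def cfg_velocity(v_cond, v_uncond, guidance_scale):
    """Combine conditional and unconditional velocities elementwise.

    Uses v_uncond + scale * (v_cond - v_uncond); scale = 1 recovers v_cond.
    (One common CFG formulation, assumed here.)
    """
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn, x0, n_steps, guidance_scale):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    ts = cosine_timesteps(n_steps)
    x = list(x0)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v_cond = velocity_fn(x, t_cur, conditioned=True)
        v_uncond = velocity_fn(x, t_cur, conditioned=False)
        v = cfg_velocity(v_cond, v_uncond, guidance_scale)
        dt = t_next - t_cur
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, conditioned):
    """Hypothetical stand-in for a model: a constant velocity field."""
    return [2.0 if conditioned else 1.0 for _ in x]


if __name__ == "__main__":
    print(cosine_timesteps(4))
    print(euler_sample(toy_velocity, [0.0, 0.0], n_steps=8, guidance_scale=1.0))
```

In a real system the velocity function would be the trained decoder evaluated with and without conditioning, and the guidance scale would be tuned by ear; larger scales push samples harder toward the conditional prediction.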