fish-diffusion
Fish Diffusion uses a diffusion model for versatile voice generation, covering text-to-speech (TTS), singing voice synthesis (SVS), and singing voice conversion (SVC). It aims to make building such systems straightforward, with support for multi-speaker datasets and an easy-to-follow code structure. Training can run across multiple devices and in half precision, which speeds it up and reduces memory use. The framework supports the 44.1 kHz Diff Singer vocoder. Its dependencies, including PyTorch, are installed with conda and PDM. Step-by-step guides cover setting up the vocoder, preparing a dataset, and training a baseline model. Once a model is trained, conversion and inference can be run either from shell scripts or through a Gradio web interface. The project is under active development and welcomes community contributions.