Diff-HierVC
Diff-HierVC is a voice conversion system that uses diffusion models to improve pitch accuracy and speaker adaptation. It has two components: DiffPitch, which generates F0 (fundamental frequency), and DiffVoice, which transfers the target voice style. A source-filter encoder and a data-driven Mel-spectrogram prior further improve conversion quality. In zero-shot speaker adaptation, the system achieves a character error rate (CER) of 0.83% and an equal error rate (EER) of 3.29%.
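
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how CER and EER are commonly defined. It is not the authors' evaluation pipeline: the function names `cer` and `eer` and the toy inputs are hypothetical. In practice, CER is usually computed on ASR transcripts of the converted speech, and EER on similarity scores from a speaker verification model.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def eer(genuine: list[float], impostor: list[float]) -> float:
    """Approximate equal error rate from verification scores.

    Sweeps every observed score as a threshold and returns the error
    rate at the point where the false acceptance rate (FAR) and false
    rejection rate (FRR) are closest. Higher scores mean "same speaker".
    """
    best_gap, best_rate = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Toy inputs (hypothetical, not taken from the Diff-HierVC evaluation).
print(cer("kitten", "sitting"))
print(eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

Both functions return fractions, so the reported 0.83% CER corresponds to a value of 0.0083 on this scale. Lower is better for both metrics: a low CER indicates the converted speech stays intelligible, and a low EER indicates it is hard to distinguish from the target speaker.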