#zero-shot

FateZero
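FateZero is a zero-shot video editing framework that uses pre-trained diffusion models to edit real videos from text prompts without per-prompt training. It preserves the source video's structure and motion by storing the intermediate attention maps produced during inversion and fusing them back in during editing. Temporal consistency is improved by extending self-attention into spatial-temporal attention across frames. The method supports style and local attribute edits as well as shape-aware edits, which the authors demonstrate through empirical evaluations.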
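To make "fusing intermediate attention maps" concrete, here is a minimal, simplified sketch of mask-based attention blending in pure Python. It assumes a binary mask is obtained by thresholding the source pass's cross-attention for the edited word: inside the mask the edited self-attention is used, outside it the self-attention stored during inversion is kept, so unedited regions retain their structure. The function name `blend_self_attention`, the 0.3 threshold, and the per-query-row masking are illustrative assumptions, not FateZero's actual code; the real method applies this kind of fusion inside the diffusion U-Net at each layer and timestep.

```python
from typing import List

Matrix = List[List[float]]


def blend_self_attention(
    src_attn: Matrix,
    edit_attn: Matrix,
    cross_attn_edit_word: List[float],
    threshold: float = 0.3,  # hypothetical mask threshold, chosen for illustration
) -> Matrix:
    """Fuse source and edited self-attention maps with a binary mask (hypothetical helper).

    src_attn / edit_attn: [num_query_positions][num_key_positions] maps; the first
    is stored during inversion, the second comes from the editing pass.
    cross_attn_edit_word: per-query-position cross-attention weight of the word
    being edited, taken from the source (inversion) pass.
    """
    if len(src_attn) != len(edit_attn) or len(src_attn) != len(cross_attn_edit_word):
        raise ValueError("attention maps and mask must cover the same positions")
    # Binary mask: 1 where the edited word is attended to strongly enough.
    mask = [1.0 if w >= threshold else 0.0 for w in cross_attn_edit_word]
    fused: Matrix = []
    for m, src_row, edit_row in zip(mask, src_attn, edit_attn):
        # Inside the mask use the edited attention; outside keep the stored
        # source attention so unedited regions keep their structure and motion.
        fused.append([m * e + (1.0 - m) * s for s, e in zip(src_row, edit_row)])
    return fused


if __name__ == "__main__":
    src = [[0.9, 0.1], [0.2, 0.8]]
    edit = [[0.5, 0.5], [0.6, 0.4]]
    print(blend_self_attention(src, edit, [0.1, 0.7]))  # [[0.9, 0.1], [0.6, 0.4]]
```

Because each fused row is a convex combination of two attention rows, it still sums to 1, so the result remains a valid attention distribution.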
Diff-HierVC
Diff-HierVC is a voice conversion system built on two hierarchical diffusion models that improve pitch accuracy and speaker adaptation. DiffPitch generates an F0 (fundamental frequency) contour in the target speaker's style, and DiffVoice then converts the speech into the target voice style using that F0. A source-filter encoder disentangles the speech, and the converted Mel-spectrogram serves as a data-driven prior for DiffVoice, which improves conversion quality. In zero-shot voice conversion, where the target speaker is unseen during training, it reports a character error rate (CER) of 0.83% and an equal error rate (EER) of 3.29%. The authors present it as a general approach to voice conversion across diverse datasets.
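The two reported numbers are standard evaluation metrics. As a rough illustration of what they measure, the sketch below computes CER as the character-level edit distance between a reference transcript and an ASR transcript of the converted speech, divided by the reference length, and approximates EER by sweeping a decision threshold over speaker-verification scores until the false-acceptance and false-rejection rates meet. These are textbook definitions written as hypothetical helpers; the ASR and speaker-verification models, score scales, and exact protocol used to produce Diff-HierVC's figures are not reproduced here.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def eer(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate equal error rate by sweeping thresholds over all observed scores.

    genuine: verification scores for same-speaker trials (higher = more similar).
    impostor: scores for different-speaker trials.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 0.5
    print(eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.75]))  # 0.25
```

A lower CER means the converted speech is more intelligible to the ASR model; the threshold sweep for EER is a simple exhaustive search, which is adequate for small score lists but would normally be replaced by a ROC-curve routine on large evaluations.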