
#zero-shot voice conversion

naturalspeech3_facodec

FACodec is the speech codec at the core of NaturalSpeech 3. It encodes a speech waveform into separate subspaces for content, prosody, timbre, and acoustic details, and reconstructs the waveform from those attributes. Because the attributes are factorized, each one can be modeled or replaced independently, which is what supports zero-shot voice conversion. The resulting speech codes can be used to build both non-autoregressive and autoregressive TTS models, the latter in the style of VALL-E. FACodec is designed for 16 kHz audio and generates multiple speech codes.
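The idea behind factorized voice conversion can be illustrated with a small toy sketch. It does not use the FACodec library or its API: the `FactorizedSpeech` class, its field names, and the `convert_voice` helper are hypothetical, and keeping the source's acoustic details during conversion is an assumption made only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List

# The description states that FACodec targets 16 kHz audio.
SAMPLE_RATE_HZ = 16_000


@dataclass(frozen=True)
class FactorizedSpeech:
    """Hypothetical container for the four attribute subspaces."""
    content: List[int]           # what is said (toy code indices)
    prosody: List[int]           # how it is said (toy code indices)
    acoustic_details: List[int]  # residual detail (toy code indices)
    timbre: List[float]          # who says it (toy speaker vector)


def convert_voice(source: FactorizedSpeech,
                  reference: FactorizedSpeech) -> FactorizedSpeech:
    """Keep the source's content, prosody, and details; take the reference's timbre."""
    return replace(source, timbre=reference.timbre)


# Toy values standing in for encoder outputs; they are not real codes.
source = FactorizedSpeech(
    content=[3, 1, 4], prosody=[1, 5], acoustic_details=[9, 2], timbre=[0.1, 0.2]
)
reference = FactorizedSpeech(
    content=[2, 7], prosody=[1, 8], acoustic_details=[2, 8], timbre=[0.9, 0.8]
)
converted = convert_voice(source, reference)
```

In this sketch, conversion amounts to swapping a single field, which is only possible because the attributes are stored separately. A real system would decode the combined attributes back into a waveform.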
