naturalspeech3_facodec
FACodec is the speech codec used in NaturalSpeech 3. It converts a speech waveform into separate subspaces for content, prosody, timbre, and acoustic details, and reconstructs the waveform from them. This attribute factorization lets each aspect of speech be modeled on its own. On top of its codes, research projects can build either non-autoregressive or autoregressive TTS models (the latter in the style of VALL-E), and the factorization also supports zero-shot voice conversion. FACodec works on 16 kHz audio and produces multiple streams of speech codes.
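
To illustrate why factorizing speech into attributes makes zero-shot voice conversion straightforward, here is a minimal toy sketch in Python. It does not use the FACodec library or its API: the `FactorizedSpeech` container, the `convert_voice` helper, and all the code values are hypothetical and exist only for illustration. The sketch assumes that conversion keeps the source utterance's content, prosody, and acoustic-detail codes and takes only the timbre from a reference speaker.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FactorizedSpeech:
    """Hypothetical container for the four attribute subspaces (not the real API)."""
    content: List[int]           # per-frame content codes
    prosody: List[int]           # per-frame prosody codes
    acoustic_details: List[int]  # per-frame acoustic-detail codes
    timbre: List[float]          # one utterance-level timbre vector


def convert_voice(source: FactorizedSpeech, reference: FactorizedSpeech) -> FactorizedSpeech:
    """Keep the source's per-frame codes and swap in the reference speaker's timbre."""
    return replace(source, timbre=list(reference.timbre))


# Made-up values: a real codec would produce these from waveforms.
source = FactorizedSpeech(
    content=[12, 7, 7, 31],
    prosody=[3, 3, 4, 2],
    acoustic_details=[101, 88, 95, 60],
    timbre=[0.1, -0.4, 0.9],
)
reference = FactorizedSpeech(
    content=[5, 5],
    prosody=[1, 2],
    acoustic_details=[40, 41],
    timbre=[-0.7, 0.2, 0.3],
)

converted = convert_voice(source, reference)
print(converted.content == source.content)   # what is said is unchanged
print(converted.timbre == reference.timbre)  # who appears to say it has changed
```

Because timbre lives in its own subspace, the swap touches a single field and needs no training data from the reference speaker. In an actual system, the converted representation would then be passed to the codec's decoder to produce a waveform.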