SoundStorm

Efficient Parallel Audio Generation Using Mask-Based Discrete Diffusion

Product Description

An unofficial PyTorch implementation of Google's SoundStorm, which generates audio with mask-based discrete diffusion. It uses HuBERT to produce semantic tokens and a shallow U-Net to combine the different codebooks during parallel audio generation. For details, refer to the InstructTTS paper. Upcoming updates will incorporate MaskGIT to align with Google's approach. The project is aimed at developers and researchers working on advanced audio processing.
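Mask-based parallel generation can be hard to picture from the description alone. The following is a minimal sketch of MaskGIT-style iterative decoding, not this project's code. It assumes a cosine masking schedule and a hypothetical `predict` function that returns a token and a confidence for every position. It handles a single flat token sequence only and does not model multiple codebooks. `MASK`, `Predictor`, `num_masked`, `parallel_decode`, and `toy_predictor` are illustrative names, and the sketch uses plain Python rather than PyTorch so that it runs on its own.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sentinel for a masked position (real token ids are >= 0).
MASK = -1

# Hypothetical model interface: given the current, partially masked sequence,
# return one (predicted_token, confidence) pair per position.
Predictor = Callable[[Sequence[int]], List[Tuple[int, float]]]


def num_masked(step: int, total_steps: int, length: int) -> int:
    """Cosine schedule (assumed): positions left masked after 0-based `step`."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return max(0, min(length, math.floor(length * ratio)))


def parallel_decode(predict: Predictor, length: int, total_steps: int) -> List[int]:
    """MaskGIT-style iterative decoding of a single flat token sequence."""
    tokens = [MASK] * length  # start fully masked
    for step in range(total_steps):
        preds = predict(tokens)  # one pass predicts every position at once
        confidence: List[float] = []
        for i, (tok, conf) in enumerate(preds):
            if tokens[i] == MASK:
                tokens[i] = tok  # tentatively accept the new prediction
                confidence.append(conf)
            else:
                confidence.append(math.inf)  # committed tokens are never re-masked
        k = num_masked(step, total_steps, length)
        # Re-mask the k least confident new predictions for the next pass.
        for i in sorted(range(length), key=lambda j: confidence[j])[:k]:
            tokens[i] = MASK
    return tokens


def toy_predictor(vocab_size: int = 1024, seed: int = 0) -> Predictor:
    """Hypothetical stand-in for a trained network: random tokens and confidences."""
    rng = random.Random(seed)

    def predict(tokens: Sequence[int]) -> List[Tuple[int, float]]:
        # Ignores `tokens`; a real model would condition on the unmasked context.
        return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]

    return predict


if __name__ == "__main__":
    print(parallel_decode(toy_predictor(), length=8, total_steps=4))
```

Each pass predicts all masked positions simultaneously, so the number of forward passes equals `total_steps` rather than the sequence length. Committing only the most confident predictions at each pass lets later passes condition on them. With `length=8` and `total_steps=4`, the assumed schedule leaves 7, 5, 3, and finally 0 positions masked.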
Project Details