MARS5-TTS
MARS5 is a text-to-speech model built on a two-stage AR-NAR architecture that generates varied, expressive speech from a short reference audio clip. It targets prosodically demanding material such as sports commentary and anime, and lets you steer speech prosody through the formatting of the input text, such as punctuation and capitalization. The architecture pairs an autoregressive (AR) stage with a non-autoregressive (NAR) stage based on a multinomial DDPM, a combination intended to give consistent, high-quality output. See the detailed documentation for guidance on applying the model across different languages.
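To make the two-stage idea concrete, here is a toy sketch of the data flow only. It is not the MARS5 code or API. It assumes the AR stage emits one coarse token per audio frame and the NAR stage then fills in the finer token levels for all frames in parallel. Every name and constant (`ar_stage`, `nar_stage`, `synthesize`, `VOCAB`, `LEVELS`) is hypothetical. The refinement loop is a deterministic placeholder, not a real multinomial diffusion sampler.

```python
from typing import List

VOCAB = 8   # hypothetical codebook size (illustrative only)
LEVELS = 4  # hypothetical number of token levels per audio frame


def ar_stage(text: str, ref_tokens: List[int], n_frames: int) -> List[int]:
    """Stage 1 (autoregressive): emit one coarse token per frame.

    Each token depends on the conditioning (text + reference) and on the
    token generated just before it, so frames must be produced in order.
    """
    seed = sum(ord(c) for c in text) + sum(ref_tokens)
    coarse: List[int] = []
    for i in range(n_frames):
        prev = coarse[-1] if coarse else 0
        coarse.append((seed + 3 * prev + i) % VOCAB)
    return coarse


def nar_stage(coarse: List[int], steps: int = 3) -> List[List[int]]:
    """Stage 2 (non-autoregressive): fill the finer levels for all frames.

    Every frame is updated in the same pass, and the pass is repeated for a
    few refinement steps. The update rule is a stand-in for a learned
    denoiser; the coarse level from stage 1 is kept fixed.
    """
    frames = [[c] + [0] * (LEVELS - 1) for c in coarse]  # placeholder start
    for step in range(steps):
        frames = [
            [f[0]] + [(f[0] + lvl + step) % VOCAB for lvl in range(1, LEVELS)]
            for f in frames
        ]
    return frames


def synthesize(text: str, ref_tokens: List[int], n_frames: int = 5) -> List[List[int]]:
    """Run both stages: sequential coarse pass, then parallel refinement."""
    return nar_stage(ar_stage(text, ref_tokens, n_frames))


if __name__ == "__main__":
    codes = synthesize("Goal! What a FINISH, unbelievable.", [1, 2, 3])
    print(len(codes), "frames x", len(codes[0]), "levels")
```

The split mirrors the trade-off the description points to: the sequential stage fixes the overall shape of the utterance, while the parallel stage adds detail without generating each fine token one at a time.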