hifi-gan
HiFi-GAN is a generative adversarial network (GAN) vocoder that synthesizes high-fidelity 22.05 kHz speech 167.9 times faster than real time on a single V100 GPU. It improves audio quality by explicitly modeling the periodic patterns in speech waveforms, and it applies both to mel-spectrogram inversion and to end-to-end speech synthesis. A small-footprint variant runs 13.4 times faster than real time on CPU, with quality comparable to an autoregressive model. The open-source code and pre-trained models can be reused directly or adapted to other applications.
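To make the speed figures concrete, the short sketch below converts a real-time speedup into raw sample throughput and wall-clock synthesis time. It uses only the numbers quoted above (22.05 kHz, 167.9x, 13.4x). The function names and the 10-second clip length are illustrative assumptions, not part of the HiFi-GAN code base.

```python
SAMPLE_RATE_HZ = 22_050  # output sample rate quoted in the description


def samples_per_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Audio samples generated per wall-clock second at a given real-time speedup."""
    return speedup * sample_rate_hz


def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of audio."""
    return audio_seconds / speedup


if __name__ == "__main__":
    # Hypothetical 10-second clip, used only to illustrate the arithmetic.
    clip_seconds = 10.0
    for label, speedup in [("V100 GPU", 167.9), ("CPU variant", 13.4)]:
        rate = samples_per_second(speedup)
        wall = synthesis_seconds(clip_seconds, speedup)
        print(f"{label}: {rate:,.0f} samples/s, {clip_seconds:.0f} s of audio in {wall:.3f} s")
```

At the quoted GPU speedup this works out to roughly 3.7 million samples per second, so a 10-second utterance takes about 0.06 s to generate. The CPU variant needs about 0.75 s for the same clip.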