speech-dataset-generator
This tool builds multilingual datasets for training text-to-speech and speech recognition models. It transcribes audio, segments it, identifies speaker gender, and uses pyannote embeddings to name speakers automatically. It can detect multiple speakers in a recording, and it enhances audio quality using deepfilternet, resembleai, or mayavoz. Input can come from local files, YouTube, LibriVox, or TED Talks, and the data is stored in a Chroma database.
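
To illustrate the idea behind automatic speaker naming, here is a minimal sketch of matching a new speaker embedding against previously seen ones. It is not the tool's actual implementation: the `SpeakerRegistry` class, the `speaker_N` naming scheme, and the 0.75 similarity threshold are all hypothetical. A plain in-memory dictionary stands in for the Chroma database, and short toy vectors stand in for pyannote embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical in-memory stand-in for an embedding store."""

    def __init__(self, threshold=0.75):
        # Assumed threshold; a real system would tune this value.
        self.threshold = threshold
        self.speakers = {}  # speaker name -> reference embedding

    def identify(self, embedding):
        """Return the closest known speaker, or register a new one."""
        best_name, best_score = None, -1.0
        for name, known in self.speakers.items():
            score = cosine_similarity(embedding, known)
            if score > best_score:
                best_name, best_score = name, score

        if best_name is not None and best_score >= self.threshold:
            return best_name

        # No sufficiently similar speaker found: assign a new name.
        new_name = f"speaker_{len(self.speakers) + 1}"
        self.speakers[new_name] = list(embedding)
        return new_name


registry = SpeakerRegistry()
first = registry.identify([0.9, 0.1, 0.0])      # new voice
second = registry.identify([0.88, 0.12, 0.01])  # close to the first voice
third = registry.identify([0.0, 0.2, 0.95])     # clearly different voice
print(first, second, third)
```

In this sketch the first two embeddings resolve to the same name because they are nearly parallel, while the third falls below the threshold and is registered as a new speaker. A vector database would replace the linear scan with a nearest-neighbor query.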