speech-dataset-generator

Enhance Multilingual Dataset Creation for Speech Models with Advanced Audio Processing

Product Description

The tool creates multilingual datasets for training text-to-speech and speech recognition models. It transcribes and segments audio, and can enhance audio quality using deepfilternet, resembleai, or mayavoz. It identifies speaker gender, can detect multiple speakers, and uses pyannote embeddings to name speakers automatically. Input can come from local files, YouTube, LibriVox, and TED Talks, and data is stored in a Chroma database.

Project Details