klaam - Using wav2vec2 and FastSpeech2 for Arabic Speech Processing

Introduction to the klaam Project

The klaam project provides Arabic speech recognition, speech classification, and text-to-speech. It builds on the wav2vec2 and FastSpeech2 models and supports both training new models and running predictions with pretrained ones.

Usage

Speech Classification

Klaam classifies Arabic speech in a few lines of code: import the SpeechClassification class, create a model, and pass it the path to an audio file.

from klaam import SpeechClassification
wav_file = "file.wav"  # path to the Arabic audio file to classify
model = SpeechClassification()
model.classify(wav_file)
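To label several recordings in one pass, the call above can be wrapped in a small loop. The sketch below is not part of klaam: `classify_files` is a hypothetical helper that accepts any callable, so `model.classify` can be passed in directly. The return type of `classify` is not specified here, so the helper stores whatever it returns.

```python
from typing import Any, Callable, Dict, Iterable


def classify_files(classify: Callable[[str], Any], wav_paths: Iterable[str]) -> Dict[str, Any]:
    """Apply `classify` to each path and collect the results keyed by path."""
    results: Dict[str, Any] = {}
    for path in wav_paths:
        # Keep the raw result; its exact type depends on the library.
        results[str(path)] = classify(str(path))
    return results


# Assumed usage with klaam (not executed here):
#   from klaam import SpeechClassification
#   model = SpeechClassification()
#   labels = classify_files(model.classify, ["a.wav", "b.wav"])
```

Passing the callable in, instead of constructing the model inside the helper, means the model is loaded once and the loop can be tried out with a stub function.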

Speech Recognition

To transcribe Arabic speech into text, import the SpeechRecognition class, create a model, and call transcribe with the path to an audio file.

from klaam import SpeechRecognition
wav_file = "file.wav"  # path to the Arabic audio file to transcribe
model = SpeechRecognition()
model.transcribe(wav_file)
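wav2vec2 models are pretrained on 16 kHz audio, so it can help to inspect a recording before transcribing it. The following sketch uses only the Python standard library; `describe_wav` and `looks_like_16k_mono` are hypothetical helpers, and whether klaam resamples its input internally is not stated here, so treat the check as a precaution and not a requirement.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def looks_like_16k_mono(path: str) -> bool:
    """True when the file is mono and sampled at 16 kHz."""
    info = describe_wav(path)
    return info["sample_rate"] == 16000 and info["channels"] == 1
```

The `wave` module reads uncompressed PCM files only; other formats would need a separate audio library.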

Text to Speech

The text-to-speech component synthesizes Arabic speech from text. Setup requires paths to the FastSpeech2 configuration files (preprocessing, model, and training), the vocoder configuration, and the pretrained vocoder weights. Once the model is created, call synthesize with the text to convert.

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)
sample_text = "مرحبا بالعالم"  # placeholder; replace with the Arabic text to synthesize
model.synthesize(sample_text)
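Because the setup depends on five relative paths, a wrong working directory is an easy mistake to make. A minimal sketch of a pre-flight check follows; `find_missing` is a hypothetical helper and not part of klaam, and the paths are the ones listed above.

```python
from pathlib import Path
from typing import Dict, List


def find_missing(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose file does not exist."""
    return [name for name, p in paths.items() if not Path(p).is_file()]


tts_paths = {
    "prepare_tts_model_path": "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml",
    "model_config_path": "../cfgs/FastSpeech2/config/Arabic/model.yaml",
    "train_config_path": "../cfgs/FastSpeech2/config/Arabic/train.yaml",
    "vocoder_config_path": "../cfgs/FastSpeech2/model_config/hifigan/config.json",
    "speaker_pre_trained_path": "../data/model_weights/hifigan/generator_universal.pth.tar",
}

missing = find_missing(tts_paths)
if missing:
    # Report every missing file at once instead of failing on the first.
    print("Missing files:", ", ".join(missing))
```

Running the check before constructing TextToSpeech reports which files are missing, by variable name, before any model loading starts.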

Language and Dialect Support

Klaam supports both Modern Standard Arabic (MSA) and the Egyptian dialect (EGY) for speech recognition. Select one by passing the lang argument when creating the model.

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
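When the language code comes from user input or a configuration file, it can be validated before the model is created. The sketch below is illustrative only: `normalize_lang` is a hypothetical helper, and while 'msa' appears in the example above, the lowercase 'egy' code for the Egyptian dialect is an assumption.

```python
# Assumed codes: 'msa' is taken from the example above, 'egy' is a guess.
SUPPORTED_LANGS = {"msa", "egy"}


def normalize_lang(lang: str) -> str:
    """Lowercase and trim a language code, rejecting unsupported values."""
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGS:
        raise ValueError(
            f"Unsupported lang {lang!r}; expected one of {sorted(SUPPORTED_LANGS)}"
        )
    return code


# Assumed usage with klaam (not executed here):
#   from klaam import SpeechRecognition
#   model = SpeechRecognition(lang=normalize_lang("MSA"))
```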

Datasets

Klaam draws on the following datasets:

MGB-3: Egyptian Arabic speech recognition data collected from sources such as YouTube.
ADI-5: Speech from Al Jazeera TV, covering multiple regional dialects and MSA.
Common Voice: A multilingual dataset available for broader speech recognition tasks.
Arabic Speech Corpus: An extensive Arabic dataset with transcription and alignment details.

Models

Klaam provides models for three tasks:

Speech Recognition with wav2vec2, for Egyptian Arabic and Modern Standard Arabic.
Dialect Classification across Egyptian, Levantine, Gulf, and North African dialects.
Text-to-Speech with FastSpeech2, for converting text to natural-sounding Arabic speech.

Example Notebooks

Example notebooks demonstrate the classification, recognition, and text-to-speech tasks. They are available on platforms such as Google Colab for easy experimentation.

Training

The klaam project includes scripts for training models:

Classification and Recognition: Fine-tunes pretrained models on several datasets, including Egyptian dialect data and the Arabic portion of Common Voice.
Text-to-Speech: Employs the FastSpeech2 framework, guiding users through data preparation to model training and deployment.

Contributions and Support

The klaam project was developed by the ARBML team and welcomes contributions. The community is encouraged to provide suggestions and enhancements by submitting pull requests.

In short, klaam gives developers and researchers a single toolkit for Arabic speech recognition, dialect classification, and text-to-speech.