Introduction to the klaam Project
The klaam project provides Arabic speech technology: speech recognition, speech classification, and text-to-speech. It builds on models such as wav2vec2 and FastSpeech2, and supports both training new models and running predictions with pretrained ones.
Usage
Speech Classification
klaam classifies speech in a few lines of code. Import the SpeechClassification class, create a model, and pass it the path to an Arabic audio file.
from klaam import SpeechClassification
model = SpeechClassification()
wav_file = "file.wav"  # placeholder path to an Arabic audio file
model.classify(wav_file)
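To classify a whole folder of recordings, the call above can be wrapped in a small loop. The following is a minimal sketch and not part of klaam: classify_directory is a hypothetical helper, and it takes the classifier as a plain callable so that it assumes nothing about what model.classify returns.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def classify_directory(classify: Callable[[str], Any], directory: str) -> Dict[str, Any]:
    """Apply `classify` to every .wav file directly inside `directory`.

    Hypothetical helper, not part of klaam. `classify` is any callable that
    takes a file path, for example `SpeechClassification().classify`.
    """
    results = {}
    # Sort for a deterministic order; only files ending in .wav are considered.
    for wav_path in sorted(Path(directory).glob("*.wav")):
        results[wav_path.name] = classify(str(wav_path))
    return results
```

With klaam installed, this would be called as classify_directory(SpeechClassification().classify, "recordings/"), where "recordings/" is a placeholder folder name.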
Speech Recognition
To transcribe Arabic speech into text, import the SpeechRecognition class, create a model, and pass it the path to an audio file.
from klaam import SpeechRecognition
model = SpeechRecognition()
wav_file = "file.wav"  # placeholder path to an Arabic audio file
model.transcribe(wav_file)
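Before transcribing, it can help to check an audio file's basic properties. wav2vec2 checkpoints are typically trained on 16 kHz audio; whether klaam resamples input internally is not stated here, so treat the check below as informational only. This is a sketch using Python's standard wave module, which reads uncompressed PCM WAV files; inspect_wav and WavInfo are hypothetical names, not klaam APIs.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_rate: int
    duration_seconds: float


def inspect_wav(path: str) -> WavInfo:
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        # Duration is the number of frames divided by frames per second.
        return WavInfo(wav.getnchannels(), rate, frames / rate)
```

A result such as WavInfo(channels=2, sample_rate=44100, ...) suggests the recording may need to be converted before it matches what a 16 kHz speech model was trained on.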
Text to Speech
The text-to-speech module synthesizes Arabic speech from text. Setup requires five paths: the FastSpeech2 preprocessing, model, and training configuration files, the HiFi-GAN vocoder configuration, and the pretrained vocoder weights. Once the model is constructed, call synthesize with the text to speak.
from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"
model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)
model.synthesize(sample_text)  # sample_text holds the Arabic text to synthesize
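Because the constructor takes five relative paths, a wrong working directory or a typo is easy to miss until model loading fails. The sketch below checks the paths up front. find_missing_files is a hypothetical helper, not part of klaam, and the dictionary keys are free-form labels chosen by the caller.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_files(paths: Dict[str, str]) -> List[str]:
    """Return the labels of entries whose path is not an existing file."""
    # Directories and nonexistent paths both count as missing.
    return [name for name, path in paths.items() if not Path(path).is_file()]
```

For the setup above, the dictionary would map labels such as "model_config" to model_config_path, and an empty result means all five files were found.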
Language and Dialect Support
Klaam supports both Modern Standard Arabic (MSA) and the Egyptian dialect (EGY). Select the variety by passing the lang argument when creating the model.
from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
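When the language code comes from user input or a config file, validating it before constructing the model gives a clearer error. This is a minimal sketch: normalize_lang and SUPPORTED_LANGS are hypothetical names, and only 'msa' appears in the example above; the code 'egy' for Egyptian is an assumption based on the abbreviation EGY.

```python
# Language codes accepted by this sketch. "msa" appears in the example above;
# "egy" is an ASSUMED code for the Egyptian dialect.
SUPPORTED_LANGS = {
    "msa": "Modern Standard Arabic",
    "egy": "Egyptian Arabic",
}


def normalize_lang(lang: str) -> str:
    """Return a lowercase, trimmed language code or raise ValueError."""
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGS:
        expected = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"Unsupported lang {lang!r}; expected one of: {expected}")
    return code
```

The returned code could then be passed on, for example as SpeechRecognition(lang=normalize_lang(user_choice)), where user_choice is a placeholder variable.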
Datasets
klaam draws on the following datasets:
- MGB-3: Features Egyptian Arabic speech recognition data aggregated from sources like YouTube.
- ADI-5: Incorporates speech from Aljazeera TV, covering multiple regional dialects and MSA.
- Common Voice: A multilingual dataset available for broader speech recognition tasks.
- Arabic Speech Corpus: An extensive Arabic dataset with transcription and alignment details.
Models
Klaam provides models for the following tasks:
- Egyptian and Standard Arabic for speech recognition using wav2vec2.
- Dialect Classification across Egyptian, Levantine, Gulf, and North African dialects.
- Text-to-Speech with FastSpeech2 for converting text to natural Arabic speech.
Example Notebooks
Klaam includes example notebooks that demonstrate classification, recognition, and text-to-speech. They can be opened on platforms such as Google Colab for easy experimentation.
Training
The klaam project includes scripts for training models:
- Classification and Recognition: Uses pre-training methods on diverse datasets, including Egyptian dialect data and the Arabic portion of Common Voice.
- Text-to-Speech: Uses the FastSpeech2 framework and guides users from data preparation through model training and deployment.
Contributions and Support
The klaam project was developed by the ARBML team and welcomes contributions. Suggestions and enhancements can be submitted as pull requests.
In sum, klaam gives developers and researchers a single toolkit for Arabic speech recognition, dialect classification, and text-to-speech.