parrots - Multilingual Speech Recognition and Synthesis Toolkit

Parrots: A Comprehensive ASR and TTS Toolkit

Parrots is a toolkit that combines Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) in a single Python package. It supports Chinese, English, and Japanese, which makes it useful for developers building speech and voice applications.

Features of Parrots

ASR Functionality: Speech recognition is built on a distilwhisper (Distil-Whisper) model. It supports multiple languages, including Chinese and English.

TTS Capabilities: Speech synthesis uses the GPT-SoVITS model, which supports Chinese, English, and Japanese.

Installation Process

Parrots can be installed with pip, Python's package manager. PyTorch is installed first. The requirements.txt step refers to the file in the Parrots repository, so it needs a local copy of that file:

pip install torch # or conda install pytorch
pip install -r requirements.txt
pip install parrots

Alternatively, clone the source code from GitHub and install it manually:

pip install torch # or conda install pytorch
git clone https://github.com/shibing624/parrots.git
cd parrots
python setup.py install
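
After either installation route, a short check from Python can confirm that the package is visible to the interpreter. This is a minimal sketch using only the standard library. The helper name check_install is illustrative and is not part of Parrots.

```python
from importlib import metadata, util


def check_install(package: str = "parrots") -> dict:
    """Report whether `package` is importable and which version is installed."""
    # find_spec returns None when a top-level module cannot be located.
    importable = util.find_spec(package) is not None
    try:
        # Looks up the installed distribution's metadata, if any.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"package": package, "importable": importable, "version": version}


if __name__ == "__main__":
    print(check_install("parrots"))
    print(check_install("torch"))
```

If "importable" is False for either package, the corresponding install step above most likely did not complete in the active environment.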

Tutorials and Demos

Parrots can be tried through online demonstrations. For hands-on use, example scripts such as tts_gradio_demo.py are provided. Running them shows text being converted to speech or spoken input being transcribed.

How to Use Parrots

ASR Usage Example: Speech recognition takes a few lines of Python. The script below loads an audio file and prints the transcribed text:

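from parrots import SpeechRecognition

# Create a recognizer with its default settings.
m = SpeechRecognition()
# Transcribe a local audio file; replace the path with a real recording.
r = m.recognize_speech_from_file('path/to/audio.wav')
print('Recognition Result:', r)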
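
The same call can be applied to a whole folder of recordings. The sketch below is an illustration, not part of Parrots: transcribe_directory is a hypothetical helper that accepts any object exposing recognize_speech_from_file(path), so it can be tried with a stand-in object before the real model is loaded.

```python
from pathlib import Path


def transcribe_directory(recognizer, directory, pattern="*.wav"):
    """Transcribe every file in `directory` that matches `pattern`.

    `recognizer` is any object with a recognize_speech_from_file(path)
    method, such as the SpeechRecognition instance from the example above.
    Returns (results, errors), both keyed by file name.
    """
    results, errors = {}, {}
    for audio_path in sorted(Path(directory).glob(pattern)):
        try:
            results[audio_path.name] = recognizer.recognize_speech_from_file(str(audio_path))
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            errors[audio_path.name] = str(exc)
    return results, errors
```

With Parrots installed, transcribe_directory(SpeechRecognition(), "recordings") would map each file name to whatever recognize_speech_from_file returns for it. The folder name "recordings" is a placeholder.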

TTS Usage Example: Parrots synthesizes speech from text in a similar way. The snippet below loads a speaker model and writes the synthesized audio to a file:

from parrots import TextToSpeech

m = TextToSpeech(speaker_model_path="model_path", speaker_name="SpeakerName")
m.predict(text="Hello, welcome to the city.", text_language="auto", output_path="output_audio.wav")

This script synthesizes speech and saves the output as a WAV file.
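
For longer material, each sentence can be written to its own file. The following sketch assumes only the predict(text=..., text_language=..., output_path=...) call shown above. The helper synthesize_lines and the line_NNN.wav naming scheme are hypothetical and are not part of Parrots.

```python
from pathlib import Path


def synthesize_lines(tts, lines, output_dir, text_language="auto"):
    """Synthesize each non-empty line to its own numbered WAV file.

    `tts` is any object exposing predict(text=..., text_language=...,
    output_path=...), matching the TextToSpeech call shown above.
    Returns the list of output paths in the order they were requested.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Drop blank lines so the file numbering stays contiguous.
    texts = [line.strip() for line in lines if line.strip()]

    written = []
    for index, text in enumerate(texts, start=1):
        output_path = out_dir / f"line_{index:03d}.wav"
        tts.predict(text=text, text_language=text_language, output_path=str(output_path))
        written.append(output_path)
    return written
```

Passing the TextToSpeech instance from the example above as tts, together with a list of sentences, would produce one file per sentence in the chosen directory.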

Command-Line Interface

Parrots also provides a command-line interface for users who prefer working in the terminal. The following commands show basic ASR and TTS usage:

parrots asr audio_file.wav
parrots tts "Hello, world" output_audio.wav
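
These commands can also be driven from a Python script, for example in a batch job. The sketch below copies the argument order exactly from the two commands above and assumes nothing else about the CLI. The helpers build_command and run_parrots are hypothetical, and the options the CLI actually accepts should be checked against its own help output.

```python
import shutil
import subprocess


def build_command(task, *args):
    """Build the argument list for `parrots <task> <args...>`."""
    if task not in ("asr", "tts"):
        raise ValueError(f"unknown task: {task!r}")
    return ["parrots", task, *args]


def run_parrots(task, *args):
    """Run the CLI if it is on PATH and return its standard output."""
    if shutil.which("parrots") is None:
        raise FileNotFoundError("the 'parrots' executable was not found on PATH")
    completed = subprocess.run(
        build_command(task, *args),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

For instance, run_parrots("asr", "audio_file.wav") issues the first command above and returns whatever the tool prints.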

Available Models

Parrots includes models for different languages and voices. For example, ASR uses the BELLE-2/Belle-distilwhisper-large-v2-zh model, and several TTS speaker models are available, each with its own voice and supported languages.

Contact and Contribution

Users who run into problems or want to improve Parrots can open a GitHub issue or email the developer team. Contributions are welcome, provided they come with appropriate unit tests and have been verified to work.

Conclusion

Parrots brings speech recognition and speech synthesis together in one toolkit. With its ease of use and support for multiple languages, it is a useful resource for developers and researchers alike.