FunAudioLLM-APP - Comprehensive Audio Interaction and Translation Technologies

Introduction to the FunAudioLLM-APP Project

Welcome to the FunAudioLLM-APP project! It combines audio understanding and speech generation models in two applications that let users talk with, and communicate through, AI by voice.

Key Applications

Voice Chat: An interactive voice conversation application. It uses AI models to hold natural spoken conversations across a range of scenarios.

Voice Translation: A real-time speech translation tool. It translates speech from one language into another, so that people who speak different languages can communicate with each other.

Related Resources

For those interested in the underlying technologies, see the following repositories:

CosyVoice (speech generation): see the CosyVoice repo and the corresponding CosyVoice space.
SenseVoice (speech understanding): see the SenseVoice repo and the corresponding SenseVoice space.

Installation Guide

To get started with the FunAudioLLM-APP project, follow these steps:

Clone and Install

Clone the repository and its submodules:
```
git clone --recursive URL
```
If cloning the submodules fails because of network issues, rerun the following commands until they succeed:
```
cd funaudiollm-app
git submodule update --init --recursive
```
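After updating the submodules, you can confirm that none were left uninitialized. The helper below is a hypothetical convenience (the name `uninitialized_submodules` is ours, not the project's); it relies only on the documented behavior of `git submodule status`, which prefixes the line of an uninitialized submodule with `-`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): reads the output of
# `git submodule status --recursive` on stdin and prints the path of every
# submodule that is not initialized yet. Git marks those lines with a leading "-".
uninitialized_submodules() {
  # Keep only "-"-prefixed lines; the path is the second whitespace-separated field.
  awk '/^-/ { print $2 }'
}
```

Usage: from the repository root, run `git submodule status --recursive | uninitialized_submodules`. Empty output means every submodule is initialized; otherwise, rerun the update command above.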
Set up the environments required by the submodules by following the instructions in the CosyVoice and SenseVoice repositories. If you already have working setups for them, edit the resource path configuration in app.py (lines 15-18) to point to those setups instead.
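To inspect the resource path lines mentioned above before editing them, you can print lines 15-18 of each app.py. This sketch assumes the line numbers still match your checkout and that the files in question are voice_chat/app.py and voice_translation/app.py; the function name `show_path_config` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints lines 15-18 of each file given as an argument,
# each preceded by a header with the file name. Assumes the resource-path
# configuration still sits on those lines in your checkout.
show_path_config() {
  local f
  for f in "$@"; do
    echo "== $f =="
    sed -n '15,18p' "$f"
  done
}
```

Usage: `show_path_config voice_chat/app.py voice_translation/app.py`.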
Finally, install the required Python packages:
```
pip install -r requirements.txt
```

Basic Usage

Preparation

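Obtain a DashScope API token; both apps read it from the DS_API_TOKEN environment variable.
Obtain the pem file(s) needed to serve the apps over HTTPS, since both are accessed through https:// URLs.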
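Before launching either app, a quick pre-flight check can catch a missing token or certificate. This is a sketch: the function name `check_prereqs` is hypothetical, and which pem files the apps actually load, and from which directory, is an assumption you should confirm in app.py.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the project).
# Verifies that DS_API_TOKEN is set and that every pem path passed as an
# argument exists. Returns non-zero and explains on stderr if anything is missing.
check_prereqs() {
  local status=0 pem
  if [ -z "${DS_API_TOKEN:-}" ]; then
    echo "DS_API_TOKEN is not set" >&2
    status=1
  fi
  for pem in "$@"; do
    if [ ! -f "$pem" ]; then
      echo "missing pem file: $pem" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage, with assumed file names: `export DS_API_TOKEN="YOUR-DS-API-TOKEN"; check_prereqs ./cert.pem ./key.pem`. An exported variable is not passed through `sudo` under its default `env_reset` policy, which is why the launch commands below set DS_API_TOKEN on the `sudo` command line itself. For local testing only, a self-signed pair can be generated with `openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem -days 365 -subj "/CN=localhost"`; browsers will show a certificate warning for it.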

Voice Chat

To run the Voice Chat application:

```
cd voice_chat
sudo CUDA_VISIBLE_DEVICES="0" DS_API_TOKEN="YOUR-DS-API-TOKEN" python app.py >> ./log.txt
```

Access the application via: https://YOUR-IP-ADDRESS:60001/

Voice Translation

To run the Voice Translation application:

```
cd voice_translation
sudo CUDA_VISIBLE_DEVICES="0" DS_API_TOKEN="YOUR-DS-API-TOKEN" python app.py >> ./log.txt
```

Access it through: https://YOUR-IP-ADDRESS:60002/
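Each launch command runs in the foreground and appends only standard output to log.txt (standard error still goes to the terminal). You may therefore want to check from a second terminal that a server is answering before opening it in a browser. The sketch below uses curl. `-k` skips certificate verification, which is only appropriate for a self-signed certificate. The function name `wait_for_app` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls an HTTPS URL until the server returns any HTTP
# response, or gives up after the given number of attempts (default 30).
wait_for_app() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # -k: accept a self-signed certificate; -s: no progress output.
    # Without -f, any HTTP status (even 404) counts as "answering".
    if curl -ks -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    # Pause before the next attempt, but not after the last one.
    if (( i < attempts )); then
      sleep 2
    fi
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_app https://YOUR-IP-ADDRESS:60001/` for Voice Chat, or `wait_for_app https://YOUR-IP-ADDRESS:60002/` for Voice Translation.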

Enjoy exploring the dynamic capabilities of FunAudioLLM-APP, where technology meets communication in novel ways!