xtts-webui - Efficient Web Interface for XTTS-Based Voice Synthesis

XTTS-WebUI: Simplifying Text-to-Speech Interactions

Portable Version

The XTTS-WebUI project offers a portable version that removes the need to install dependencies by hand: users download a single package and run it. The package targets Windows systems with an Nvidia graphics card that has at least 6 GB of video memory. The portable package is available as a direct download.

Purpose and Capabilities

XTTS-WebUI provides a web interface for XTTS and pairs it with a suite of additional neural networks that improve the quality of the generated voice. Users can fine-tune a model to obtain a high-quality voice model. The interface also offers batch processing for dubbing tasks, audio translation that preserves the original voice, and automatic enhancement of results using neural networks.

Key Features

Optimized XTTSv2 Experience: Seamless integration with XTTS version 2, providing a user-friendly experience.
Batch Audio Processing: Efficient handling of multiple audio files, ideal for large-scale dubbing projects.
Voice Preserving Translation: Translate audio while maintaining original voice characteristics.
Enhancements via Neural Networks: Automated audio improvement using state-of-the-art neural technologies.
Fine-Tuning Capability: Users can fine-tune a model and use the result immediately.
Customizable Tools: Use advanced tools like RVC, OpenVoice, and Resemble Enhance, individually or in combination.
Flexible Speech Generation: Customize speech output with multiple settings and voice samples.

Future Plans

The project roadmap aims to enhance usability and feature set:

Completed: Status bar with error information and integration of training in the interface.
Upcoming: Streaming capabilities, novel text processing for voiceover, customizable speaker settings for batch processing, and API integration.

Installation and Setup

Method 1: Using Scripts

For Windows:

Execute ‘install.bat’ to install dependencies.
Use ‘start_xtts_webui.bat’ to launch the application.
Access the UI through the browser using the local address provided.

For Linux:

Execute ‘install.sh’ for setup.
Launch with ‘start_xtts_webui.sh’ and access via browser.

Method 2: Manual Installation

Ensure CUDA is installed.
Clone the repository with: git clone https://github.com/daswer123/xtts-webui
Navigate to the directory and set up a virtual environment.
Activate the virtual environment and install necessary libraries using pip.
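
The manual steps above can be sketched as a small Python helper that lists the commands in order. This is a minimal sketch, not the project's own installer: the requirements.txt file name, the venv directory name, and both function names are assumptions. Calling the virtual environment's interpreter directly stands in for activating it. The sketch does not cover installing CUDA or a CUDA-enabled PyTorch build.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/daswer123/xtts-webui"  # from the clone step above
REPO_DIR = "xtts-webui"  # directory name git derives from the URL


def manual_install_steps(platform=sys.platform, requirements="requirements.txt"):
    """Return (command, working_directory) pairs; nothing is executed here.

    `requirements` is an assumed file name, not confirmed by the text above.
    """
    repo = Path(REPO_DIR).resolve()
    # A venv keeps its interpreter in Scripts/ on Windows and bin/ elsewhere.
    if platform.startswith("win"):
        venv_python = repo / "venv" / "Scripts" / "python.exe"
    else:
        venv_python = repo / "venv" / "bin" / "python"
    return [
        (["git", "clone", REPO_URL], str(repo.parent)),
        ([sys.executable, "-m", "venv", "venv"], str(repo)),
        # Running pip through the venv's interpreter installs into that venv,
        # which is what activating the environment first would achieve.
        ([str(venv_python), "-m", "pip", "install", "-r", requirements], str(repo)),
    ]


def run_steps(steps):
    """Run each command in its working directory, stopping at the first failure."""
    for command, cwd in steps:
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command, cwd in manual_install_steps():
        print(f"[{cwd}] {' '.join(command)}")
```

Passing the result of manual_install_steps() to run_steps() would perform the installation. Printing the steps first makes it easy to compare them against the project's current instructions.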

Running the Application

Activate the environment and start XTTS-WebUI by executing:

python app.py

The application supports various runtime arguments for customization, such as setting the host, port, device type (CPU or CUDA), speaker folder, language, and more.
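
To illustrate how runtime options of this kind are typically declared and parsed, here is a minimal argparse sketch. It is not the application's actual command-line definition: the flag spellings and every default value below are assumptions chosen for the example, and the real names should be taken from the project's own documentation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the settings named above (illustrative names only)."""
    parser = argparse.ArgumentParser(
        description="Illustrative launcher options; names and defaults are assumed."
    )
    parser.add_argument("--host", default="127.0.0.1",
                        help="address the web UI binds to")
    parser.add_argument("--port", type=int, default=8000,
                        help="port the web UI listens on")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cuda",
                        help="run inference on the CPU or on a CUDA GPU")
    parser.add_argument("--speaker-folder", default="speakers",
                        help="directory containing voice samples")
    parser.add_argument("--language", default="en",
                        help="interface or synthesis language code")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--port", "8080", "--device", "cpu"])
print(args.host, args.port, args.device)  # prints: 127.0.0.1 8080 cpu
```

Restricting the device option with choices means an unsupported value is rejected at startup rather than failing later during model loading.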

RVC Module Integration

Enable RVC for audio post-processing by passing the --rvc flag when launching the application. Place the required model files in the designated directory.

Differences from Official WebUI

XTTS-WebUI introduces several improvements:

Enhanced data processing with an updated faster-whisper, customizable datasets, and language specification.
Fine-tuning flexibility with custom model selection and optimization capabilities.
Inference settings can be customized when checking a model.
Additional features like Japanese language support and streamlined data handling enhance overall usability.

XTTS-WebUI provides professional-grade text-to-speech tools and a user-centric interface for both developers and content creators.