TurnVoice - Enhance Video Content with Seamless Voice Transformation and Translation Using TTS Technology

Introduction to TurnVoice

TurnVoice is a command-line tool for transforming the voices in video content, including videos hosted on platforms like YouTube. Besides its core voice transformation capability, TurnVoice can also translate the spoken content, so a video can be re-voiced in another language.

Features of TurnVoice

Voice Transformation: Utilizes the free Coqui TTS to transform voices without incurring operational costs. It supports voice cloning and includes 58 voices to choose from, allowing users to customize the audio experience of their videos.
Voice Variety: Works with several popular text-to-speech (TTS) engines, namely ElevenLabs, OpenAI TTS, and Azure, expanding the range of voices available for transformation.
Translation Capabilities: Translates video content at no cost using the free deep-translator library. This is especially useful for making content accessible to viewers who do not speak the original language.
Custom Speaking Styles: Uses AI to rewrite the spoken content in a custom speaking style described by a user-supplied prompt, adding a creative touch to audio presentations.
Controlled Rendering: Provides precise control over how the audio is rendered by allowing customization of sentence text, timing, and voice selection. The included Renderscript Editor supports users in refining these aspects.
Local Video Processing: Capable of processing local video files, TurnVoice ensures flexibility and control over video content editing.
Background Audio Preservation: Maintains the original background audio of videos, ensuring that the essence of the original content is not lost during voice transformation.

Prerequisites

TurnVoice needs a compatible setup to run well. An Nvidia graphics card with more than 8 GB of VRAM is recommended. TurnVoice has been tested on Python 3.11.4 and Windows 10.

Essential installations include:

NVIDIA CUDA Toolkit and cuDNN for GPU optimization.
Command-line utilities like Rubberband and ffmpeg for processing audio and video files.
A configured Hugging Face environment for features that require speaker diarization and segmentation.
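
The command-line prerequisites can be checked before installing anything else. The following is a minimal sketch rather than part of TurnVoice: it assumes the executables are named ffmpeg and rubberband on the PATH, and it treats Python 3.11 as the minimum version only because 3.11.4 is the tested one. It does not check CUDA, cuDNN, or the Hugging Face setup.

```python
import shutil
import sys


def check_prerequisites(tools=("ffmpeg", "rubberband"), min_python=(3, 11)):
    """Return a list of problems found; an empty list means every check passed.

    The tool names and the minimum Python version are assumptions, not
    requirements published by TurnVoice.
    """
    problems = []
    # shutil.which returns None when an executable cannot be found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Compare only (major, minor) against the assumed minimum version.
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer expected")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result only means the two utilities were found and the interpreter is recent enough; it says nothing about GPU support.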

Installation

TurnVoice can be installed with pip, the Python package manager. For best performance, set up a CUDA environment after installation, and make sure the installed dependency versions are compatible.

Usage

TurnVoice accepts a YouTube URL, a YouTube video ID, or a local video file as input. By specifying translation languages, TTS engines, and output formats, users can control how the voices in the video are transformed and rendered.

Here's an example command using Arthur Morgan's voice as an overlay for a cooking tutorial:

turnvoice -i AmC9SmCBUj4 -v arthur.wav -o cooking_with_arthur.mp4

This requires the voice file (arthur.wav in this example) to be present in the working directory.
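
When the same command has to be run for many videos, it can help to assemble it programmatically and check for the voice file first. The sketch below uses only the -i, -v, and -o options from the command above; the helper names build_turnvoice_command and missing_voice_file are hypothetical and not part of TurnVoice.

```python
import shlex
from pathlib import Path


def build_turnvoice_command(video, voice_file, output_file):
    """Build the argument list for the example command shown above.

    Only the -i, -v and -o options from that example are used.
    """
    return ["turnvoice", "-i", str(video), "-v", str(voice_file), "-o", str(output_file)]


def missing_voice_file(voice_file, workdir="."):
    """Return True when the voice file is absent from the working directory."""
    return not (Path(workdir) / voice_file).is_file()


if __name__ == "__main__":
    cmd = build_turnvoice_command("AmC9SmCBUj4", "arthur.wav", "cooking_with_arthur.mp4")
    if missing_voice_file("arthur.wav"):
        print("arthur.wav was not found in the working directory")
    else:
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems with file names that contain spaces.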

Workflow

The TurnVoice workflow is segmented into preparation, editing, and rendering stages:

Preparation: The --prepare option generates a script by transcribing the video and, optionally, translating it. This script can then be edited for accuracy or style.
Renderscript Editor: An HTML editor helps visualize, edit, and verify the script generated in the preparation stage. Users can adjust text, timings, and speaker assignments.
Rendering: The final step uses the refined script to render the video with new voice tracks.
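
To make the editing stage more concrete, the sketch below models a script as a list of sentences and checks the edited timings before rendering. The ScriptSentence fields (text, start, end, speaker) are hypothetical and chosen for illustration; the actual renderscript format may differ. As a simplification, any overlap is flagged regardless of which speaker is talking.

```python
from dataclasses import dataclass


@dataclass
class ScriptSentence:
    # Hypothetical fields; the real renderscript format may differ.
    text: str
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    speaker: int


def find_timing_problems(sentences):
    """Return messages for sentences with no duration or that overlap the next one.

    Sentences are numbered by their position after sorting by start time.
    """
    problems = []
    ordered = sorted(sentences, key=lambda s: s.start)
    for index, sentence in enumerate(ordered):
        if sentence.end <= sentence.start:
            problems.append(f"sentence {index} has non-positive duration")
        has_next = index + 1 < len(ordered)
        if has_next and sentence.end > ordered[index + 1].start:
            problems.append(f"sentence {index} overlaps sentence {index + 1}")
    return problems


if __name__ == "__main__":
    script = [
        ScriptSentence("Chop the onions.", 0.0, 1.5, 0),
        ScriptSentence("Heat the pan.", 1.0, 2.0, 0),
    ]
    for problem in find_timing_problems(script):
        print(problem)
```

A check like this catches the most common editing mistake, a sentence whose new end time runs into the following sentence, before time is spent on rendering.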

Additional Features

Beyond the core workflow, command-line options let users translate video content into other languages, choose among multiple TTS engines, and control specific parameters such as speakers and languages.

Supported Engines

TurnVoice supports several TTS engines: Coqui, ElevenLabs, OpenAI, and Azure. Each has its own setup requirements and its own parameters for defining voices and translation options. Users should make sure the required API keys and environment variables are set before running TurnVoice.
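
A quick way to catch a missing key before a long run is to check the environment up front. The sketch below is illustrative only: the variable names in ASSUMED_ENGINE_ENV_VARS are assumptions and should be confirmed against each engine's documentation, and Coqui is assumed to need no key because it runs locally.

```python
import os

# Assumed variable names; confirm them against each engine's documentation.
ASSUMED_ENGINE_ENV_VARS = {
    "coqui": [],  # assumed to run locally without an API key
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
}


def missing_env_vars(engine, environ=None, required=ASSUMED_ENGINE_ENV_VARS):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    if engine not in required:
        raise ValueError(f"unknown engine: {engine}")
    return [name for name in required[engine] if not environ.get(name)]


if __name__ == "__main__":
    for engine in ASSUMED_ENGINE_ENV_VARS:
        missing = missing_env_vars(engine)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{engine}: {status}")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.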

In summary, TurnVoice is a powerful tool for transforming and translating video voice content, providing users with extensive control and flexibility while maintaining the original content's essence. Whether you are an individual seeking to personalize your video projects or a content creator aiming to reach diverse audiences, TurnVoice offers robust solutions to meet your needs.