Whisper-TikTok - AI-powered TikTok Video Creation Utilizing Whisper and Edge TTS

Whisper-TikTok: Bringing AI to Video Creation

Introduction

Whisper-TikTok is an AI-driven tool for producing TikTok videos. It combines three technologies: OpenAI's Whisper speech-recognition model, Microsoft's Edge Text-to-Speech (TTS) API, and FFmpeg. Edge TTS generates a natural-sounding voiceover from text, Whisper transcribes that audio into subtitles, and FFmpeg assembles the final video.

How It Works

Whisper-TikTok starts from a structured JSON file named video.json, which holds the series name, part number, text to be spoken, and tags. After reading this data, the program runs the following steps:

Loading environment settings from a .env file.
Checking for PyTorch with CUDA for optimal performance; if CUDA is not available, the program runs on the CPU.
Downloading a random video, such as Minecraft gameplay footage, from platforms like YouTube.
Loading the OpenAI Whisper model into memory and generating the text-to-speech audio file via the Microsoft Edge Cloud TTS API.
Generating a transcription in SRT format using OpenAI Whisper.
Picking a random background video from a dedicated folder.
Merging the transcription and the chosen video using FFmpeg to create the final TikTok video.
Optionally uploading the video to TikTok under the user's account, authenticated with a cookies.txt file.
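
The video.json input that these steps consume can be illustrated with a short Python sketch. The key names below (series, part, text, tags) are assumptions derived from the fields described above, not a confirmed schema, and the helper functions save_videos and load_videos are hypothetical.

```python
import json
from pathlib import Path

# Assumed key names, based on the fields the article lists for video.json.
REQUIRED_KEYS = {"series", "part", "text", "tags"}


def save_videos(entries, path):
    """Write a list of video entries to a UTF-8 JSON file."""
    payload = json.dumps(entries, indent=2, ensure_ascii=False)
    Path(path).write_text(payload, encoding="utf-8")


def load_videos(path):
    """Read video entries and fail early if an expected field is missing."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing fields: {sorted(missing)}")
    return entries


# Illustrative entry only; the values are placeholders.
example = [
    {
        "series": "Example series",
        "part": 1,
        "text": "This is the text that the TTS voice will read aloud.",
        "tags": ["example", "demo"],
    }
]
```

Calling save_videos(example, "video.json") and then load_videos("video.json") round-trips the entry. Validating the fields up front means a malformed file is reported before any download or transcription work begins.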

The result is a finished video, produced in a short amount of time and uploaded automatically if requested.
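
Three of the steps listed above can be sketched in isolation: the CUDA check, the SRT formatting, and the FFmpeg merge. This is a minimal illustration under stated assumptions, not the project's actual code. The function names are hypothetical, the segment dictionaries (start, end, text) follow the shape that Whisper's transcription results commonly use, and the FFmpeg command is only built, not executed.

```python
def pick_device():
    """Return "cuda" when PyTorch with CUDA is usable, otherwise "cpu"."""
    try:
        import torch  # optional dependency; absent installs fall back to CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn segments with start, end, and text keys into SRT text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def build_ffmpeg_command(video, audio, srt, output):
    """Build (but do not run) an FFmpeg command that burns in subtitles.

    Assumes the SRT path needs no escaping for the subtitles filter;
    Windows paths with drive letters usually do.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,               # background video
        "-i", audio,               # TTS voiceover
        "-map", "0:v", "-map", "1:a",
        "-vf", f"subtitles={srt}",
        "-shortest",               # stop at the shorter of the two streams
        output,
    ]
```

The resulting list can be passed to subprocess.run once FFmpeg is installed. Keeping command construction separate from execution makes the arguments easy to inspect or log before any video is rendered.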

Getting Started

Whisper-TikTok is accessible online through a Streamlit web app hosted on Hugging Face. For those who prefer a local installation, the software is tested on Windows 10, Windows 11, and Ubuntu with Python 3.8, 3.9, and 3.11.

To begin, clone the repository with Git and install the dependencies from the provided requirements file. FFmpeg, a command-line utility for processing video and audio files, is also required and can be installed through the package manager for the user's operating system.
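
Before a first run, it can help to confirm that the prerequisites described above are in place. The following is a small sketch, assuming FFmpeg is expected on the PATH under the name ffmpeg and that Python 3.8 is the minimum version; check_prerequisites is a hypothetical helper, not part of the project.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    # shutil.which searches the PATH the same way a shell would.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

Returning a list rather than raising on the first failure lets a user see every missing piece in one pass.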

Usage

The tool can be run through a local web interface or from the command line. The command line offers options for selecting the Whisper model, choosing a specific TTS voice, and customizing subtitle appearance. There are also flags for uploading the video directly to TikTok and for picking a random TTS voice of a specified gender and language.

The project's examples show how to create videos with specific combinations of these parameters.
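
A command-line interface of the kind described above could be sketched with argparse as follows. The flag names, defaults, and choices here are illustrative assumptions, not the tool's confirmed options; consult the project's README for the real ones.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flags mirroring the options described."""
    parser = argparse.ArgumentParser(description="Create a TikTok video (sketch).")
    parser.add_argument("--model", default="small",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size (assumed flag name)")
    parser.add_argument("--tts", default="en-US-ChristopherNeural",
                        help="Edge TTS voice name (assumed flag name)")
    parser.add_argument("--random_voice", action="store_true",
                        help="pick a random voice for the given gender and language")
    parser.add_argument("--gender", choices=["Male", "Female"])
    parser.add_argument("--language", help="language code, for example en-US")
    parser.add_argument("--font_color", default="#FFFFFF",
                        help="subtitle color (assumed flag name)")
    parser.add_argument("--upload_tiktok", action="store_true",
                        help="upload the finished video to TikTok")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce that a random voice has its constraints."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.random_voice and not (args.gender and args.language):
        # parser.error prints a usage message and exits with status 2.
        parser.error("--random_voice needs both --gender and --language")
    return args
```

For example, parse_args(["--model", "tiny", "--upload_tiktok"]) yields a namespace with model set to "tiny" and upload_tiktok set to True, while requesting a random voice without a gender and language is rejected up front.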

Additional Resources

A companion script, reddit2json, converts Reddit links into JSON files suitable for video creation. It also offers translation and content modification through OpenAI GPT API calls.

Community and Support

Whisper-TikTok encourages community contributions and provides guidelines for those interested in contributing. Upcoming features include more advanced response generation through the OpenAI API and the capability to generate content from Reddit.

Acknowledgments

The project extends thanks to other developers and tools, such as edge-tts and the OpenAI Whisper model, for their contributions to the success of Whisper-TikTok.

Whisper-TikTok is released under the Apache License, which keeps the project open to collaborative development.

For those interested, a detailed README and code of conduct are available to ensure a positive experience while using or contributing to Whisper-TikTok.