sound_dataset_tools2: Streamlined Voice Dataset Creation with Automatic Optimization and Intuitive Features

Introduction to sound_dataset_tools2

Overview

sound_dataset_tools2 is a tool for building speech datasets quickly. With a single click, it produces the training datasets required by projects such as VITS.

Important Note: You are viewing the r1.0 branch. It will no longer receive new features and is kept for archival purposes only. Following a project restructuring, future development will focus on the r2.0 branch.

Key Features

GUI: A graphical user interface makes the tool easy to use.
Chinese Documentation: Available for Chinese-speaking users.
Two Import Methods: Audio with subtitles, and automatic segmentation of pure audio (more methods are planned).
Optimized Audio Segmentation: Reduces the likelihood of speech being cut off at segment boundaries.
Direct Export in the Required Format: Datasets can be exported in formats compatible with VITS and other projects, with adjustable channel count and sample rate.
Voice Evaluation: Scores the data so that high-quality clips can be picked quickly out of large volumes.

Software Architecture

Database: Utilizes SQLite and Peewee.
Interface: Built with PySide6.
Audio Processing: Uses FFmpeg and pydub.
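To illustrate the FFmpeg side of exporting with an adjustable sample rate and channel count, here is a minimal sketch of how one clip could be converted. The helper names `build_convert_cmd` and `convert` and the 22050 Hz mono defaults are assumptions for illustration, not the tool's actual code; `-ar`, `-ac`, and `-acodec pcm_s16le` are standard FFmpeg options.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_cmd(src: Path, dst: Path, sample_rate: int = 22050,
                      channels: int = 1) -> List[str]:
    """Build an FFmpeg command that resamples and remixes one audio clip.

    Hypothetical helper; the 22050 Hz mono defaults are illustrative only.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input clip
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # output channel count
        "-acodec", "pcm_s16le",   # write 16-bit PCM WAV
        str(dst),
    ]


def convert(src: Path, dst: Path, sample_rate: int = 22050, channels: int = 1) -> None:
    """Run the conversion; needs ffmpeg available and raises CalledProcessError on failure."""
    subprocess.run(build_convert_cmd(src, dst, sample_rate, channels), check=True)
```

For example, `convert(Path("clip.wav"), Path("clip_44k_stereo.wav"), 44100, 2)` would write a 44.1 kHz stereo copy. Building the argument list separately keeps the command easy to inspect before anything is run.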

Installation Guide

Running the Compiled EXE

Visit the GitHub Releases or Gitee Releases page.
Download the appropriate zip file as instructed, extract it, and double-click the executable to launch the application.

Running from Source Code

Clone the Project:

From Gitee:

git clone https://gitee.com/kslizi/sound_dataset_tools2.git

From GitHub:

git clone https://github.com/kslz/sound_dataset_tools2.git

Install FFmpeg:
- Either add FFmpeg to your PATH environment variable, or extract it and place it directly in the project's lib directory.
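To check which of the two FFmpeg options is in effect, a small script can look for the binary. This sketch assumes the bundled copy sits directly inside `lib` under the project root as `ffmpeg` (or `ffmpeg.exe` on Windows); that layout and the `find_ffmpeg` helper are hypothetical, not part of the project.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_root: Path) -> Optional[str]:
    """Return a path to an ffmpeg executable, or None if none is found.

    Assumption: a bundled copy lives directly in <project_root>/lib.
    """
    exe = "ffmpeg.exe" if os.name == "nt" else "ffmpeg"
    bundled = project_root / "lib" / exe
    if bundled.is_file():
        return str(bundled)        # copy placed in the lib directory
    return shutil.which("ffmpeg")  # otherwise, look it up via PATH
```

Run from the project root, `print(find_ffmpeg(Path.cwd()) or "FFmpeg not found")` reports which copy would be picked up. The bundled copy is checked first so that it takes precedence over a system-wide install.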

Install Additional Libraries:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Usage Instructions

Running the Project

Execute the command:

python main.py

Selecting a Workspace

Choose a workspace directory on a drive with enough free space. The workspace holds imported files, the database, and generated dataset output.
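Because the workspace stores imported audio, the database, and exported datasets, it can help to check free space before picking one. A minimal standard-library sketch; the `check_workspace` name and the 10 GB default threshold are arbitrary assumptions, not values from the project.

```python
import shutil
from pathlib import Path


def check_workspace(path: Path, min_free_gb: float = 10.0) -> bool:
    """Return True if the drive holding the workspace has at least min_free_gb free.

    The threshold is a hypothetical default; size it to your own data.
    """
    path.mkdir(parents=True, exist_ok=True)  # create the workspace if missing
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= min_free_gb
```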

Dataset Management Interface

Here you can add, edit, or delete datasets; each dataset is independent of the others. Click the corresponding button to enter a dataset.

Dataset Overview Interface

Here you can import, export, and process the dataset's data.

Data Import

From File (Audio + Subtitle): Choose the prepared audio file and its matching subtitle file, and enter the speaker. The audio is split according to the subtitle timestamps.
From File (Long Audio): Choose the audio files and enter the speaker, the minimum silence length, and the silence threshold used for segmentation.
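The two import modes can be sketched in plain Python. `parse_srt` turns SRT text into `(start_ms, end_ms, text)` segments, the timing an audio + subtitle import needs to cut the audio. `split_on_silence_levels` shows the idea behind long-audio segmentation on a list of per-millisecond loudness values in dBFS: a run of frames below the silence threshold that lasts at least the minimum silence length becomes a gap between segments. Both functions and the per-millisecond loudness input are assumptions for illustration, not the tool's code; for real audio, pydub's `split_on_silence(audio, min_silence_len=..., silence_thresh=...)` in `pydub.silence` takes the same two parameters.

```python
import re
from typing import List, Tuple

# SRT timing line, e.g. "00:00:01,500 --> 00:00:03,250"
_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Tuple[int, int, str]]:
    """Parse SRT text into (start_ms, end_ms, subtitle) tuples."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIME.search(line)
            if match:
                g = match.groups()
                # Subtitle text may span several lines after the timing line.
                caption = " ".join(ln.strip() for ln in lines[i + 1:])
                segments.append((_to_ms(*g[:4]), _to_ms(*g[4:]), caption))
                break
    return segments


def split_on_silence_levels(levels_db: List[float], min_silence_len: int,
                            silence_thresh: float) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans of non-silent audio.

    levels_db holds one loudness value (dBFS) per millisecond. Quiet runs
    shorter than min_silence_len stay inside a segment; leading and
    trailing quiet is trimmed.
    """
    spans = []
    seg_start = None    # start of the current non-silent span
    quiet_start = None  # start of the current quiet run
    for t, level in enumerate(levels_db):
        if level < silence_thresh:
            if quiet_start is None:
                quiet_start = t
            continue
        # Loud frame: a long enough quiet run closes the current segment.
        if (quiet_start is not None and seg_start is not None
                and t - quiet_start >= min_silence_len):
            spans.append((seg_start, quiet_start))
            seg_start = None
        quiet_start = None
        if seg_start is None:
            seg_start = t
    if seg_start is not None:
        end = quiet_start if quiet_start is not None else len(levels_db)
        spans.append((seg_start, end))
    return spans
```

Working on a precomputed loudness list keeps the logic independent of any audio library; in practice the values would come from decoding the file.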

Data Deletion

Delete the data imported from a specific audio file by selecting that file.
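Since the data lives in SQLite, removing everything imported from one audio file amounts to a single filtered DELETE. This sketch uses the standard-library `sqlite3` module rather than Peewee, and the `clips` table with its `source_file` column is a hypothetical schema, not the project's actual one.

```python
import sqlite3

# Hypothetical schema: one row per segmented clip, tagged with its source file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clips (
    id INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL,
    clip_path TEXT NOT NULL,
    text TEXT
)
"""


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def delete_by_source(conn: sqlite3.Connection, source_file: str) -> int:
    """Delete every clip imported from source_file and return how many were removed."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute("DELETE FROM clips WHERE source_file = ?", (source_file,))
    return cur.rowcount
```

Tagging each clip with its source file at import time is what makes this per-file deletion cheap.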

Data Export

Single Speaker Export: Export one speaker's data, either with custom parameters or by applying a preset.
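The original VITS repository reads plain-text filelists in which each line is `wav_path|transcript` for a single speaker (its multi-speaker lists add a speaker ID column). A minimal sketch of producing such lists with a train/validation split; the function names, the 5% validation ratio, and the fixed seed are assumptions, and the tool's own export presets may differ.

```python
import random
from pathlib import Path
from typing import Iterable, List, Tuple


def to_vits_lines(items: Iterable[Tuple[str, str]]) -> List[str]:
    """Format (wav_path, transcript) pairs as "wav_path|transcript" lines.

    Empty transcripts are skipped; "|" is the delimiter, so it is replaced.
    """
    return [f"{wav}|{text.strip().replace('|', ' ')}"
            for wav, text in items if text.strip()]


def split_train_val(lines: List[str], val_ratio: float = 0.05,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically and hold out val_ratio of the lines."""
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_ratio)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]


def write_filelist(lines: List[str], path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
```

A fixed seed keeps the split reproducible, so re-exporting the same dataset yields the same train and validation files.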

Voice Evaluation

Use an integrated voice evaluation service to score clips and quickly pick out high-quality speech data. For example, the Biaobei Evaluation integration scores audio automatically, which speeds up dataset refinement.
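Once each clip carries a score, selecting high-quality data is a filter followed by a sort. This sketch assumes scores on a 0 to 100 scale and a cutoff of 80; both, along with the `Clip` type and `select_best` helper, are hypothetical and do not reflect the Biaobei service's actual response format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    path: str
    text: str
    score: Optional[float] = None  # None until the clip has been evaluated


def select_best(clips: List[Clip], min_score: float = 80.0,
                limit: Optional[int] = None) -> List[Clip]:
    """Keep evaluated clips scoring at least min_score, highest first."""
    scored = [c for c in clips if c.score is not None and c.score >= min_score]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored if limit is None else scored[:limit]
```

Unscored clips are excluded rather than treated as zero, so a partially evaluated dataset does not silently lose data it has not yet checked.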

Future Development Plans

ASR annotation
Voice recognition features
Multi-speaker dataset export

FAQ

How do I obtain subtitles? Use tools such as Jianying or videoSRT to generate SRT subtitles.
What is the optimization logic? During segmentation, the tool automatically adjusts cut points based on sound levels to minimize the chance of speech being cut off.
How do I upgrade the EXE version? Replace the old executable with the new one from the latest zip package.
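One plausible reading of the optimization logic in the FAQ is to nudge each cut point, such as a subtitle start or end time, to the quietest nearby moment so the cut does not land mid-syllable. The sketch below works on per-millisecond loudness values in dBFS and searches plus or minus 200 ms; the window size, the `snap_to_quiet` name, and the approach itself are assumptions, not the tool's documented algorithm.

```python
from typing import List


def snap_to_quiet(levels_db: List[float], cut_ms: int, window_ms: int = 200) -> int:
    """Move cut_ms to the quietest frame within +/- window_ms.

    levels_db holds one loudness value (dBFS) per millisecond. Ties go to
    the frame closest to the original cut, so flat regions leave it unchanged.
    """
    if not levels_db:
        return cut_ms
    cut_ms = min(max(cut_ms, 0), len(levels_db) - 1)  # clamp into the audio
    lo = max(0, cut_ms - window_ms)
    hi = min(len(levels_db) - 1, cut_ms + window_ms)
    return min(range(lo, hi + 1), key=lambda t: (levels_db[t], abs(t - cut_ms)))
```

Limiting the search to a small window keeps the adjusted cut close to the subtitle timing while still avoiding loud frames.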