Introduction to sound_dataset_tools2
Overview
sound_dataset_tools2 is a tool for streamlining the creation of speech datasets. It lets users produce the training datasets needed by projects such as VITS with a single click.
Important Note: You are currently viewing the r1.0 branch of the project, which will cease to receive new feature updates and will be maintained only for archival purposes. Future developments will focus primarily on the r2.0 branch following a project restructuring.
Key Features
- GUI Interface: The tool features a graphical user interface for ease of use.
- Chinese Documentation: Available to cater to Chinese-speaking users.
- Dual Import Methods: Supports importing audio together with subtitles, and importing plain audio with automatic segmentation (more methods are planned).
- Optimized Audio Segmentation: Reduces the likelihood of audio interruptions.
- Direct Export in Required Format: Datasets can be exported in formats compatible with VITS and other projects, with an adjustable channel count and sampling rate.
- Voice Evaluation Capability: Allows for quick sorting of high-quality datasets from large volumes of data by scoring the data.
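To illustrate what an adjustable channel count and sampling rate mean in practice, the sketch below builds an FFMPEG command line that converts a clip to mono at 22050 Hz. The helper name and its default values are hypothetical and are not taken from the project; `-ac` and `-ar` are standard FFMPEG options for channel count and sample rate, but the tool's own export code may work differently (for example through pydub).

```python
def build_resample_command(src, dst, channels=1, sample_rate=22050):
    """Return an FFMPEG argument list that converts src into dst.

    Hypothetical helper for illustration; not part of sound_dataset_tools2.
    """
    if channels < 1 or sample_rate <= 0:
        raise ValueError("channels and sample_rate must be positive")
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file
        "-ac", str(channels),     # number of audio channels in the output
        "-ar", str(sample_rate),  # output sampling rate in Hz
        str(dst),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so FFMPEG is not required here.
    print(" ".join(build_resample_command("clip.wav", "clip_22k.wav")))
```

The returned list can be passed to `subprocess.run(...)` once FFMPEG is installed.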
Software Architecture
- Database: Utilizes SQLite and Peewee.
- Interface: Built with PySide6.
- Audio Processing: Employs tools like FFMPEG and pydub.
Installation Guide
Running the Compiled EXE File
- Visit the GitHub Releases or Gitee Releases page.
- Follow the instructions there to download the appropriate zip file, extract it, and double-click the executable to launch the application.
Running from Source Code
- Clone the Project:
  - From Gitee:
    git clone https://gitee.com/kslizi/sound_dataset_tools2.git
  - From GitHub:
    git clone https://github.com/kslz/sound_dataset_tools2.git
- Install FFMPEG:
  - Options include configuring an environment variable or placing FFMPEG directly in the lib directory after decompressing.
- Install Additional Libraries:
    pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
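Before launching the project, it can help to confirm that FFMPEG is actually discoverable. The sketch below is one way to check both options described in the FFMPEG step: it looks on PATH first, then in a `lib` directory. The function name is hypothetical, and the assumption that the executable sits directly inside `lib` (rather than in a subfolder) comes from the wording above, not from the project's code.

```python
import shutil
from pathlib import Path


def find_ffmpeg(lib_dir="lib"):
    """Return the path to an FFMPEG executable, or None if none is found.

    Hypothetical helper: checks PATH first, then lib_dir (assumed layout).
    """
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    # Fall back to a copy placed directly in the lib directory.
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = Path(lib_dir) / name
        if candidate.is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    location = find_ffmpeg()
    print(location if location else "FFMPEG not found on PATH or in lib")
```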
Usage Instructions
Running the Project
Execute the command:
python main.py
Selecting a Workspace
Choose a directory for imported files, the database, and generated datasets, making sure the drive it is on has enough free space.
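If you are unsure whether a candidate workspace has enough room, a quick check with Python's standard library is sketched below. The function names and the 20 GiB figure are hypothetical examples; the project does not state a minimum size.

```python
import shutil
from pathlib import Path


def free_space_gib(directory):
    """Return the free space, in GiB, on the drive that holds directory."""
    usage = shutil.disk_usage(Path(directory))
    return usage.free / (1024 ** 3)


def has_enough_space(directory, required_gib):
    """Return True if at least required_gib GiB are free for directory."""
    return free_space_gib(directory) >= required_gib


if __name__ == "__main__":
    workspace = "."  # replace with the directory you plan to use
    print(f"{free_space_gib(workspace):.1f} GiB free")
    print("OK" if has_enough_space(workspace, 20) else "Consider another drive")
```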
Dataset Management Interface
From here, you can add, edit, or delete datasets, each operating independently. Enter your selected dataset by clicking the specified button.
Dataset Overview Interface
Perform various operations, including data import, export, and processing.
Data Import
- From File (Audio + Subtitle): Choose a processed audio file and its matching subtitle file, and enter the speaker the clips belong to.
- From File (Long Audio): Choose an audio file and enter the speaker, the minimum silence length, and the silence threshold to use for segmentation.
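To show how the minimum silence length and the silence threshold interact, here is a simplified sketch of silence-based segmentation over per-frame loudness values. It is not the tool's actual algorithm: the function name, the 10 ms frame size, and the default values are assumptions made for illustration. pydub offers a comparable `split_on_silence` function in `pydub.silence`, though whether the project calls it is not stated here.

```python
def split_on_silence_frames(levels_db, frame_ms=10,
                            min_silence_ms=300, silence_thresh_db=-40.0):
    """Return (start_ms, end_ms) spans of non-silent audio.

    levels_db holds the loudness of consecutive frames in dBFS. A run of
    quiet frames ends a segment only if it lasts at least min_silence_ms.
    """
    min_frames = max(1, -(-min_silence_ms // frame_ms))  # ceiling division
    segments = []
    seg_start = None  # index of the first frame of the open segment
    quiet_run = 0     # consecutive quiet frames seen so far
    for i, level in enumerate(levels_db):
        if level < silence_thresh_db:
            quiet_run += 1
            if seg_start is not None and quiet_run >= min_frames:
                # Close the segment where the silent run began.
                seg_end = i - quiet_run + 1
                segments.append((seg_start * frame_ms, seg_end * frame_ms))
                seg_start = None
        else:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
    if seg_start is not None:
        # Trim any short trailing silence from the final segment.
        seg_end = len(levels_db) - quiet_run
        segments.append((seg_start * frame_ms, seg_end * frame_ms))
    return segments
```

A larger minimum silence length yields fewer, longer clips, because short pauses no longer split the audio. A higher (less negative) threshold treats more of the signal as silence.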
Data Deletion
Delete specific datasets by choosing the associated audio file.
Data Export
- Single Speaker Export: Customize parameters or apply presets for a smooth exporting process.
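As a rough picture of what a single-speaker export produces, the sketch below writes a filelist with one `wav_path|text` line per clip, which is the layout used by the single-speaker filelists in the VITS repository. The function names are hypothetical, and the exact layout written by sound_dataset_tools2 is not stated here, so treat this as an assumption.

```python
from pathlib import Path


def format_filelist(entries):
    """Turn (wav_path, text) pairs into 'wav_path|text' lines."""
    lines = []
    for wav_path, text in entries:
        clean = " ".join(text.split())  # collapse newlines and extra spaces
        if "|" in clean or "|" in str(wav_path):
            raise ValueError("'|' is the field separator and cannot appear in an entry")
        lines.append(f"{wav_path}|{clean}")
    return lines


def write_filelist(entries, out_path):
    """Write the formatted lines to out_path as UTF-8 and return them."""
    lines = format_filelist(entries)
    Path(out_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```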
Voice Evaluation
Utilize integrated voice evaluation services to efficiently select high-quality speech data. Integration with Biaobei Evaluation, for example, offers automated scoring options for faster dataset refinement.
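The selection step that scoring enables can be sketched as follows. This is only an illustration: the function name, the 0 to 100 score scale, and the threshold of 80 are assumptions, and the real scores come from the evaluation service rather than from this code.

```python
def select_top_clips(scored_clips, min_score=80.0, limit=None):
    """Keep clips scoring at least min_score, best first.

    scored_clips is an iterable of (clip_name, score) pairs.
    """
    kept = [clip for clip in scored_clips if clip[1] >= min_score]
    kept.sort(key=lambda clip: clip[1], reverse=True)
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    clips = [("a.wav", 91.5), ("b.wav", 62.0), ("c.wav", 85.0)]
    for name, score in select_top_clips(clips):
        print(name, score)
```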
Future Development Plans
- Implementation of ASR annotation
- Voice recognition feature development
- Multi-speaker dataset export capability
FAQ
- How to obtain subtitles? Use tools such as Jianying or videoSRT to extract SRT subtitles.
- What is the optimization logic? During segmentation, the tool automatically adjusts cut points according to sound level to minimize interruptions in the audio.
- How to upgrade the EXE version? Replace the previous executable file with the new one from the latest zip package.
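For readers curious about what the SRT subtitles mentioned in the FAQ contain, the sketch below parses SRT text into (start, end, caption) entries, which is the information needed to cut an audio file into captioned clips. It is a minimal parser written for illustration with a hypothetical function name, not the project's own import code.

```python
import re

# Matches timestamps such as 00:01:02,345 (a dot is also accepted).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(parts):
    """Convert (hours, minutes, seconds, millis) strings to milliseconds."""
    hours, minutes, seconds, millis = (int(p) for p in parts)
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text):
    """Return a list of (start_ms, end_ms, caption) tuples from SRT text."""
    cues = []
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [ln for ln in block.split("\n") if ln.strip()]
        timing_idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing_idx is None:
            continue  # not a cue block
        stamps = _TIME.findall(lines[timing_idx])
        if len(stamps) < 2:
            continue  # malformed timing line
        # Multi-line captions are joined with a space; use "" for Chinese text.
        caption = " ".join(ln.strip() for ln in lines[timing_idx + 1:])
        cues.append((_to_ms(stamps[0]), _to_ms(stamps[1]), caption))
    return cues
```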