Introduction to the TalkingGaussian Project
Overview
TalkingGaussian is a research project on synthesizing 3D talking heads with a technique called Gaussian Splatting. The approach produces animated 3D talking-head representations that keep facial structure consistent while moving realistically. The project comes out of collaborative research and is described in an accompanying paper.
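Gaussian Splatting renders a scene from many Gaussians that are projected onto the image plane and alpha-blended from front to back. The sketch below shows only that generic compositing rule for a single pixel, assuming the Gaussians have already been projected to 2D. It is not TalkingGaussian's own renderer, and it omits practical details such as tile-based rasterization and early termination. The 'Splat' type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat:
    """A Gaussian already projected to 2D (hypothetical container for this sketch)."""
    mean: Tuple[float, float]          # 2D centre in pixel coordinates
    cov: Tuple[float, float, float]    # (a, b, c) for the covariance [[a, b], [b, c]]
    opacity: float                     # peak opacity in [0, 1]
    color: Tuple[float, float, float]  # RGB color
    depth: float                       # view-space depth, used only for sorting


def alpha_at(s: Splat, x: float, y: float) -> float:
    """Opacity of splat s at pixel (x, y): opacity * exp(-0.5 * d^T cov^-1 d)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = x - s.mean[0], y - s.mean[1]
    # Closed-form 2x2 inverse: cov^-1 = [[c, -b], [-b, a]] / det.
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return s.opacity * math.exp(-0.5 * q)


def shade_pixel(splats: List[Splat], x: float, y: float) -> Tuple[float, ...]:
    """Front-to-back compositing: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda sp: sp.depth):
        alpha = alpha_at(s, x, y)
        for k in range(3):
            rgb[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
    return tuple(rgb)


if __name__ == "__main__":
    red = Splat((0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (1.0, 0.0, 0.0), depth=1.0)
    blue = Splat((0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (0.0, 0.0, 1.0), depth=2.0)
    # Red is nearer, so it is blended first even though it is listed second.
    print(shade_pixel([blue, red], 0.0, 0.0))  # → (0.5, 0.0, 0.25)
```

Sorting by depth before blending is what makes the result independent of the order in which splats are stored.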
Installation and Setup
To get started with TalkingGaussian, users need an Ubuntu 18.04 environment with CUDA 11.3 and PyTorch 1.12.1. After cloning the repository, users create the environment with Conda and install additional Python packages such as PyTorch3D and TensorFlow. If problems arise with components such as 'diff-gaussian-rasterization' or 'gridencoder', the repository links to resources that address them.
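Because the setup is pinned to specific versions, a quick check before building the CUDA extensions can save time. The sketch below compares reported version strings against the versions named above. The function names are hypothetical; in a real environment the strings would come from 'torch.__version__' and 'torch.version.cuda' (the latter is None on CPU-only builds, which the sketch reports as missing).

```python
from typing import Dict, Optional, Tuple

# Versions named in the setup instructions.
EXPECTED = {"torch": "1.12.1", "cuda": "11.3"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse '1.12.1+cu113' into (1, 12, 1), ignoring the '+' local tag and non-numeric parts."""
    core = version.split("+", 1)[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def version_mismatches(reported: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, str]]:
    """Return {component: (expected, reported)} for each component that differs or is missing."""
    problems: Dict[str, Tuple[str, str]] = {}
    for name, expected in EXPECTED.items():
        got = reported.get(name)
        if got is None:
            problems[name] = (expected, "missing")
        elif version_tuple(got) != version_tuple(expected):
            problems[name] = (expected, got)
    return problems


if __name__ == "__main__":
    # In a real environment: {"torch": torch.__version__, "cuda": torch.version.cuda}
    print(version_mismatches({"torch": "1.12.1+cu113", "cuda": "11.3"}))  # → {}
    print(version_mismatches({"torch": "2.0.1", "cuda": "11.3"}))  # → {'torch': ('1.12.1', '2.0.1')}
```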
Preparation Steps
Before using the project, complete the following preparation steps:
- Face Parsing and 3D Model Preparation: Users must prepare the face-parsing model and the 3D Morphable Model (3DMM). A script is provided to help with these tasks.
- Environment for EasyPortrait: Set up the EasyPortrait environment by installing 'mmcv' and downloading the required EasyPortrait model weights, which are used to generate segmentation masks.
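After these steps, a simple check that every expected file is in place can catch a skipped step before preprocessing starts. This is a minimal sketch; the step names and all paths in it are hypothetical placeholders, to be replaced with whatever the repository's preparation script and the EasyPortrait setup actually produce.

```python
from pathlib import Path
from typing import Dict, List


def incomplete_steps(required: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each preparation step to the paths it still lacks; complete steps are omitted."""
    report: Dict[str, List[str]] = {}
    for step, paths in required.items():
        missing = [p for p in paths if not Path(p).exists()]
        if missing:
            report[step] = missing
    return report


if __name__ == "__main__":
    # HYPOTHETICAL placeholder paths, not the repository's real file names.
    REQUIRED = {
        "face parsing / 3DMM": ["path/to/face_parsing_model", "path/to/3dmm_model"],
        "EasyPortrait": ["path/to/easyportrait_weights"],
    }
    for step, missing in incomplete_steps(REQUIRED).items():
        print(f"{step}: missing {', '.join(missing)}")
```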
Usage Guidelines
TalkingGaussian is released for research purposes, and users are urged to apply it ethically and legally. When using video datasets, respect copyright law and correctly attribute the original creators.
Working with Video and Audio
- Video: The project requires a 25 FPS video of the talking person that also meets the repository's resolution and duration requirements. The video is then preprocessed with a dedicated Python script to prepare it for training.
- Audio: DeepSpeech and HuBERT are used to extract audio features, which drive both evaluation and synthesis.
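Fixing the video at 25 FPS makes it straightforward to relate video frames to audio features. The sketch below assumes the audio extractor emits 50 feature vectors per second (an assumption to verify against whichever extractor is used), so each video frame maps to two consecutive feature vectors. All names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

VIDEO_FPS = 25     # frame rate required for the input video
FEATURE_RATE = 50  # ASSUMPTION: audio feature vectors per second; check your extractor


def num_video_frames(num_samples: int, sample_rate: int, fps: int = VIDEO_FPS) -> int:
    """Whole video frames spanned by num_samples of audio at sample_rate Hz (integer math)."""
    return num_samples * fps // sample_rate


def features_for_frame(features: Sequence[T], frame: int,
                       fps: int = VIDEO_FPS, rate: int = FEATURE_RATE) -> List[T]:
    """Feature vectors assigned to a video frame: indices floor(frame*rate/fps)
    up to, but not including, floor((frame+1)*rate/fps)."""
    start = frame * rate // fps
    stop = (frame + 1) * rate // fps
    return list(features[start:stop])


if __name__ == "__main__":
    print(num_video_frames(16000 * 4, 16000))      # 4 s of 16 kHz audio → 100
    print(features_for_frame(list(range(10)), 3))  # → [6, 7]
```

Integer arithmetic is used throughout so that frame boundaries do not drift from floating-point rounding over long clips.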
Training and Testing
Training runs through provided bash scripts, which partially parallelize the work to speed it up. After training, users can evaluate the synthesized 3D talking heads with the provided test scripts.
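Partial parallelization usually means running independent stages at the same time and starting a dependent stage only once they have all finished. The sketch below shows that pattern in Python rather than bash; it does not reproduce the repository's scripts, and the stage names and commands are hypothetical. Threads are sufficient here because each stage mostly waits on an external process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Stage = Callable[[], str]


def shell_stage(cmd: List[str]) -> Stage:
    """Wrap a command (e.g. a hypothetical ['bash', 'path/to/stage.sh']) as a stage; raises on failure."""
    def run() -> str:
        subprocess.run(cmd, check=True)
        return " ".join(cmd)
    return run


def run_pipeline(independent: Dict[str, Stage],
                 dependent: Callable[[Dict[str, str]], str]) -> str:
    """Run independent stages concurrently, then a stage that needs all of their results."""
    with ThreadPoolExecutor(max_workers=max(1, len(independent))) as pool:
        futures = {name: pool.submit(stage) for name, stage in independent.items()}
        # .result() blocks until each stage finishes and re-raises any exception it hit.
        results = {name: fut.result() for name, fut in futures.items()}
    return dependent(results)


if __name__ == "__main__":
    out = run_pipeline(
        {"stage_a": lambda: "a done", "stage_b": lambda: "b done"},
        lambda r: f"dependent stage ran after {sorted(r)}",
    )
    print(out)
```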
Inference with Target Audio
Inference drives a trained model with a target audio clip, whose features must first be extracted with the same preprocessing used for training.
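A common pattern in audio-driven talking-head pipelines is to give each output frame a fixed-size window of neighbouring audio features, repeating the edge features at the start and end of the clip. The sketch below illustrates that pattern only; it assumes features are already aligned one per output frame, and the window half-width 'half' is a hypothetical parameter, not a value taken from the repository.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def windowed(features: Sequence[T], half: int = 4) -> List[List[T]]:
    """For each frame i, collect features[i-half .. i+half] (2*half+1 items),
    repeating the first/last feature where the window runs past the clip edges."""
    n = len(features)
    windows: List[List[T]] = []
    for i in range(n):
        idx = [min(max(j, 0), n - 1) for j in range(i - half, i + half + 1)]
        windows.append([features[j] for j in idx])
    return windows


if __name__ == "__main__":
    print(windowed([10, 20, 30], half=1))  # → [[10, 10, 20], [10, 20, 30], [20, 30, 30]]
```

Clamping indices keeps every window the same length, so the model always receives input of a fixed shape, including for the first and last frames.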
Acknowledgements
TalkingGaussian builds on gaussian-splatting and also draws on RAD-NeRF and DFRF; the authors credit these projects for their foundational contributions.
An active online community supports the project, offering resources for troubleshooting and collaboration across several platforms.
Citation
Users who rely on this project in academic or professional work are encouraged to cite it using the citation format provided by the authors.