TalkingGaussian - Focus on Consistent 3D Talking Head Synthesis Using Gaussian Splatting

Introduction to TalkingGaussian Project

Overview

TalkingGaussian is a research project for synthesizing 3D talking heads with Gaussian Splatting, a technique that represents a scene as a set of 3D Gaussian primitives and renders them by projecting and blending them onto the image plane. The approach aims to produce animated 3D talking heads that keep a consistent facial structure while moving realistically. The project is the result of collaborative research and is documented in an accompanying paper.
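
For readers new to Gaussian Splatting, the blending step can be illustrated with a few lines of plain Python. This is only a minimal sketch of front-to-back alpha compositing over depth-sorted samples, not the project's rasterizer; the function name and the list-based inputs are assumptions made for illustration.

```python
def composite_front_to_back(colors, alphas):
    """Blend depth-sorted samples front to back.

    colors: list of (r, g, b) tuples, nearest sample first.
    alphas: list of opacities in [0, 1], in the same order.
    Returns the blended (r, g, b) and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance  # contribution of this sample
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # nearer samples occlude farther ones
    return tuple(pixel), transmittance


if __name__ == "__main__":
    # A half-transparent red sample in front of a half-transparent blue one.
    print(composite_front_to_back([(1, 0, 0), (0, 0, 1)], [0.5, 0.5]))
```

Each sample contributes its color weighted by its own opacity and by how much light the nearer samples let through, which is why the sort order matters.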

Installation and Setup

To get started with TalkingGaussian, users need an Ubuntu 18.04 environment with CUDA 11.3 and PyTorch 1.12.1. After cloning the repository, set up the environment with Conda and install the additional Python packages, including PyTorch3D and TensorFlow. If components such as 'diff-gaussian-rasterization' or 'gridencoder' fail to install or build, the project links to resources with guidance.
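
Before installing the remaining packages, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and it relies only on `torch.__version__` and `torch.version.cuda`, which PyTorch exposes.

```python
import platform

# Versions named in the text above; adjust if the project changes them.
EXPECTED = {"torch": "1.12.1", "cuda": "11.3"}


def version_prefix_matches(found, expected):
    """True if `found` begins with the dotted components of `expected`."""
    if found is None:
        return False
    core = str(found).split("+")[0]  # drop build tags such as "+cu113"
    wanted = expected.split(".")
    return core.split(".")[: len(wanted)] == wanted


def collect_versions():
    """Return installed versions, or None where they are unavailable."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None}
    # torch.version.cuda is None on CPU-only builds.
    return {"torch": torch.__version__, "cuda": torch.version.cuda}


def report(found):
    """Map each expected component to whether the found version matches."""
    return {
        name: version_prefix_matches(found.get(name), wanted)
        for name, wanted in EXPECTED.items()
    }


if __name__ == "__main__":
    print(platform.platform())
    print(report(collect_versions()))
```

A `False` entry does not necessarily mean the project will fail to run; it only flags a difference from the environment described here.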

Preparation Steps

Two preparatory steps are required before use:

Face Parsing and 3D Model Preparation: Users must prepare the face-parsing model and the 3D Morphable Model (3DMM). A script is provided to help with both.
Environment for EasyPortrait: Users must set up mmcv and download the model weights that EasyPortrait requires. EasyPortrait is used to generate certain visual features.

Usage Guidelines

TalkingGaussian is offered for research purposes, and users are urged to apply it ethically and legally. Respect copyright law when using video datasets, and attribute the original creators correctly.

Working with Video and Audio

Video: The project requires a 25 FPS video of the speaker that meets the project's resolution and duration requirements. A designated Python script processes the video to prepare it for training.
Audio: DeepSpeech and HuBERT are used to extract audio features, which are needed for both evaluation and synthesis.
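
Because the video is fixed at 25 FPS, each frame spans 40 ms, and per-frame audio features have to line up with that grid. The sketch below shows only the arithmetic. The 16 kHz sample rate is an assumption (it is a common input rate for speech models), and the function names are hypothetical rather than taken from the project's scripts.

```python
VIDEO_FPS = 25        # stated in the text above
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def samples_per_frame(sample_rate=SAMPLE_RATE, fps=VIDEO_FPS):
    """Number of audio samples covered by one video frame."""
    if sample_rate % fps:
        raise ValueError("sample rate must be a multiple of the frame rate")
    return sample_rate // fps


def frame_count(num_samples, sample_rate=SAMPLE_RATE, fps=VIDEO_FPS):
    """Number of whole video frames covered by an audio clip."""
    return num_samples // samples_per_frame(sample_rate, fps)


def frame_window(index, sample_rate=SAMPLE_RATE, fps=VIDEO_FPS):
    """Half-open (start, end) sample range for the given video frame."""
    step = samples_per_frame(sample_rate, fps)
    return index * step, (index + 1) * step


if __name__ == "__main__":
    print(samples_per_frame())            # samples per 40 ms frame
    print(frame_count(SAMPLE_RATE * 10))  # frames in a 10 second clip
    print(frame_window(2))                # sample range of the third frame
```

Under these assumptions a 10 second clip maps to 250 video frames, so a mismatch between the audio feature count and the video frame count usually points to a wrong frame rate or sample rate in the input.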

Training and Testing

Training the model involves running the provided bash scripts, which can run some tasks in parallel to reduce training time. Once training is complete, users can test the synthesized 3D talking heads with the provided scripts.
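
The idea of partial parallelism can be sketched as follows: stages that do not depend on each other run concurrently, and a final stage runs once they have all finished. This is an illustrative pattern only, not the project's training script; the stage names and return values are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partially_parallel(independent_stages, final_stage):
    """Run independent stages concurrently, then the dependent final stage.

    independent_stages: dict mapping a name to a zero-argument callable.
    final_stage: callable that receives the dict of stage results.
    """
    workers = max(1, len(independent_stages))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in independent_stages.items()}
        # result() blocks until the stage finishes and re-raises its errors.
        results = {name: future.result() for name, future in futures.items()}
    return final_stage(results)


if __name__ == "__main__":
    # Hypothetical stages; real ones would launch the training commands.
    stages = {
        "stage_a": lambda: "stage_a.done",
        "stage_b": lambda: "stage_b.done",
    }
    print(run_partially_parallel(stages, lambda r: sorted(r.values())))
```

In practice each callable would wrap a long-running command, and the speedup is bounded by the slowest independent stage plus the final one.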

Inference with Target Audio

At inference time, users can drive a trained model with target audio. The audio is supplied as preprocessed audio files.

Acknowledgements

TalkingGaussian builds on existing projects such as gaussian-splatting and draws on other resources such as RAD-NeRF and DFRF. These projects are recognized for their foundational contributions.

This project is supported by an active online community, with resources available for troubleshooting and peer collaboration through various platforms.

Citation

Users who build on this project in academic or professional work are encouraged to cite it using the provided citation format, in recognition of the authors' research.