GPS-Gaussian: Real-time Human Novel View Synthesis with Pixel-wise 3D Gaussian Splatting
Overview
GPS-Gaussian is a generalizable approach for synthesizing novel views of human subjects in real time, built on a pixel-wise 3D Gaussian representation. Developed by researchers from Harbin Institute of Technology, Tsinghua University, and Peng Cheng Laboratory, the method renders novel views of previously unseen subjects without any per-subject fine-tuning or optimization.
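The core idea can be illustrated with a minimal, hypothetical sketch (module and variable names below are illustrative assumptions, not the project's actual code): a network predicts a parameter map on each source view, so that every foreground pixel carries its own 3D Gaussian (position, scale, rotation, opacity, color), which is then splatted into the target view.

```python
# Conceptual sketch of a pixel-wise Gaussian parameter head
# (hypothetical names, not the GPS-Gaussian implementation).
import torch
import torch.nn as nn

class GaussianParamHead(nn.Module):
    """Maps per-pixel image features to per-pixel 3D Gaussian parameters."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # 3 (xyz offset) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (color)
        self.head = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, feats, depth, rays):
        # feats: (B, C, H, W) features, depth: (B, 1, H, W), rays: (B, 3, H, W) camera rays
        out = self.head(feats)
        xyz = rays * depth + out[:, 0:3]                      # lift each pixel to 3D
        scale = torch.exp(out[:, 3:6])                        # positive per-axis scales
        rot = nn.functional.normalize(out[:, 6:10], dim=1)    # unit quaternion
        opacity = torch.sigmoid(out[:, 10:11])
        color = torch.sigmoid(out[:, 11:14])
        return xyz, scale, rot, opacity, color
```

Each source-view pixel thus becomes one Gaussian primitive, and the union of Gaussians from the source views is rasterized into the novel view.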
Installation and Setup
To get started with GPS-Gaussian, users need to create a dedicated Conda environment and then compile the 3D Gaussian Splatting component, which the project depends on. Optionally, they can install a faster CUDA sampler implementation from RAFT-Stereo. The steps are as follows:
- Create and activate the environment:
  conda env create --file environment.yml
  conda activate gps_gaussian

- Compile the Gaussian splatting component (a quick import check is sketched after this list):
  git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
  cd gaussian-splatting/
  pip install -e submodules/diff-gaussian-rasterization
  cd ..

- (Optional) Install the faster CUDA sampler implementation from RAFT-Stereo:
  git clone https://github.com/princeton-vl/RAFT-Stereo.git
  cd RAFT-Stereo/sampler && python setup.py install && cd ../..
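After compilation, a quick import check (an optional sanity check, not part of the official setup instructions) can confirm that the CUDA rasterizer built and is importable in the new environment:

```python
# Optional sanity check: confirm the compiled diff-gaussian-rasterization
# extension and CUDA are available in the current environment.
import torch
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer

print("CUDA available:", torch.cuda.is_available())
print("Rasterizer imported:", GaussianRasterizer.__name__)
```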
Run on Synthetic Human Dataset
Dataset Preparation
For training, GPS-Gaussian requires a prepared dataset; the THuman2.0 dataset, available for download, is the recommended starting point. The rendered data occupies roughly 50 GB of disk space and provides the human scans used for training. Users can additionally expand the dataset with scans from sources such as Twindom or Render People, provided the renders match the camera configuration used in the GPS-Gaussian training scenario.
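Before preprocessing, a small helper (hypothetical, not part of the repository; the path below is a placeholder) can confirm that the scan folders are in place and that enough disk space remains:

```python
# Hypothetical helper: check that scan folders exist and that roughly
# 50 GB of free disk space is available for the rendered dataset.
import shutil
from pathlib import Path

def check_dataset(root="path/to/THuman2.0"):
    root = Path(root)
    scans = sorted(p for p in root.iterdir() if p.is_dir()) if root.exists() else []
    free_gb = shutil.disk_usage(root if root.exists() else ".").free / 1e9
    print(f"Found {len(scans)} scan folders under {root}")
    print(f"Free disk space: {free_gb:.1f} GB (about 50 GB is recommended)")

if __name__ == "__main__":
    check_dataset()
```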
Training Process
The training consists of two main stages:
- Stage 1: Pretrain the depth prediction model by setting the appropriate data path in the configuration file and running the training script:
  python train_stage1.py

- Stage 2: Train the full model on the preprocessed dataset, starting from the pre-trained depth model of Stage 1 (a conceptual outline of how the two stages connect follows this list):
  python train_stage2.py
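How the two stages fit together can be summarized in a hedged sketch (function names, losses, and hyperparameters below are illustrative assumptions, not the repository's actual training code):

```python
# Illustrative two-stage schedule (hypothetical names): Stage 1 pretrains the
# depth predictor alone; Stage 2 trains the full model on top of those weights.
import torch
import torch.nn.functional as F

def train_stage1(depth_net, loader, epochs=10):
    opt = torch.optim.Adam(depth_net.parameters(), lr=2e-4)
    for _ in range(epochs):
        for left, right, gt_depth in loader:
            loss = F.l1_loss(depth_net(left, right), gt_depth)
            opt.zero_grad(); loss.backward(); opt.step()
    torch.save(depth_net.state_dict(), "stage1_depth.pth")

def train_stage2(depth_net, gaussian_net, renderer, loader, epochs=10):
    depth_net.load_state_dict(torch.load("stage1_depth.pth"))  # reuse Stage 1 weights
    params = list(depth_net.parameters()) + list(gaussian_net.parameters())
    opt = torch.optim.Adam(params, lr=2e-4)
    for _ in range(epochs):
        for left, right, target_img, target_cam in loader:
            depth = depth_net(left, right)                     # per-pixel depth
            gaussians = gaussian_net(left, right, depth)       # pixel-wise Gaussian maps
            loss = F.l1_loss(renderer(gaussians, target_cam), target_img)
            opt.zero_grad(); loss.backward(); opt.step()
```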
Pre-trained models are also available for users who prefer to skip these steps and proceed directly to testing.
Testing
For real-world data, the provided test set can be downloaded and processed to synthesize novel views between two source views, or to perform freeview rendering for a more comprehensive visualization. The accompanying scripts allow adjusting the positions of the novel viewpoints and the number of novel views generated.
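Placing novel viewpoints between two source cameras typically amounts to interpolating their poses. A minimal sketch of such interpolation (a hypothetical helper, not the repository's own script) could look like this:

```python
# Hypothetical sketch: evenly space novel viewpoints between two source
# cameras by interpolating their poses (slerp for rotation, lerp for translation).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_cameras(R0, t0, R1, t1, n_views=4):
    """Return n_views (R, t) poses strictly between the two source views."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    poses = []
    for ratio in np.linspace(0.0, 1.0, n_views + 2)[1:-1]:  # skip the endpoints
        R = slerp(ratio).as_matrix()
        t = (1.0 - ratio) * np.asarray(t0) + ratio * np.asarray(t1)
        poses.append((R, t))
    return poses
```

Increasing n_views yields a denser sweep of novel viewpoints between the two source cameras.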
Usage in Research
GPS-Gaussian advances real-time novel view synthesis of human subjects, combining fast rendering with generalization to unseen subjects. Researchers who build on the project are encouraged to cite it; the repository provides code, pre-trained models, and data-preparation scripts for this task.
With its released tools and models, GPS-Gaussian offers a practical starting point for researchers and developers working on real-time novel view synthesis in computer vision and graphics.