GPS-Gaussian: Real-time Human Novel View Synthesis with Pixel-wise 3D Gaussian Splatting
Overview
GPS-Gaussian is a generalizable approach for synthesizing novel views of human subjects in real time, built on a pixel-wise 3D Gaussian representation. Developed by researchers from Harbin Institute of Technology, Tsinghua University, and Peng Cheng Laboratory, the method renders novel views of previously unseen subjects without any per-subject fine-tuning or optimization.
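The core idea can be illustrated with a minimal, hypothetical sketch (module and variable names below are illustrative assumptions, not the project's actual code): a network predicts a parameter map on each source view, so that every foreground pixel carries its own 3D Gaussian (position, scale, rotation, opacity, color), which is then splatted into the target view.

```python
# Conceptual sketch of a pixel-wise Gaussian parameter head
# (hypothetical names, not the GPS-Gaussian implementation).
import torch
import torch.nn as nn

class GaussianParamHead(nn.Module):
    """Maps per-pixel image features to per-pixel 3D Gaussian parameters."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # 3 (xyz offset) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (color)
        self.head = nn.Conv2d(feat_dim, 14, kernel_size=1)

    def forward(self, feats, depth, rays):
        # feats: (B, C, H, W) features, depth: (B, 1, H, W), rays: (B, 3, H, W) camera rays
        out = self.head(feats)
        xyz = rays * depth + out[:, 0:3]                      # lift each pixel to 3D
        scale = torch.exp(out[:, 3:6])                        # positive per-axis scales
        rot = nn.functional.normalize(out[:, 6:10], dim=1)    # unit quaternion
        opacity = torch.sigmoid(out[:, 10:11])
        color = torch.sigmoid(out[:, 11:14])
        return xyz, scale, rot, opacity, color
```

Each source-view pixel thus becomes one Gaussian primitive, and the union of Gaussians from the source views is rasterized into the novel view.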
Installation and Setup
To get started with GPS-Gaussian, users need to create a dedicated Conda environment and then compile the 3D Gaussian Splatting component, which the project depends on. Optionally, they can install a faster CUDA sampler implementation from RAFT-Stereo. The steps are as follows:
- Create and activate the environment:
  conda env create --file environment.yml
  conda activate gps_gaussian

- Compile the Gaussian splatting component (a quick import check is sketched after this list):
  git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
  cd gaussian-splatting/
  pip install -e submodules/diff-gaussian-rasterization
  cd ..

- (Optional) Install the faster CUDA sampler implementation from RAFT-Stereo:
  git clone https://github.com/princeton-vl/RAFT-Stereo.git
  cd RAFT-Stereo/sampler && python setup.py install && cd ../..
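After compilation, a quick import check (an optional sanity check, not part of the official setup instructions) can confirm that the CUDA rasterizer built and is importable in the new environment:

```python
# Optional sanity check: confirm the compiled diff-gaussian-rasterization
# extension and CUDA are available in the current environment.
import torch
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer

print("CUDA available:", torch.cuda.is_available())
print("Rasterizer imported:", GaussianRasterizer.__name__)
```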
Run on Synthetic Human Dataset
Dataset Preparation
For training, GPS-Gaussian requires a prepared dataset; the THuman2.0 dataset, available for download, is the recommended starting point. The rendered data occupies roughly 50 GB of disk space and provides the human scans used for training. Users can additionally expand the dataset with scans from sources such as Twindom or Render People, provided the renders match the camera configuration used in the GPS-Gaussian training scenario.
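Before preprocessing, a small helper (hypothetical, not part of the repository; the path below is a placeholder) can confirm that the scan folders are in place and that enough disk space remains:

```python
# Hypothetical helper: check that scan folders exist and that roughly
# 50 GB of free disk space is available for the rendered dataset.
import shutil
from pathlib import Path

def check_dataset(root="path/to/THuman2.0"):
    root = Path(root)
    scans = sorted(p for p in root.iterdir() if p.is_dir()) if root.exists() else []
    free_gb = shutil.disk_usage(root if root.exists() else ".").free / 1e9
    print(f"Found {len(scans)} scan folders under {root}")
    print(f"Free disk space: {free_gb:.1f} GB (about 50 GB is recommended)")

if __name__ == "__main__":
    check_dataset()
```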
Training Process
The training consists of two main stages:
- Stage 1: Pretrain the depth prediction model by setting the appropriate data path in the configuration file and running the training script:
  python train_stage1.py

- Stage 2: Train the full model on the preprocessed dataset, starting from the pre-trained depth model of Stage 1 (a conceptual outline of how the two stages connect follows this list):
  python train_stage2.py
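How the two stages fit together can be summarized in a hedged sketch (function names, losses, and hyperparameters below are illustrative assumptions, not the repository's actual training code):

```python
# Illustrative two-stage schedule (hypothetical names): Stage 1 pretrains the
# depth predictor alone; Stage 2 trains the full model on top of those weights.
import torch
import torch.nn.functional as F

def train_stage1(depth_net, loader, epochs=10):
    opt = torch.optim.Adam(depth_net.parameters(), lr=2e-4)
    for _ in range(epochs):
        for left, right, gt_depth in loader:
            loss = F.l1_loss(depth_net(left, right), gt_depth)
            opt.zero_grad(); loss.backward(); opt.step()
    torch.save(depth_net.state_dict(), "stage1_depth.pth")

def train_stage2(depth_net, gaussian_net, renderer, loader, epochs=10):
    depth_net.load_state_dict(torch.load("stage1_depth.pth"))  # reuse Stage 1 weights
    params = list(depth_net.parameters()) + list(gaussian_net.parameters())
    opt = torch.optim.Adam(params, lr=2e-4)
    for _ in range(epochs):
        for left, right, target_img, target_cam in loader:
            depth = depth_net(left, right)                     # per-pixel depth
            gaussians = gaussian_net(left, right, depth)       # pixel-wise Gaussian maps
            loss = F.l1_loss(renderer(gaussians, target_cam), target_img)
            opt.zero_grad(); loss.backward(); opt.step()
```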
Pre-trained models are also available for users who prefer to skip these steps and proceed directly to testing.
Testing
For real-world data, the provided test set can be downloaded and processed to synthesize novel views between two source views, or to perform freeview rendering for a more comprehensive visualization. The accompanying scripts allow adjusting the positions of the novel viewpoints and the number of novel views generated.
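Placing novel viewpoints between two source cameras typically amounts to interpolating their poses. A minimal sketch of such interpolation (a hypothetical helper, not the repository's own script) could look like this:

```python
# Hypothetical sketch: evenly space novel viewpoints between two source
# cameras by interpolating their poses (slerp for rotation, lerp for translation).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_cameras(R0, t0, R1, t1, n_views=4):
    """Return n_views (R, t) poses strictly between the two source views."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    poses = []
    for ratio in np.linspace(0.0, 1.0, n_views + 2)[1:-1]:  # skip the endpoints
        R = slerp(ratio).as_matrix()
        t = (1.0 - ratio) * np.asarray(t0) + ratio * np.asarray(t1)
        poses.append((R, t))
    return poses
```

Increasing n_views yields a denser sweep of novel viewpoints between the two source cameras.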
Usage in Research
GPS-Gaussian advances real-time novel view synthesis of human subjects, combining fast rendering with generalization to unseen subjects. Researchers who build on the project are encouraged to cite it; the repository provides code, pre-trained models, and data-preparation scripts for this task.
With its released tools and models, GPS-Gaussian offers a practical starting point for researchers and developers working on real-time novel view synthesis in computer vision and graphics.