# Overview of RayDiffusion

RayDiffusion is a camera pose estimation method that represents cameras as bundles of rays and recovers them with a denoising diffusion model. The work, "Cameras as Rays: Pose Estimation via Ray Diffusion," was presented at the International Conference on Learning Representations (ICLR) 2024.
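The core idea of interpreting a camera as a bundle of rays can be illustrated with Plücker coordinates, which parameterize a ray independently of which point along it you pick. The sketch below is illustrative only; the helper names and pure-Python vector math are not from the RayDiffusion codebase.

```python
# Illustrative sketch: a camera ray in Pluecker coordinates.
# A ray through point p with unit direction d is encoded as (d, m),
# where the moment m = p x d is invariant to sliding p along the ray.
# Helper names here are hypothetical, not from the repository.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def pluecker_ray(p, d):
    """Encode the ray through point p with direction d as (direction, moment)."""
    return d, cross(p, d)

# Two different points on the same ray yield identical Pluecker coordinates:
d = (0.0, 0.0, 1.0)       # ray direction (unit length)
p1 = (1.0, 2.0, 0.0)      # one point on the ray
p2 = (1.0, 2.0, 5.0)      # another point further along the same ray
assert pluecker_ray(p1, d) == pluecker_ray(p2, d)
```

Because the moment is unchanged under translation along the direction, a bundle of such rays pins down a camera's pose without singling out any one reference point.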
## Setup and Installation

To get started, clone the repository from GitHub:

```
git clone --depth=1 --branch=main https://github.com/jasonyzhang/RayDiffusion.git
```
## Environment Setup

Use a conda environment to manage dependencies:

1. **Create and activate the environment** (Python 3.10):

   ```
   conda create -n raydiffusion python=3.10
   conda activate raydiffusion
   ```

2. **Install PyTorch and required libraries** (PyTorch 2.1.1 with CUDA 11.8, plus torchvision, torchaudio, and xFormers):

   ```
   conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
   conda install xformers -c xformers
   pip install -r requirements.txt
   ```

3. **Install PyTorch3D** from the pre-built wheels matching this Python/CUDA/PyTorch combination:

   ```
   pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
   ```
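After installation, a quick import check confirms the environment is wired up. This is a generic sketch, not a script shipped with the repository:

```python
# Quick sanity check: verify that the key dependencies can be located.
# Generic helper, not part of RayDiffusion itself.
import importlib.util

for module in ("torch", "torchvision", "pytorch3d", "xformers"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'found' if found else 'MISSING'}")
```

Any `MISSING` line points at the installation step above that needs to be rerun.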
## Running Demos

The repository includes several demos for experimenting with ray diffusion:

1. **Download model weights** from Google Drive:

   ```
   gdown https://drive.google.com/uc\?id\=1anIKsm66zmDiFuo8Nmm1HupcitM6NY7e
   unzip models.zip
   ```

2. **Demo with known bounding boxes**: run ray diffusion using the provided bounding boxes:

   ```
   python demo.py --model_dir models/co3d_diffusion --image_dir examples/robot/images --bbox_path examples/robot/bboxes.json --output_path robot.html
   ```

3. **Demo with automatic bounding box extraction**: derive bounding boxes from the provided masks:

   ```
   python demo.py --model_dir models/co3d_diffusion --image_dir examples/robot/images --mask_dir examples/robot/masks --output_path robot.html
   ```

4. **Ray regression demo**: run the regression model instead of the diffusion model:

   ```
   python demo.py --model_dir models/co3d_regression --image_dir examples/robot/images --bbox_path examples/robot/bboxes.json --output_path robot.html
   ```
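The exact schema of `bboxes.json` is defined by the repository's `demo.py`; purely as an illustration, one might assume it maps each image filename to pixel coordinates `[x1, y1, x2, y2]`. The snippet below writes such a file under that assumption:

```python
# Hypothetical sketch of a bounding-box file. The actual schema expected
# by demo.py is defined in the RayDiffusion repository and may differ.
import json

bboxes = {
    "000.jpg": [15, 28, 410, 395],   # [x1, y1, x2, y2] in pixels (assumed)
    "001.jpg": [22, 31, 402, 388],
}

with open("bboxes.json", "w") as f:
    json.dump(bboxes, f, indent=2)

# Reading it back yields the same mapping:
with open("bboxes.json") as f:
    assert json.load(f) == bboxes
```

Check the example files under `examples/robot/` for the authoritative format before writing your own.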
## Training and Evaluation

- **Training:** Train the ray diffusion model on a multi-GPU setup:

  ```
  accelerate launch --multi_gpu --gpu_ids 0,1,2,3,4,5,6,7 --num_processes 8 train.py training.batch_size=8 training.max_iterations=450000
  ```

  Additional guidance is provided in the project's training documentation.

- **Evaluation:** Follow the instructions detailed in the project's evaluation documentation.
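To get a feel for the scale of the training run above: assuming `training.batch_size=8` is the per-process batch size (a common convention for `accelerate` launches, not confirmed by the source), the global batch and total sample count work out as follows:

```python
# Back-of-the-envelope training scale. Assumes training.batch_size is
# per-process; if it is a global batch size, divide accordingly.
num_processes = 8          # --num_processes 8
per_process_batch = 8      # training.batch_size=8 (assumed per-process)
max_iterations = 450_000   # training.max_iterations=450000

global_batch = num_processes * per_process_batch   # images per optimizer step
total_samples = global_batch * max_iterations      # samples processed overall

print(f"global batch size: {global_batch}")
print(f"total samples seen: {total_samples:,}")
```

Halving the number of GPUs under this assumption halves the global batch size, which typically also calls for adjusting the learning rate or iteration count.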
## Academic Contribution

Researchers and practitioners are welcome to build on RayDiffusion; please cite the authors in academic contexts:

```
@InProceedings{zhang2024raydiffusion,
    title={Cameras as Rays: Pose Estimation via Ray Diffusion},
    author={Zhang, Jason Y and Lin, Amy and Kumar, Moneish and Yang, Tzu-Hsuan and Ramanan, Deva and Tulsiani, Shubham},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2024}
}
```
RayDiffusion pairs a novel camera representation with ready-to-run demos and training code, inviting users to explore pose estimation through the lens of ray diffusion.