# Overview of RayDiffusion

RayDiffusion is a camera pose estimation method that represents cameras as bundles of rays and recovers them with a denoising diffusion model. The work, "Cameras as Rays: Pose Estimation via Ray Diffusion," was presented at the International Conference on Learning Representations (ICLR) 2024.
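The core idea of interpreting a camera as a bundle of rays can be illustrated with Plücker coordinates, which parameterize a ray independently of which point along it you pick. The sketch below is illustrative only; the helper names and pure-Python vector math are not from the RayDiffusion codebase.

```python
# Illustrative sketch: a camera ray in Pluecker coordinates.
# A ray through point p with unit direction d is encoded as (d, m),
# where the moment m = p x d is invariant to sliding p along the ray.
# Helper names here are hypothetical, not from the repository.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def pluecker_ray(p, d):
    """Encode the ray through point p with direction d as (direction, moment)."""
    return d, cross(p, d)

# Two different points on the same ray yield identical Pluecker coordinates:
d = (0.0, 0.0, 1.0)       # ray direction (unit length)
p1 = (1.0, 2.0, 0.0)      # one point on the ray
p2 = (1.0, 2.0, 5.0)      # another point further along the same ray
assert pluecker_ray(p1, d) == pluecker_ray(p2, d)
```

Because the moment is unchanged under translation along the direction, a bundle of such rays pins down a camera's pose without singling out any one reference point.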
## Setup and Installation

To get started, clone the repository from GitHub:

```
git clone --depth=1 --branch=main https://github.com/jasonyzhang/RayDiffusion.git
```
## Environment Setup

Use a conda environment to manage dependencies:

1. **Create and activate the environment** (Python 3.10):

   ```
   conda create -n raydiffusion python=3.10
   conda activate raydiffusion
   ```

2. **Install PyTorch and required libraries** (PyTorch 2.1.1 with CUDA 11.8, plus torchvision, torchaudio, and xFormers):

   ```
   conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
   conda install xformers -c xformers
   pip install -r requirements.txt
   ```

3. **Install PyTorch3D** from the pre-built wheels matching this Python/CUDA/PyTorch combination:

   ```
   pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
   ```
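After installation, a quick import check confirms the environment is wired up. This is a generic sketch, not a script shipped with the repository:

```python
# Quick sanity check: verify that the key dependencies can be located.
# Generic helper, not part of RayDiffusion itself.
import importlib.util

for module in ("torch", "torchvision", "pytorch3d", "xformers"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'found' if found else 'MISSING'}")
```

Any `MISSING` line points at the installation step above that needs to be rerun.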
## Running Demos

The repository includes several demos for experimenting with ray diffusion:

1. **Download model weights** from Google Drive:

   ```
   gdown https://drive.google.com/uc\?id\=1anIKsm66zmDiFuo8Nmm1HupcitM6NY7e
   unzip models.zip
   ```

2. **Demo with known bounding boxes**: run ray diffusion using the provided bounding boxes:

   ```
   python demo.py --model_dir models/co3d_diffusion --image_dir examples/robot/images --bbox_path examples/robot/bboxes.json --output_path robot.html
   ```

3. **Demo with automatic bounding box extraction**: derive bounding boxes from the provided masks:

   ```
   python demo.py --model_dir models/co3d_diffusion --image_dir examples/robot/images --mask_dir examples/robot/masks --output_path robot.html
   ```

4. **Ray regression demo**: run the regression model instead of the diffusion model:

   ```
   python demo.py --model_dir models/co3d_regression --image_dir examples/robot/images --bbox_path examples/robot/bboxes.json --output_path robot.html
   ```
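The exact schema of `bboxes.json` is defined by the repository's `demo.py`; purely as an illustration, one might assume it maps each image filename to pixel coordinates `[x1, y1, x2, y2]`. The snippet below writes such a file under that assumption:

```python
# Hypothetical sketch of a bounding-box file. The actual schema expected
# by demo.py is defined in the RayDiffusion repository and may differ.
import json

bboxes = {
    "000.jpg": [15, 28, 410, 395],   # [x1, y1, x2, y2] in pixels (assumed)
    "001.jpg": [22, 31, 402, 388],
}

with open("bboxes.json", "w") as f:
    json.dump(bboxes, f, indent=2)

# Reading it back yields the same mapping:
with open("bboxes.json") as f:
    assert json.load(f) == bboxes
```

Check the example files under `examples/robot/` for the authoritative format before writing your own.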
## Training and Evaluation

- **Training:** Train the ray diffusion model on a multi-GPU setup:

  ```
  accelerate launch --multi_gpu --gpu_ids 0,1,2,3,4,5,6,7 --num_processes 8 train.py training.batch_size=8 training.max_iterations=450000
  ```

  Additional guidance is provided in the project's training documentation.

- **Evaluation:** Follow the instructions detailed in the project's evaluation documentation.
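To get a feel for the scale of the training run above: assuming `training.batch_size=8` is the per-process batch size (a common convention for `accelerate` launches, not confirmed by the source), the global batch and total sample count work out as follows:

```python
# Back-of-the-envelope training scale. Assumes training.batch_size is
# per-process; if it is a global batch size, divide accordingly.
num_processes = 8          # --num_processes 8
per_process_batch = 8      # training.batch_size=8 (assumed per-process)
max_iterations = 450_000   # training.max_iterations=450000

global_batch = num_processes * per_process_batch   # images per optimizer step
total_samples = global_batch * max_iterations      # samples processed overall

print(f"global batch size: {global_batch}")
print(f"total samples seen: {total_samples:,}")
```

Halving the number of GPUs under this assumption halves the global batch size, which typically also calls for adjusting the learning rate or iteration count.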
## Academic Contribution

Researchers and practitioners are welcome to build on RayDiffusion; please cite the authors in academic contexts:

```
@InProceedings{zhang2024raydiffusion,
    title={Cameras as Rays: Pose Estimation via Ray Diffusion},
    author={Zhang, Jason Y and Lin, Amy and Kumar, Moneish and Yang, Tzu-Hsuan and Ramanan, Deva and Tulsiani, Shubham},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2024}
}
```
RayDiffusion pairs a novel camera representation with ready-to-run demos and training code, inviting users to explore pose estimation through the lens of ray diffusion.