Real3D: Scaling Up Large Reconstruction Models with Real-World Images
Real3D is a Large Reconstruction Model (LRM) project that trains 3D reconstruction models on real-world images. Earlier LRMs have relied on synthetic 3D assets or multi-view captures, which are useful but limited in scale and diversity. Real3D is presented as the first LRM system that can be trained on single-view real-world images.
Key Features
Novel Self-training Framework
Real3D introduces a self-training framework that uses both existing 3D/multi-view synthetic datasets and a diverse set of single-view real images. Its unsupervised losses supervise the model at the pixel level and the semantic level, so training examples can contribute even when they have no ground-truth 3D data or novel views.
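As a rough illustration of combining pixel-level and semantic-level supervision, the sketch below adds a mean squared error over pixels to a cosine distance over feature vectors. The text above does not give the actual loss formulas, so the choice of MSE and cosine distance, the function names, and the weights w_pixel and w_semantic are all assumptions made for illustration. How the rendered image, the reference image, and the feature vectors are produced is outside this sketch.

```python
import math


def pixel_loss(rendered, target):
    """Mean squared error between two equally sized flat pixel lists (assumed form)."""
    if len(rendered) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)


def semantic_loss(feat_a, feat_b):
    """One minus cosine similarity between two feature vectors (assumed form)."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm = math.sqrt(sum(a * a for a in feat_a)) * math.sqrt(sum(b * b for b in feat_b))
    if norm == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / norm


def unsupervised_loss(rendered, target, feat_rendered, feat_target,
                      w_pixel=1.0, w_semantic=1.0):
    """Weighted sum of the two terms; the weights are hypothetical hyperparameters."""
    return (w_pixel * pixel_loss(rendered, target)
            + w_semantic * semantic_loss(feat_rendered, feat_target))
```

Neither term needs a 3D label: both compare quantities derived from images, which is what lets single-view examples take part in training.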
Automatic Data Curation
Real3D also includes an automatic data curation strategy. It selects high-quality examples from a wide array of in-the-wild images, so the model trains on a varied and realistic data distribution.
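The general shape of such a curation step can be sketched as a filter over scored candidates. The text above does not say how Real3D judges image quality, so the Candidate record, its quality score, and the min_quality and max_keep parameters are hypothetical placeholders, not the project's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    quality: float  # hypothetical score in [0, 1]; higher means more suitable


def curate(candidates, min_quality=0.5, max_keep=None):
    """Keep candidates at or above the threshold, best first, optionally capped."""
    kept = [c for c in candidates if c.quality >= min_quality]
    kept.sort(key=lambda c: c.quality, reverse=True)
    return kept if max_keep is None else kept[:max_keep]


pool = [
    Candidate("a.jpg", 0.9),
    Candidate("b.jpg", 0.3),
    Candidate("c.jpg", 0.7),
]
selected = [c.path for c in curate(pool, min_quality=0.5)]
```

In a real pipeline the score would come from whatever automatic checks the curation strategy applies; only the select-by-threshold pattern is shown here.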
Superior Performance
Experiments indicate that Real3D consistently outperforms previous models across diverse evaluations that include both real and synthetic data. It handles both in-domain and out-of-domain shapes, which makes it adaptable to a range of application scenarios.
Installation and Usage
To run Real3D, users set up a Python and PyTorch environment and then download the model weights. With that in place, they can run the demos, change image paths and configurations, and adjust chunk sizes to suit their GPU.
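Chunking trades speed for peak memory: a large workload is split into slices that are processed one at a time. The sketch below shows the generic pattern in plain Python. It is not Real3D's code, and the names iter_chunks and process_in_chunks are hypothetical; the project's actual chunk-size setting is not named in the text above.

```python
def iter_chunks(total, chunk_size):
    """Yield (start, end) index pairs that cover range(total) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def process_in_chunks(items, chunk_size, fn):
    """Apply fn to one slice at a time and concatenate the results."""
    out = []
    for start, end in iter_chunks(len(items), chunk_size):
        out.extend(fn(items[start:end]))
    return out


# Stand-in workload: doubling numbers instead of running a model on a GPU.
doubled = process_in_chunks(list(range(5)), 2, lambda xs: [x * 2 for x in xs])
```

A smaller chunk size lowers the amount of data held at once, at the cost of more iterations, so it is the usual knob to turn when a run exceeds available GPU memory.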
Training Procedure
Training proceeds in several steps. Data preparation comes first and uses datasets such as MVImgNet, CO3D, and OmniObject3D, together with the curated real images. An optional step fine-tunes TripoSR to predict consistent 3D scales before self-training on real images begins.
Evaluation
Real3D supports more than one way to evaluate. Users can evaluate on datasets with multi-view ground truth, such as CO3D, or use the single-view image evaluation configurations.
Future Plans and Acknowledgements
Planned work includes releasing the real-world data to the community. Real3D is built on the TripoSR framework, and the project acknowledges contributions from the team of Hanwen Jiang, Qixing Huang, and Georgios Pavlakos.
Reference
The project's technical details are documented in a paper available on arXiv. The project page and demos are also available for hands-on exploration, and a BibTeX entry is provided for academic citation.