ReVersion: Diffusion-Based Relation Inversion from Images
Overview
ReVersion introduces a task called Relation Inversion. Given several exemplar images that share a common relation, such as one object "painted on" another or one object "inside" another, the goal is to learn a relation prompt, denoted <R>, that captures this relation. The learned prompt can then be applied to new objects to synthesize entirely new scenes.
What ReVersion Does
The ReVersion framework generates new images by applying learned relations to new entities. For example, once the model has learned the relation "painted on" from a set of exemplar images, it can apply that relation to a new entity pair such as a cat and a stone, producing a "cat painted on a stone" scene.
Latest Updates
Some key updates to the project include:
- March 2024: Code optimizations now let users save and load only the relation prompt, rather than saving and loading the entire model.
- August 2023: The training code for performing Relation Inversion is available, enabling users to train their own models.
- April 2023: A dedicated benchmark for ReVersion has been released, along with integration into Hugging Face's platform using Gradio for online demonstrations.
- March 2023: The project was publicly released, including its pre-trained models and inference code.
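The March 2024 update above describes saving and loading only the relation prompt. The following is a minimal sketch of that idea, not the repository's actual code. It assumes the relation prompt can be represented as a single embedding vector, and the JSON file layout, the function names `save_relation_prompt` and `load_relation_prompt`, and the toy 4-dimensional vector are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile


def save_relation_prompt(path, token, embedding):
    """Write only the relation token and its vector (assumed format: JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"token": token, "embedding": list(embedding)}, f)


def load_relation_prompt(path):
    """Read back the token and vector written by save_relation_prompt."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["token"], data["embedding"]


# Toy stand-in for a learned embedding; a real one would be much longer.
learned = [0.12, -0.5, 0.33, 0.9]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "relation_prompt.json")  # hypothetical file name
    save_relation_prompt(path, "<R>", learned)
    token, restored = load_relation_prompt(path)
```

The point of the design is that the stored artifact contains only the learned vector, so it stays small and can be attached to a separately loaded base model.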
Installation and Usage
Installation: Set up the project by cloning its GitHub repository and creating a Conda environment to manage dependencies. The environment includes PyTorch and the other packages required for inference and training.
Relation Inversion: Users provide exemplar images that show the target relation, together with corresponding text descriptions. From these inputs, ReVersion learns a relation prompt <R>, which can then be used to render new images that exhibit the learned relation.
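As a rough illustration of the inputs described above, the sketch below pairs each exemplar image with a text description in which the relation is left as the placeholder <R>. The `Exemplar` class, the `check_exemplars` helper, the file paths, and the entity names are hypothetical and are not taken from the ReVersion codebase; the repository may organize its data differently.

```python
from dataclasses import dataclass

RELATION_TOKEN = "<R>"  # placeholder this document uses for the relation prompt


@dataclass(frozen=True)
class Exemplar:
    """One exemplar: an image plus the two entities it shows."""
    image_path: str  # hypothetical path, for illustration only
    subject: str
    obj: str

    def caption(self) -> str:
        # The relation is not spelled out; it is the part to be learned.
        return f"{self.subject} {RELATION_TOKEN} {self.obj}"


def check_exemplars(exemplars):
    """Return the captions after checking that the exemplar set is usable."""
    if not exemplars:
        raise ValueError("at least one exemplar image is required")
    captions = [ex.caption() for ex in exemplars]
    for cap in captions:
        if cap.count(RELATION_TOKEN) != 1:
            raise ValueError(f"caption needs exactly one {RELATION_TOKEN}: {cap!r}")
    return captions


exemplars = [
    Exemplar("painted_on/0.jpg", "flower", "vase"),
    Exemplar("painted_on/1.jpg", "bird", "wall"),
]
captions = check_exemplars(exemplars)
```

Every caption shares the same placeholder while the entities vary, which reflects the idea that the relation is what the exemplars have in common.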
Image Generation: The learned relation prompt <R> can be combined with new object descriptions to synthesize varied images, letting users explore different visual contexts and styles.
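The prompt construction described above can be sketched as simple string composition: an entity pair joined by <R>, optionally followed by a context phrase. This is only an illustration of how such prompts might be assembled before being passed to a text-to-image model. The helper names `compose_prompt` and `prompt_grid` are hypothetical, and the sketch does not call any ReVersion or Diffusers API.

```python
from itertools import product

RELATION_TOKEN = "<R>"  # placeholder for the learned relation prompt


def compose_prompt(subject, obj, context=""):
    """Build '<subject> <R> <obj>' with an optional trailing context phrase."""
    parts = [subject, RELATION_TOKEN, obj]
    if context:
        parts.append(context)
    return " ".join(parts)


def prompt_grid(pairs, contexts):
    """Combine every entity pair with every context phrase."""
    return [compose_prompt(s, o, c) for (s, o), c in product(pairs, contexts)]


prompts = prompt_grid(
    [("cat", "stone")],
    ["in the desert", "on the beach"],
)
for p in prompts:
    print(p)
# prints:
# cat <R> stone in the desert
# cat <R> stone on the beach
```

Keeping the relation token fixed while varying the entities and contexts makes it easy to check whether the learned relation carries over to different scenes.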
Features
- Diverse Image Generation: Relation prompts such as "cat <R> stone" can be combined with environmental contexts such as "in the desert" or "on the beach" to create images with varied backgrounds and styles.
- Gradio Demo: A web-based demo built with Gradio lets users experiment with relation-based image generation in the browser.
- ReVersion Benchmark: Includes a set of diverse relations and entities, text descriptions, and scenario templates for testing and development.
Conclusion
ReVersion focuses on capturing relations from exemplar images and applying them to new entities. This gives artists, designers, and developers a way to control how objects interact in generated images.
Citation
If you plan to use ReVersion in your academic work, please cite the paper as follows:
@article{huang2023reversion,
title={{ReVersion}: Diffusion-Based Relation Inversion from Images},
author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},
journal={arXiv preprint arXiv:2303.13495},
year={2023}
}
Acknowledgement
The project is directed by Ziqi Huang and Tianxing Wu, and builds on open-source repositories such as Stable Diffusion and Diffusers.