CycleDiffusion Project Introduction
Overview
CycleDiffusion is a project that explores the latent spaces of diffusion models. It stems from research conducted by Chen Henry Wu and Fernando De la Torre at Carnegie Mellon University. The project examines the randomness inherent in diffusion models, likening it to a kind of magic for its unpredictable yet powerful nature, and leverages this 'magic' to achieve novel image edits and translations.
The project formalizes the concept of a "random seed" in diffusion models and shows how to infer it from real images. This simple yet innovative idea enables two significant capabilities: zero-shot image-to-image translation and unpaired image-to-image translation. The zero-shot variant builds on Stable Diffusion, one of the most widely used text-to-image diffusion models.
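To make the idea concrete, here is a minimal sketch of how a "random seed" can be inferred from a real image, in the spirit of the paper's DPM-Encoder: diffuse the image forward, then walk back down the chain and solve for the noise each stochastic reverse step would have needed to land exactly where the forward posterior sample did. The `eps_model` callable and the plain-DDPM parameterization are illustrative assumptions, not the project's actual code:

```python
import torch

@torch.no_grad()
def dpm_encode(x0, eps_model, betas):
    # Sketch of CycleDiffusion's 'random seed' inference (DPM-Encoder style).
    # x0: a real image tensor; eps_model(x, t): a hypothetical trained DDPM
    # noise predictor; betas: the model's 1-D noise schedule of length T.
    T = betas.shape[0]
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)                      # \bar{alpha}_t
    abar_prev = torch.cat([torch.ones_like(abar[:1]), abar[:-1]])

    # Diffuse the real image all the way up: x_T ~ q(x_T | x0).
    x_t = abar[-1].sqrt() * x0 + (1.0 - abar[-1]).sqrt() * torch.randn_like(x0)
    code = [x_t]

    for t in reversed(range(1, T)):
        # Sample the forward posterior q(x_{t-1} | x_t, x0) (standard DDPM).
        post_var = betas[t] * (1.0 - abar_prev[t]) / (1.0 - abar[t])
        post_mean = (abar_prev[t].sqrt() * betas[t] / (1.0 - abar[t])) * x0 \
                  + (alphas[t].sqrt() * (1.0 - abar_prev[t]) / (1.0 - abar[t])) * x_t
        x_prev = post_mean + post_var.sqrt() * torch.randn_like(x0)

        # Reverse-step mean mu_theta(x_t, t) implied by the noise prediction.
        eps = eps_model(x_t, t)
        mu = (x_t - betas[t] / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()

        # The noise that makes the stochastic sampler land exactly on x_prev.
        code.append((x_prev - mu) / post_var.sqrt())
        x_t = x_prev
    return code  # z = (x_T, eps_T, ..., eps_1)
```

Replaying `code` through the same sampler reconstructs the image by construction; replaying it through a different model is what powers the translations described below.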
Key Features
- Zero-Shot Image-to-Image Translation: This functionality translates images without needing paired training data. With CycleDiffusion, users can achieve creative transformations of an image by pairing a source text prompt with a target text prompt.
- Unpaired Image-to-Image Translation: This involves using diffusion models trained on different domains, achieving translations without requiring paired datasets.
The model takes a triplet as input (a runnable sketch appears under Project Updates below):
- a source image, marked in the project's figures by a purple margin;
- a source text describing that image;
- a target text describing the desired output, with portions that overlap the source text abbreviated in the figures.
Achievements
The paper detailing this work has been accepted to ICCV 2023, marking a significant step for both the academic and practical applications of CycleDiffusion. The paper shows how formalizing the 'random seed' unlocks powerful image-editing capabilities.
Project Updates
- October 2022: Initial code release in the Unified Generative Zoo repository.
- November 2022: CycleDiffusion became available as a Hugging Face Diffusers pipeline, with an accompanying demo.
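Since that release, zero-shot translation takes only a few lines with the `CycleDiffusionPipeline`. The snippet below follows the pattern of the Diffusers documentation; the model id, file paths, prompts, and parameter values are illustrative:

```python
import torch
from PIL import Image
from diffusers import CycleDiffusionPipeline, DDIMScheduler

# CycleDiffusion is built on DDIM sampling, so load a DDIM scheduler.
model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler).to("cuda")

# The input triplet: a source image, a source text, and a target text.
init_image = Image.open("horse.png").convert("RGB").resize((512, 512))  # placeholder path
source_prompt = "An astronaut riding a horse"
prompt = "An astronaut riding an elephant"

image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,        # amount of stochasticity reused between the two trajectories
    strength=0.8,   # how much of the diffusion trajectory to re-run
    guidance_scale=2,
    source_guidance_scale=1,
).images[0]
image.save("horse_to_elephant.png")
```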
Dependencies and Setup
To set up CycleDiffusion, one needs to:
- Create a virtual environment and install necessary libraries such as torch, torchvision, and taming-transformers.
- Download pre-trained models, including Stable Diffusion and Latent Diffusion Model for testing and evaluation.
- Utilize wandb for experiment logging (a minimal call is sketched after this list).
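As a minimal illustration of the logging step, a wandb run can be opened and written to like this (project and run names are placeholders):

```python
import wandb

# Open a run under a hypothetical project name, log a metric, and close it.
wandb.init(project="cycle-diffusion", name="zero-shot-eval")
wandb.log({"clip_similarity": 0.31, "step": 0})  # illustrative values
wandb.finish()
```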
Data and Models
CycleDiffusion includes the evaluation data needed for zero-shot translations. For model execution, it uses pre-trained checkpoints of popular diffusion models: Stable Diffusion for zero-shot translation and diffusion models trained on individual domains for unpaired translation.
Usage
CycleDiffusion is equipped to handle different modes of translation:
- Zero-Shot Using Text-to-Image Diffusion Models: Setup involves defining test samples and configuring distributed computation.
- Unpaired Translation Between Domains: Users can employ diffusion models trained on different domains to translate between them (see the sketch after this list).
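To make the unpaired mode concrete, here is the decoding half that pairs with the `dpm_encode` sketch above: replay the recovered latent code through a second DDPM trained on the target domain, reusing the stored noises instead of sampling fresh ones. As before, `eps_model` is a hypothetical noise predictor, not the project's actual interface:

```python
import torch

@torch.no_grad()
def dpm_decode(code, eps_model, betas):
    # Replay a latent code z = (x_T, eps_T, ..., eps_1) through stochastic
    # DDPM sampling with a (possibly different) noise predictor.
    T = betas.shape[0]
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    abar_prev = torch.cat([torch.ones_like(abar[:1]), abar[:-1]])

    x_t = code[0]  # start from the encoded top-level noise x_T
    for i, t in enumerate(reversed(range(1, T)), start=1):
        eps = eps_model(x_t, t)
        mu = (x_t - betas[t] / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        sigma = (betas[t] * (1.0 - abar_prev[t]) / (1.0 - abar[t])).sqrt()
        x_t = mu + sigma * code[i]  # reuse the recovered noise, not fresh noise
    return x_t

# Usage sketch with hypothetical denoisers for two domains (e.g. cat -> dog):
# z = dpm_encode(cat_image, cat_eps_model, betas)
# dog_image = dpm_decode(z, dog_eps_model, betas)
```

With the same `eps_model`, `dpm_decode` reconstructs the source image exactly; with a target-domain model, it produces the translated image while preserving the shared 'seed'.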
This comprehensive package empowers both novice and expert users to explore the potential of diffusion models in creative image editing and transformations. Whether you're diving into zero-shot tasks or exploring unpaired translation possibilities, CycleDiffusion provides the tools and insights needed to leverage diffusion models' latent spaces effectively.