# Prompt-to-Prompt: An Introduction
Latent Diffusion and Stable Diffusion are powerful image generation models. The Prompt-to-Prompt project builds on them to offer a distinctive approach to image editing: instead of requiring masks or retraining, it lets users edit images purely through text prompts by manipulating the attention maps inside the diffusion model.
## Setup
The project is implemented in Python 3.8 with PyTorch 1.11, and pre-trained models are loaded through huggingface/diffusers. Two families of diffusion models are supported: Latent Diffusion and Stable Diffusion. The packages needed to run the project are listed in the requirements file. The code was tested on a Tesla V100 GPU with 16GB of VRAM, but it should also run on other GPUs with at least 12GB of VRAM.
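As a quick sanity check that the environment is set up correctly, the Stable Diffusion weights can be loaded through diffusers. This is a minimal sketch, assuming the `CompVis/stable-diffusion-v1-4` checkpoint and a CUDA device; the notebooks below handle model loading themselves:

```python
# Minimal environment check. The model id and device are assumptions;
# adjust them to match your setup.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

image = pipe("A painting of a squirrel eating a burger").images[0]
image.save("squirrel_burger.png")
```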
## Quickstart
The fastest way to understand how Prompt-to-Prompt works is to start with the two provided Jupyter notebooks, `prompt-to-prompt_ldm` and `prompt-to-prompt_stable`. They demonstrate end-to-end prompt-to-prompt examples over Latent Diffusion and Stable Diffusion, respectively, and offer a practical introduction to prompt edits and the API used to apply them.
## Prompt Edits
At the heart of the project is the abstract class `AttentionControl`, which is used to apply the different prompt edits. Each attention layer in the diffusion model routes its attention weights through this class, allowing them to be altered during image generation:
```python
import abc

class AttentionControl(abc.ABC):

    @abc.abstractmethod
    def forward(self, attn, is_cross: bool, place_in_unet: str):
        # Receives one layer's attention map and returns the (possibly
        # edited) map to be used in its place.
        raise NotImplementedError
```
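For instance, a controller that returns every attention map unchanged reproduces ordinary, unedited generation. Here is a minimal pass-through sketch of such a subclass (the repository includes a comparable no-op controller):

```python
# Pass-through controller: leaves all attention maps untouched, so
# generation proceeds exactly as it would without any edit.
class EmptyControl(AttentionControl):

    def forward(self, attn, is_cross: bool, place_in_unet: str):
        return attn
```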
The key idea is that different styles of image editing correspond to different ways of modifying the attention weights. The supported types of prompt edits, each influencing the attention weights in its own way, are outlined below:
- **Replacement:** swapping tokens of the original prompt for new ones. For example, changing "A painting of a squirrel eating a burger" to "A painting of a squirrel eating a lasagna". This is managed by the `AttentionReplace` class (see the usage sketch after this list).
- **Refinement:** adding new descriptive tokens to enhance the prompt, such as "A watercolor painting of a squirrel eating a burger". This is managed by the `AttentionRefine` class.
- **Re-weight:** adjusting the influence of certain tokens within the prompt. For instance, in "A photo of a poppy field at night", changing how prominently "night" impacts the final image. This is managed by the `AttentionReweight` class.
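As a rough illustration, a controller is constructed for a pair of prompts and then drives the joint generation of both images. The sketch below follows the style of the notebooks; the exact constructor signature (e.g., the `num_steps` argument name) is an assumption and may differ slightly from the code:

```python
# Illustrative sketch; argument names are assumptions based on the
# notebooks and may differ from the actual constructor signature.
prompts = ["A painting of a squirrel eating a burger",
           "A painting of a squirrel eating a lasagna"]

controller = AttentionReplace(prompts, num_steps=50,
                              cross_replace_steps=0.8,
                              self_replace_steps=0.4)
# The controller is hooked into the model's attention layers, and both
# prompts are denoised jointly while sharing the edited attention maps.
```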
### Attention Control Options
Several options are available for controlling the attention mechanism:
- `cross_replace_steps`: the fraction of diffusion steps during which the cross-attention maps are modified; this can optionally be specified per word in the prompt.
- `self_replace_steps`: the fraction of diffusion steps during which the self-attention maps are replaced.
- `local_blend` (optional): a `LocalBlend` object for localized edits. It is initialized with words from the prompt that correspond to specific image areas, confining the edit to those regions.
- `equalizer`: used exclusively for attention re-weighting; a vector that scales the cross-attention weightings (see the sketch after this list).
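For example, a re-weighting edit combines an equalizer with the step options above. A hypothetical sketch, assuming a `get_equalizer` helper along the lines of the one used in the notebooks:

```python
# Hypothetical sketch: `get_equalizer` and the argument names mirror the
# notebooks but are assumptions, not the exact API.
prompts = ["A photo of a poppy field at night"] * 2

# Build a per-token scaling vector that amplifies "night" by 5x.
equalizer = get_equalizer(prompts[1], word_select=("night",), values=(5.0,))

controller = AttentionReweight(prompts, num_steps=50,
                               cross_replace_steps=0.8,
                               self_replace_steps=0.4,
                               equalizer=equalizer)
```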
## Additional Tools and Options
The project also includes Null-Text Inversion, a tool for text-based editing of real images with the Stable Diffusion model. It first applies DDIM inversion to obtain a diffusion trajectory for the input image, then fine-tunes the null-text (unconditional) embedding used by classifier-free guidance so that guided sampling faithfully reconstructs that trajectory; prompt-to-prompt edits can then be applied on top of the inverted image.
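In outline, one null-text embedding is optimized per diffusion step so that the guided denoising path matches the DDIM-inversion path. The following is a conceptual sketch only, not the repository's code: `unet`, `ddim_step`, and the argument layout are hypothetical placeholders standing in for the real model and scheduler calls:

```python
# Conceptual sketch of null-text inversion; `unet(z, t, emb)` (noise
# prediction) and `ddim_step(eps, t, z)` (one deterministic DDIM
# denoising step) are hypothetical placeholders, not real APIs.
import torch
import torch.nn.functional as F

def null_text_inversion(trajectory, timesteps, text_emb, null_emb,
                        guidance_scale=7.5, inner_steps=10, lr=1e-2):
    """Tune one null-text embedding per step so that guided sampling
    reproduces the DDIM-inversion trajectory z_0 ... z_T."""
    null_embs = []
    z = trajectory[-1]  # start from the inverted z_T
    for i, t in enumerate(reversed(timesteps)):
        target = trajectory[-(i + 2)]  # next latent on the inversion path
        null = null_emb.detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([null], lr=lr)
        with torch.no_grad():
            eps_text = unet(z, t, text_emb)  # fixed conditional prediction
        for _ in range(inner_steps):
            eps_uncond = unet(z, t, null)
            eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)
            loss = F.mse_loss(ddim_step(eps, t, z), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
        null_embs.append(null.detach())
        with torch.no_grad():  # advance using the tuned embedding
            eps_uncond = unet(z, t, null_embs[-1])
            eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)
            z = ddim_step(eps, t, z)
    return null_embs  # one tuned null-text embedding per step
```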
## Disclaimer
Please note that this is not an officially supported Google product.
Exploring these notebooks and experimenting with the controllers above is the best way to get a feel for the image edits Prompt-to-Prompt makes possible through text alone.