consistencydecoder - Consistency Decoder for Improved Stable Diffusion VAE Performance

Consistency Decoder

The Consistency Decoder is a technological advancement aimed at enhancing the quality of image generation, particularly in the context of stable diffusion models. Built partly on the work of OpenAI and other researchers, this tool focuses on improving the decoding process of Variational Autoencoders (VAEs), a class of machine learning models commonly used for image generation.

Installation

The installation process for the Consistency Decoder is straightforward. Users can install it directly from the GitHub repository using the following command:

$ pip install git+https://github.com/openai/consistencydecoder.git

This ensures that users get the latest version directly from the source.

Usage

Using the Consistency Decoder involves several steps in a Python environment, primarily relying on PyTorch and the diffusers library. The workflow starts by encoding an image using the VAE of a pretrained stable diffusion model and then decoding it via either a GAN or the consistency model. Here's a breakdown of the typical usage:

Setup: Import the necessary libraries including PyTorch, StableDiffusionPipeline from the diffusers library, and the ConsistencyDecoder.
Loading Models: Initialize the Stable Diffusion model from a pretrained source. This model creates and hones images by analyzing and refining the input data.
Encoding and Decoding:
- Encoding: Load an image that you wish to process, which is then encoded by the VAE to capture its essential features in a latent space.
- Decoding: The encoded features can be decoded using a traditional GAN, which is a type of neural network that generates images from these features. Alternatively, the Consistency Decoder can be used to decode the latent features, providing potentially more consistent and accurate visual outputs.
Saving Results: The decoded images, whether by GAN or VAE, are saved to visualize and compare the quality of different decoding approaches.

import torch
from diffusers import StableDiffusionPipeline
from consistencydecoder import ConsistencyDecoder, save_image, load_image

# encode with stable diffusion vae
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, device="cuda:0"
)
pipe.vae.cuda()
decoder_consistency = ConsistencyDecoder(device="cuda:0") # Model size: 2.49 GB

image = load_image("assets/gt1.png", size=(256, 256), center_crop=True)
latent = pipe.vae.encode(image.half().cuda()).latent_dist.mean

# decode with gan
sample_gan = pipe.vae.decode(latent).sample.detach()
save_image(sample_gan, "gan.png")

# decode with vae
sample_consistency = decoder_consistency(latent)
save_image(sample_consistency, "con.png")

Examples

To better understand the difference in outcomes between the Consistency Decoder and traditional GANs, several examples are provided. These compare the original image, the GAN-decoded image, and the consistency-decoded image:

Original Image	GAN Decoder	Consistency Decoder

These images showcase how the Consistency Decoder can maintain fidelity to the original while potentially improving on certain visual aspects, offering a viable alternative to GANs in image generation tasks.