Introduction to TAESD: Tiny AutoEncoder for Stable Diffusion
What is TAESD?
TAESD stands for Tiny AutoEncoder for Stable Diffusion. It is a compact autoencoder designed to decode latents from Stable Diffusion into full-size images at minimal computational cost. In other words, TAESD can quickly transform the encoded information (latents) used by Stable Diffusion back into viewable images.
Even on an ordinary laptop, TAESD decodes latents almost instantly, making it well suited to rapid image generation. Variants are available for several model families, including SD1/2, SDXL, SD3, and FLUX.1.
How to Access TAESD?
TAESD is versatile and readily available through various platforms:
- A1111 Stable Diffusion Web UI: usable both as a previewer and as an encoder/decoder.
- Vladmandic's Automatic Repository: Available with community contributions.
- ComfyUI: Offers options as a previewer and as a standalone VAE.
- 🧨 Diffusers (Hugging Face): available in `safetensors` format in several variants, including `taesd`, `taesdxl`, `taesd3`, and `taef1`.
Original weights for TAESD can also be found in the repository.
The Utility of TAESD
Due to its speed, TAESD is useful for real-time monitoring of Stable Diffusion's image generation. This can be particularly advantageous for those looking to interactively generate images or apply specific image-space loss functions efficiently.
Note that TAESD handles image values differently from the official VAE: the official VAE works with images scaled to [-1, 1], while TAESD works in [0, 1], so some adjustment may be needed when integrating it into an existing pipeline.
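The value-range difference is just an affine map. The sketch below (plain Python, with conventions taken from the TAESD reference implementation: official VAE in [-1, 1], TAESD in [0, 1]) shows the conversion in both directions:

```python
# Converting between the SD VAE image convention ([-1, 1]) and the
# TAESD image convention ([0, 1]).

def to_unit_range(x: float) -> float:
    """Map a [-1, 1] image value (SD VAE convention) to [0, 1] (TAESD convention)."""
    return min(max(x / 2 + 0.5, 0.0), 1.0)

def to_signed_range(x: float) -> float:
    """Map a [0, 1] image value back to [-1, 1]."""
    return x * 2 - 1

print(to_unit_range(-1.0), to_unit_range(1.0))  # 0.0 1.0
print(to_signed_range(0.5))                     # 0.0
```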
How Does TAESD Work?
TAESD is a streamlined version of Stable Diffusion's original VAE: an encoder compresses full-size images into small latents, and a decoder reconstructs full-size images from those latents. The latents are roughly 48x smaller than the raw image data, which keeps both directions fast and cheap.
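The 48x figure follows directly from the standard SD shapes (a 512x512 RGB image encoded into a 64x64 latent with 4 channels):

```python
# Worked example of the 48x compression factor: a 512x512 RGB image
# versus Stable Diffusion's 64x64x4 latent.
image_values = 512 * 512 * 3    # 786,432 values in the image
latent_values = 64 * 64 * 4     # 16,384 values in the latent
factor = image_values // latent_values
print(factor)  # 48
```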
Internally, TAESD is built from Conv+ReLU residual blocks and 2x upsampling layers, a simple architecture that keeps it both compact and fast.
Limitations to Consider
While TAESD is extremely fast, that speed comes at a cost in image detail and quality. If high image fidelity is your main concern and speed is not, the original Stable Diffusion VAE* decoder or alternatives such as OpenAI's Consistency Decoder may be preferable.
TAESD's design favors speed and ease of use over fine detail, making it a good fit for projects that prioritize those attributes.
TAESD with Video Generators
TAESD can work with video generators that produce sequences of Stable Diffusion latents, providing a basic level of detail continuity across frames. However, this may result in some flickering. For smooth video results, alternative decoders such as TAESDV or the SVD VAE may be suitable choices.
Quick Comparison
| Feature | SD VAE* | TAESD |
|---|---|---|
| Encoder parameters | 34,163,592 | 1,222,532 |
| Decoder parameters | 49,490,179 | 1,222,531 |
| Memory usage | Non-linear | Linear in latent size |
| High-quality details | Yes | No |
| Size and speed | Large, slower | Tiny, much faster |
*The VAE in question is a reference to the AutoencoderKL utilized within certain frameworks.
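To put the parameter counts in the table in perspective, a quick division shows TAESD's decoder is roughly 40x smaller than the SD VAE decoder:

```python
# Size gap between the two decoders, using the counts from the table.
sd_vae_decoder_params = 49_490_179
taesd_decoder_params = 1_222_531
ratio = round(sd_vae_decoder_params / taesd_decoder_params)
print(ratio)  # 40
```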
TAESD stands out as a practical solution for those needing expedited image generation from Stable Diffusion’s outputs, especially when working under constraints that necessitate quick previews or when detailed precision is less critical.