Introduction to SD-Latent-Interposer
SD-Latent-Interposer is a compact neural network tool designed to bridge the gap between different Stable Diffusion models. It makes the latents generated by these models interoperable without decoding and re-encoding through a Variational Autoencoder (VAE). The project focuses primarily on direct transitions between the SDXL model and other models such as SDv1.5.
Installation
To get started with SD-Latent-Interposer, there are a couple of installation methods available:
- Clone the GitHub repository into your custom nodes directory:
git clone https://github.com/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer
- Alternatively, download the script directly from the repository and place it in the ComfyUI/custom_nodes directory. This method requires the huggingface-hub package, which you can install with:
pip install huggingface-hub
The necessary model weights are hosted on Hugging Face.
Usage
To use the interposer, insert it in the workflow where you would usually apply a VAE decode followed by a VAE encode. Adjust the denoise values judiciously to reduce artifacts while maintaining the essence of the composition.
In the absence of the interposer, the latent spaces of different models remain incompatible; this tool remedies that.
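Conceptually, the interposer is a small network that maps a latent from one model's space to another's while preserving the spatial layout. The sketch below illustrates only the shape contract with a toy 1x1-convolution stand-in in numpy; the function name and weights are hypothetical, not the project's actual model.

```python
import numpy as np

def toy_interposer(latent: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map a latent [batch, c_in, h, w] to [batch, c_out, h, w] with a 1x1 'convolution'.

    The real interposer is a trained neural network; this toy version only
    shows that spatial dimensions pass through unchanged while the channel
    dimension may change between model families.
    """
    c_out = weight.shape[0]
    # einsum over channels == a 1x1 conv: out[b,o,h,w] = sum_i W[o,i] * in[b,i,h,w]
    return np.einsum("oi,bihw->bohw", weight, latent) + bias.reshape(1, c_out, 1, 1)

# Pretend to map a 4-channel SDv1-style latent into a 4-channel SDXL-style one.
rng = np.random.default_rng(0)
latent_v1 = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)
w = rng.standard_normal((4, 4)).astype(np.float32)
b = np.zeros(4, dtype=np.float32)
latent_xl = toy_interposer(latent_v1, w, b)
print(latent_xl.shape)  # (1, 4, 64, 64)
```

In an actual workflow the interposer node simply replaces the VAE decode + VAE encode pair between the two samplers.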
Local Models
By default, SD-Latent-Interposer fetches the necessary files from the Hugging Face hub. If you prefer to work offline or have an unreliable internet connection, you can create a local models directory and place the model files there.
To use local resources, clone the repository models into your system with:
git clone https://huggingface.co/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer/models
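A local-first loading strategy can be sketched as follows: check the local models directory for the weight file and only fall back to the hub if it is missing. The filename pattern and function name below are illustrative assumptions; check the actual file names in the city96/SD-Latent-Interposer repository on Hugging Face.

```python
import os
from typing import Optional

def resolve_local_model(src: str, dst: str, version: str = "v4.0",
                        local_dir: str = "custom_nodes/SD-Latent-Interposer/models") -> Optional[str]:
    """Return the local path for an interposer model if present, else None.

    The filename pattern is a guess for illustration; the real repository
    defines its own naming scheme.
    """
    filename = f"{src}-to-{dst}_interposer-{version}.safetensors"
    path = os.path.join(local_dir, filename)
    return path if os.path.isfile(path) else None
```

If this returns None, the node would download the file from the hub instead (e.g. via huggingface_hub's hf_hub_download).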
Supported Models and Compatibility
SD-Latent-Interposer is compatible with various models:
- Model names covered include SD v1.x, SDXL, Stable Diffusion 3, Flux.1, and Stable Cascade.
- A detailed mapping between these models shows which latent transitions can be performed among them. For instance, transitions such as xl to v1 are supported using version 4.0 of the interposer.
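One practical detail behind these mappings is that the model families use different latent channel counts: SD v1.x and SDXL use 4-channel VAE latents, while SD3, Flux.1, and Stable Cascade use 16-channel latents. The sketch below validates a latent's shape against its claimed source model; the short model keys follow the document's notation (xl, v1) and are otherwise an assumption.

```python
import numpy as np

# Latent channel counts per model family: v1/xl use 4-channel VAE latents;
# SD3, Flux.1, and Stable Cascade use 16-channel latents.
LATENT_CHANNELS = {"v1": 4, "xl": 4, "sd3": 16, "flux": 16, "cascade": 16}

def check_latent(latent: np.ndarray, model: str) -> None:
    """Raise if a [batch, channels, height, width] latent doesn't match the model family."""
    expected = LATENT_CHANNELS[model]
    if latent.ndim != 4 or latent.shape[1] != expected:
        raise ValueError(
            f"expected [b, {expected}, h, w] latent for {model!r}, got {latent.shape}"
        )

check_latent(np.zeros((1, 4, 64, 64)), "v1")    # ok: 4 channels
check_latent(np.zeros((1, 16, 32, 32)), "sd3")  # ok: 16 channels
```

Crossing between a 4-channel and a 16-channel space is exactly the kind of mapping the interposer has to learn; it cannot be done by naive reshaping.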
Training Insights
Training the SD-Latent-Interposer involves setting up training parameters from a provided configuration file. The dataset consists of .bin files containing latents in a [batch, channels, height, width] format.
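A minimal loader for such files might look like the sketch below. It assumes each .bin holds raw little-endian float32 values and infers the batch size from the file size; the project's actual on-disk format may differ (e.g. torch-serialized tensors), so treat this purely as an illustration of the expected shape.

```python
import numpy as np

def load_latent_bin(path: str, channels: int, height: int, width: int) -> np.ndarray:
    """Load a .bin latent file as a [batch, channels, height, width] array.

    Assumes raw little-endian float32 data; the batch dimension is inferred
    from the file size. This is a sketch, not the project's actual loader.
    """
    flat = np.fromfile(path, dtype="<f4")
    batch = flat.size // (channels * height * width)
    return flat.reshape(batch, channels, height, width)
```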
Interposer v4.0
The training process for this version uses two model copies to compute loss metrics such as p_loss, b_loss, r_loss, and h_loss, focusing on improving the fidelity of the latent transformation. Models were trained extensively on hardware such as the NVIDIA RTX 3080 and Tesla V100S with varied batch sizes.
Older Versions
The project also outlines insights from older interposer versions like v3.1, v1.1, and v1.0. Each previous version reflects developments such as improvements in architecture, training datasets, and strategies to overcome existing challenges, particularly in latent space switching.
The necessity for SD-Latent-Interposer stems from the practical need to move latents across different models without degrading quality. With continuous updates and training enhancements, it stands as a valuable tool for those working extensively with Stable Diffusion models.