Introduction to SD-Latent-Interposer
SD-Latent-Interposer is a compact neural network tool designed to bridge the gap between different Stable Diffusion models. It makes the latents generated by these models interoperable without decoding and re-encoding through a Variational Autoencoder (VAE). The project focuses primarily on direct transitions between the SDXL model and other models such as SDv1.5.
Installation
To get started with SD-Latent-Interposer, there are a couple of installation methods available:
- Clone the GitHub repository into your custom nodes directory:
git clone https://github.com/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer
- Alternatively, download the script directly from the repository and place it in the ComfyUI/custom_nodes directory. This method requires the huggingface-hub package, which you can install with:
pip install huggingface-hub
The necessary model weights are hosted on Hugging Face.
Usage
To use the interposer, insert it in the workflow where you would usually apply a VAE decode followed by a VAE encode. Adjust the denoise values judiciously to reduce artifacts while maintaining the essence of the composition.
In the absence of the interposer, the latent spaces of different models remain incompatible; this tool remedies that.
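Conceptually, the interposer is a small network that maps a latent from one model's space to another's while preserving the spatial layout. The sketch below illustrates only the shape contract with a toy 1x1-convolution stand-in in numpy; the function name and weights are hypothetical, not the project's actual model.

```python
import numpy as np

def toy_interposer(latent: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map a latent [batch, c_in, h, w] to [batch, c_out, h, w] with a 1x1 'convolution'.

    The real interposer is a trained neural network; this toy version only
    shows that spatial dimensions pass through unchanged while the channel
    dimension may change between model families.
    """
    c_out = weight.shape[0]
    # einsum over channels == a 1x1 conv: out[b,o,h,w] = sum_i W[o,i] * in[b,i,h,w]
    return np.einsum("oi,bihw->bohw", weight, latent) + bias.reshape(1, c_out, 1, 1)

# Pretend to map a 4-channel SDv1-style latent into a 4-channel SDXL-style one.
rng = np.random.default_rng(0)
latent_v1 = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)
w = rng.standard_normal((4, 4)).astype(np.float32)
b = np.zeros(4, dtype=np.float32)
latent_xl = toy_interposer(latent_v1, w, b)
print(latent_xl.shape)  # (1, 4, 64, 64)
```

In an actual workflow the interposer node simply replaces the VAE decode + VAE encode pair between the two samplers.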
Local Models
By default, SD-Latent-Interposer fetches the necessary files from the Hugging Face hub. If you prefer to work offline or have an unreliable internet connection, you can create a local models directory and place the model files there.
To use local resources, clone the repository models into your system with:
git clone https://huggingface.co/city96/SD-Latent-Interposer custom_nodes/SD-Latent-Interposer/models
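A local-first loading strategy can be sketched as follows: check the local models directory for the weight file and only fall back to the hub if it is missing. The filename pattern and function name below are illustrative assumptions; check the actual file names in the city96/SD-Latent-Interposer repository on Hugging Face.

```python
import os
from typing import Optional

def resolve_local_model(src: str, dst: str, version: str = "v4.0",
                        local_dir: str = "custom_nodes/SD-Latent-Interposer/models") -> Optional[str]:
    """Return the local path for an interposer model if present, else None.

    The filename pattern is a guess for illustration; the real repository
    defines its own naming scheme.
    """
    filename = f"{src}-to-{dst}_interposer-{version}.safetensors"
    path = os.path.join(local_dir, filename)
    return path if os.path.isfile(path) else None
```

If this returns None, the node would download the file from the hub instead (e.g. via huggingface_hub's hf_hub_download).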
Supported Models and Compatibility
SD-Latent-Interposer is compatible with various models:
- Model names covered include SD v1.x, SDXL, Stable Diffusion 3, Flux.1, and Stable Cascade.
- A detailed mapping between these models shows which latent transitions can be performed among them. For instance, transitions such as xl to v1 are supported using version 4.0 of the interposer.
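One practical detail behind these mappings is that the model families use different latent channel counts: SD v1.x and SDXL use 4-channel VAE latents, while SD3, Flux.1, and Stable Cascade use 16-channel latents. The sketch below validates a latent's shape against its claimed source model; the short model keys follow the document's notation (xl, v1) and are otherwise an assumption.

```python
import numpy as np

# Latent channel counts per model family: v1/xl use 4-channel VAE latents;
# SD3, Flux.1, and Stable Cascade use 16-channel latents.
LATENT_CHANNELS = {"v1": 4, "xl": 4, "sd3": 16, "flux": 16, "cascade": 16}

def check_latent(latent: np.ndarray, model: str) -> None:
    """Raise if a [batch, channels, height, width] latent doesn't match the model family."""
    expected = LATENT_CHANNELS[model]
    if latent.ndim != 4 or latent.shape[1] != expected:
        raise ValueError(
            f"expected [b, {expected}, h, w] latent for {model!r}, got {latent.shape}"
        )

check_latent(np.zeros((1, 4, 64, 64)), "v1")    # ok: 4 channels
check_latent(np.zeros((1, 16, 32, 32)), "sd3")  # ok: 16 channels
```

Crossing between a 4-channel and a 16-channel space is exactly the kind of mapping the interposer has to learn; it cannot be done by naive reshaping.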
Training Insights
Training the SD-Latent-Interposer involves setting up training parameters from a provided configuration file. The dataset consists of .bin files containing latents in a [batch, channels, height, width] format.
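A minimal loader for such files might look like the sketch below. It assumes each .bin holds raw little-endian float32 values and infers the batch size from the file size; the project's actual on-disk format may differ (e.g. torch-serialized tensors), so treat this purely as an illustration of the expected shape.

```python
import numpy as np

def load_latent_bin(path: str, channels: int, height: int, width: int) -> np.ndarray:
    """Load a .bin latent file as a [batch, channels, height, width] array.

    Assumes raw little-endian float32 data; the batch dimension is inferred
    from the file size. This is a sketch, not the project's actual loader.
    """
    flat = np.fromfile(path, dtype="<f4")
    batch = flat.size // (channels * height * width)
    return flat.reshape(batch, channels, height, width)
```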
Interposer v4.0
The training process for this version uses two model copies to compute loss metrics such as p_loss, b_loss, r_loss, and h_loss, focusing on improving the fidelity of the latent transformation. Models were trained extensively on hardware such as the NVIDIA RTX 3080 and Tesla V100S with varied batch sizes.
Older Versions
The project also outlines insights from older interposer versions like v3.1, v1.1, and v1.0. Each previous version reflects developments such as improvements in architecture, training datasets, and strategies to overcome existing challenges, particularly in latent space switching.
The necessity for SD-Latent-Interposer stems from the practical need to move latents across different models without degrading quality. With continuous updates and training enhancements, it stands as a valuable tool for those working extensively with Stable Diffusion models.