Project Introduction: Stable Fast
What is Stable Fast?
Stable Fast is an ultra-lightweight inference optimization framework designed specifically for HuggingFace Diffusers on NVIDIA GPUs. It aims for state-of-the-art inference performance on every type of diffuser model, including recent additions such as the StableVideoDiffusionPipeline. Unlike heavier tools such as TensorRT, which often take significant time to compile a model, Stable Fast compiles in just a few seconds.
Key capabilities of Stable Fast include support for dynamic shapes, Low-Rank Adaptation (LoRA), and ControlNet, which add flexibility and usability. These features push the performance limits of diffusion models without sacrificing ease of use or speed.
Differences from Other Acceleration Libraries
- Speed: Stable Fast outperforms tools like torch.compile, TensorRT, and AITemplate, especially during initial compilation.
- Minimalism: It works like a plug-in on top of existing PyTorch functionality, making it compatible with other acceleration techniques and deployment solutions.
- Compatibility: It integrates seamlessly with all versions of HuggingFace Diffusers and PyTorch. Uniquely among these tools, Stable Fast supports ControlNet and LoRA and is ready to optimize the latest StableVideoDiffusionPipeline.
Installation
Stable Fast is primarily tested on Linux and Windows Subsystem for Linux (WSL2). PyTorch with CUDA support is required; versions 1.12 through 2.1 are recommended for known compatibility. Users can either install prebuilt wheels from the release page or build from source after installing dependencies such as cuDNN/cuBLAS, Triton, and others.
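Since only a range of PyTorch versions is known to be compatible, a setup script might want to guard against unsupported versions. A minimal stdlib-only sketch (the helper name is hypothetical, not part of Stable Fast):

```python
def torch_version_in_range(version: str, low=(1, 12), high=(2, 1)) -> bool:
    """Check a torch version string against the tested 1.12-2.1 range (sketch)."""
    # Strip local build suffixes like "+cu121" and compare (major, minor).
    major, minor = (int(p) for p in version.split("+")[0].split(".")[:2])
    return low <= (major, minor) <= high
```

In practice you would pass in `torch.__version__` and warn or abort when the check fails.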
Usage
Stable Fast offers diverse applications:
- Optimize StableDiffusionPipeline: Direct optimization enhances efficiency and performance, including for StableDiffusionXLPipeline.
- Enhance LCM Pipeline: It handles the latest latent consistency model (LCM) pipelines, delivering notable speed improvements.
- Improve StableVideoDiffusionPipeline: Pipeline processing sees over a twofold speed increase.
- Dynamic LoRA Switching: With careful handling, LoRA parameters can be switched at runtime without losing optimization benefits.
- Model Quantization: Using extended PyTorch quantization functionality, users can reduce VRAM usage, which is critical for resource-heavy diffuser workloads.
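For the direct pipeline optimization mentioned above, usage typically looks like the sketch below. The `sfast` module path and the specific `CompilationConfig` flags are assumptions that have varied across releases, so check the project's README for the current API:

```python
def optimize_pipeline(pipe):
    """Compile a loaded Diffusers pipeline with Stable Fast (illustrative sketch)."""
    # Assumed module path and flags; these have differed between releases.
    from sfast.compilers.diffusion_pipeline_compiler import (
        compile as sfast_compile,
        CompilationConfig,
    )

    config = CompilationConfig.Default()
    config.enable_xformers = True    # requires xformers to be installed
    config.enable_triton = True      # requires triton to be installed
    config.enable_cuda_graph = True  # capture CUDA graphs for static shapes
    return sfast_compile(pipe, config)
```

The returned pipeline is then called exactly like the original one, so existing generation code does not need to change.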
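The constraint behind dynamic LoRA switching can be illustrated without any GPU code: a compiled graph keeps references to the original parameter buffers, so new weights must be written into the existing buffers rather than rebinding names. A pure-Python analogy (the names here are hypothetical, not Stable Fast's API):

```python
def switch_lora_inplace(params, new_weights):
    """Write new weights into existing buffers (hypothetical illustration)."""
    for name, new in new_weights.items():
        buf = params[name]
        buf[:] = new  # in-place slice assignment keeps the same object alive

# A compiled graph would hold references to these buffers.
params = {"unet.to_q.lora": [0.1, 0.2, 0.3]}
before = id(params["unet.to_q.lora"])

switch_lora_inplace(params, {"unet.to_q.lora": [0.5, 0.6, 0.7]})

# Same buffer object, new contents: the graph's references stay valid.
same_object = id(params["unet.to_q.lora"]) == before
```

With real tensors the equivalent move is an in-place `copy_`, which is why the switching "requires careful execution".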
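The VRAM saving from quantization comes from storing weights in fewer bytes. A toy symmetric int8 quantizer (an illustration of the idea, not Stable Fast's implementation) shows the roughly 4x reduction from 4-byte fp32 to 1-byte int8 storage:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative sketch)."""
    scale = max(abs(v) for v in values) / 127.0
    scale = scale or 1.0  # avoid division by zero for an all-zero tensor
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

# int8 storage is 1 byte per weight vs. 4 for fp32: ~4x memory reduction.
q, scale = quantize_int8([0.5, -1.0, 0.25])
restored = dequantize_int8(q, scale)
```

Real quantization backends also fuse the dequantization into the compute kernels, which is where the extended PyTorch functionality comes in.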
Performance Comparison
In benchmarks, Stable Fast handles models like SD 1.5 with strong timing results compared to other tools, balancing speed and versatility through techniques such as CUDNN convolution fusion, low-precision fused GEMM, and others.
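Numbers like these are typically gathered with a simple wall-clock harness that discards a few warmup runs, which matters here because the first run includes compilation. A minimal sketch, not the project's benchmark script:

```python
import time

def bench(fn, warmup=3, iters=10):
    """Average wall-clock time per call of fn, after warmup runs (sketch)."""
    for _ in range(warmup):
        fn()  # warmup: absorbs one-time costs such as compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

avg = bench(lambda: sum(range(1000)))
```

For GPU pipelines you would additionally synchronize the device before reading the clock, since CUDA kernels launch asynchronously.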
Conclusion
Stable Fast aims to remain a leading inference optimization framework for diffusers, with plans to extend its capabilities toward large language models (LLMs) and further efficiency improvements across model deployment and performance. Future releases are expected to be even more stable and user-friendly, reinforcing its value as a go-to solution for developers in the domain.