stable-fast
Stable-fast is an inference optimization framework for HuggingFace Diffusers pipelines, such as StableVideoDiffusionPipeline, running on NVIDIA GPUs. Unlike TensorRT, it compiles a model in seconds, and it supports dynamic shapes, LoRA, and ControlNet out of the box. Its speedups come from techniques such as cuDNN convolution fusion and low-precision fused GEMM. Stable-fast is designed to be compatible with multiple PyTorch versions and with other acceleration tools, and it requires only minimal changes to existing pipeline code.
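
As a rough illustration of the "minimal changes" claim, the sketch below wraps an existing Diffusers pipeline with the stable-fast compiler. It assumes a recent stable-fast release in which the compiler is exposed as `sfast.compilers.diffusion_pipeline_compiler` (older releases may use a different module path). The helper names `optional_backends` and `accelerate_pipeline` are hypothetical and not part of the library.

```python
import importlib.util


def optional_backends():
    """Report which optional acceleration packages are importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("xformers", "triton")
    }


def accelerate_pipeline(pipe):
    """Return `pipe` compiled by stable-fast.

    Assumes an NVIDIA GPU and an installed `sfast` package; the module
    path below is an assumption based on recent stable-fast releases.
    """
    # Imported lazily so this module still loads without stable-fast installed.
    from sfast.compilers.diffusion_pipeline_compiler import (
        CompilationConfig,
        compile,
    )

    config = CompilationConfig.Default()
    backends = optional_backends()
    # Enable optional backends only when they are actually installed.
    config.enable_xformers = backends["xformers"]
    config.enable_triton = backends["triton"]
    # CUDA Graph reduces CPU launch overhead for small batches.
    config.enable_cuda_graph = True
    return compile(pipe, config)
```

In typical use, a pipeline would first be loaded with Diffusers in half precision and moved to CUDA, then passed through `accelerate_pipeline(pipe)`. The first few calls after compilation are expected to be slower while the model is traced and warmed up.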