HCP-Diffusion - Comprehensive Toolbox for Efficient and Configurable Text-to-Image Generation

Project Overview: HCP-Diffusion

HCP-Diffusion is a toolbox for Stable Diffusion models, built on 🤗 Diffusers. It offers a flexible framework for configuring the components used in model training, and it is more adaptable than comparable tools such as webui and sd-scripts.

HCP-Diffusion also supports Colossal-AI, which can substantially reduce GPU memory consumption.

A key feature of HCP-Diffusion is its ability to consolidate various training techniques for text-to-image generation and model structures into a single .yaml configuration file. The toolbox has introduced an enhanced version of DreamArtist with LoRA, called DreamArtist++, which is optimized for generating high-quality images from text inputs with improved efficiency and control.

Notable Features

Layer-wise LoRA Handling: Supports LoRA operations for Conv2d layers.
Finer Control Over Layers: Allows precise fine-tuning and model ensemble at the layer level.
Prompt-tuning Flexibility: Facilitates prompt refinement using multiple words.
Enhanced DreamArtist Tools: Includes both DreamArtist and its advanced version, DreamArtist++.
Innovative ARB Technology: Aspect Ratio Bucket with automated clustering.
Multiple Dataset Support: Capable of integrating multiple data sources.
Attention Mechanisms: Implements image attention masks and word attention multipliers.
Optimized Word Usage: Accommodates custom words that occupy multiple word positions.
Extended Input Policies: Provides mechanisms for expanding maximum sentence lengths.
Integration with Modern Technologies: Uses 🤗 Accelerate, Colossal-AI, and xFormers.
Advanced Training Tactics: Includes features like tag shuffle, dropout, and Safetensors support.
Controlnet Adaptation: Supports training with Controlnet.
Custom Optimization Strategies: Offers a variety of custom optimizers and learning rate schedulers.
SDXL Support: Supports SDXL models.
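
As an illustration of the tag shuffle and dropout feature listed above, here is a minimal Python sketch of the general technique. It is not HCP-Diffusion's implementation: the function name, the drop_prob and keep_first parameters, and the comma-separated caption format are all assumptions made for the example.

```python
import random


def shuffle_and_drop_tags(caption, drop_prob=0.1, keep_first=1, rng=None):
    """Shuffle comma-separated tags and randomly drop some of them.

    The first `keep_first` tags stay in place and are never dropped
    (a hypothetical option, handy when the first tag is a trigger word).
    """
    if rng is None:
        rng = random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = tags[:keep_first], tags[keep_first:]
    # Dropout: keep each remaining tag with probability 1 - drop_prob.
    tail = [t for t in tail if rng.random() >= drop_prob]
    # Shuffle: randomize the order of the surviving tags.
    rng.shuffle(tail)
    return ", ".join(head + tail)


if __name__ == "__main__":
    caption = "portrait, blue sky, mountain, river, sunset"
    print(shuffle_and_drop_tags(caption, drop_prob=0.2, rng=random.Random(0)))
```

Applying this to each caption every time it is sampled means the model rarely sees the same tag order twice, which is the usual motivation for the technique.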

Installation Guide

Via pip

You can install HCP-Diffusion directly using pip:

pip install hcpdiff
# Initialize a new project
hcpinit

From Source

To install from source:

git clone https://github.com/7eu7d7/HCP-Diffusion.git
cd HCP-Diffusion
pip install -e .
# Work directly in this checkout, or uncomment the next line to initialize a new project
# hcpinit

To reduce VRAM usage and speed up training, install xFormers:

# with conda
conda install xformers -c xformers

# or pip
pip install "xformers>=0.0.17"
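
To confirm that the installed xFormers meets the 0.0.17 minimum, a small check like the following can help. It is a sketch using only the Python standard library; the helper names are hypothetical, and the version comparison ignores pre-release suffixes.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.0.17' or '0.0.22.post7+cu118' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post7'
        parts.append(int(digits))
    return tuple(parts)


def xformers_ok(minimum="0.0.17"):
    """Return True if xformers is installed and its version is at least `minimum`."""
    try:
        installed = metadata.version("xformers")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print("xformers available:", xformers_ok())
```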

User Instructions

Training

Training can be launched with 🤗 Accelerate or with Colossal-AI. Configure the 🤗 Accelerate environment (for example, by running accelerate config) before training.

With 🤗 Accelerate:

accelerate launch -m hcpdiff.train_ac --cfg cfgs/train/cfg_file.yaml

With 🤗 Accelerate on a single GPU:

accelerate launch -m hcpdiff.train_ac_single --cfg cfgs/train/cfg_file.yaml

Using Colossal-AI with torchrun:

torchrun --nproc_per_node 1 -m hcpdiff.train_colo --cfg cfgs/train/cfg_file.yaml
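
When training is started from a script rather than typed by hand, the three launch commands above can be assembled programmatically. The sketch below only builds the argument list; the backend keys and the function name are hypothetical, while the module names and flags are copied from the commands above.

```python
import shlex

# Launcher prefixes copied from the commands above; the dictionary keys are
# hypothetical labels chosen for this example.
LAUNCHERS = {
    "accelerate": ["accelerate", "launch", "-m", "hcpdiff.train_ac"],
    "accelerate_single": ["accelerate", "launch", "-m", "hcpdiff.train_ac_single"],
    "colossal": ["torchrun", "--nproc_per_node", "1", "-m", "hcpdiff.train_colo"],
}


def build_train_command(backend, cfg_path):
    """Return the argument list for the chosen training backend."""
    if backend not in LAUNCHERS:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(LAUNCHERS)}")
    return LAUNCHERS[backend] + ["--cfg", cfg_path]


if __name__ == "__main__":
    cmd = build_train_command("accelerate_single", "cfgs/train/cfg_file.yaml")
    print(shlex.join(cmd))
    # To actually start training, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list avoids shell quoting problems when the config path contains spaces.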

Inference

To generate images from text prompts, run the visualizer with an inference config and override individual settings on the command line:

python -m hcpdiff.visualizer --cfg cfgs/infer/cfg.yaml pretrained_model=pretrained_model_path \
        prompt='positive_prompt' \
        neg_prompt='negative_prompt' \
        seed=42
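
The key=value arguments in the command above override entries of the .yaml config. The following sketch shows one way such overrides could be merged into a configuration dictionary. It is illustrative only and does not reproduce HCP-Diffusion's own parser: the function names are hypothetical, and support for dotted keys such as a.b=1 is an assumption of this example.

```python
import copy


def parse_value(text):
    """Best-effort conversion of an override value to bool, int, or float."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # fall back to the raw string


def apply_overrides(config, overrides):
    """Merge 'key=value' strings into a copy of `config` and return the copy."""
    merged = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = merged
        for name in parents:
            # Create (or replace a non-dict value with) a nested dict as needed.
            if not isinstance(node.get(name), dict):
                node[name] = {}
            node = node[name]
        node[leaf] = parse_value(raw)
    return merged


if __name__ == "__main__":
    base = {"pretrained_model": None, "prompt": "", "neg_prompt": "", "seed": 0}
    args = ["pretrained_model=pretrained_model_path", "prompt=positive_prompt", "seed=42"]
    print(apply_overrides(base, args))
```

Note that the shell removes the single quotes around the prompt values before Python sees them, so the parser receives prompt=positive_prompt.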

Converting Stable Diffusion Models

Original Stable Diffusion checkpoints must be converted to the format this toolbox supports before they can be used.

The conversion relies on the scripts provided by 🤗 Diffusers.

Example conversion command (flags in square brackets are optional; drop the brackets when using them):

python -m hcpdiff.tools.sd2diffusers \
    --checkpoint_path "path_to_stable_diffusion_model" \
    --original_config_file "path_to_config_file" \
    --dump_path "save_directory" \
    [--extract_ema] \
    [--from_safetensors] \
    [--to_safetensors]
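
After conversion, a quick sanity check of the output directory can catch an incomplete run. The sketch below assumes the converter writes the standard 🤗 Diffusers pipeline layout, in which model_index.json lists each component and each component has its own subfolder. The function name is hypothetical.

```python
import json
from pathlib import Path


def missing_components(dump_path):
    """Return names listed in model_index.json that have no matching subfolder."""
    root = Path(dump_path)
    index_file = root / "model_index.json"
    if not index_file.is_file():
        return ["model_index.json"]
    index = json.loads(index_file.read_text(encoding="utf-8"))
    missing = []
    for name, spec in index.items():
        # Keys starting with "_" are metadata; components are [library, class]
        # pairs, and [null, null] marks a component that is not used.
        if name.startswith("_") or not isinstance(spec, list) or not spec or spec[0] is None:
            continue
        if not (root / name).is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_components("save_directory")
    print("missing:", problems if problems else "nothing")
```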

Tutorials and Guidance

Detailed guides are provided to facilitate learning.

Contribution Invitation

Contributions that add features or support for more models are welcome.

Maintainers

The project is maintained by HCP-Lab at SYSU.

Researchers and developers may refer to the citation provided for academic or professional use.