# ComfyUI-AnimateAnyone-Evolved
ComfyUI-AnimateAnyone-Evolved is an improved implementation of AnimateAnyone that transforms a pose image sequence plus a reference image into a stylized video. The project currently targets generating the desired pose-to-video output at 1+ frames per second (FPS) on high-performance GPUs such as the RTX 3080 and beyond.
## Features and Capabilities
### Supported Samplers and Schedulers
The project supports a range of samplers and schedulers to create high-quality animations (a rough mapping sketch follows the list):
- DDIM (Denoising Diffusion Implicit Models):
  - Produces 24-frame pose image sequences with `steps=20` and `context_frames=24` or `12`; generation takes approximately 425 to 836 seconds on an RTX 3080 GPU.
- DPM++ 2M Karras:
  - Handles 24-frame sequences in around 407 seconds on the same GPU setup.
- LCM (Latent Consistency Model):
  - Handles longer pose image sequences efficiently, completing runs with `context_frames=24` within 607 seconds.
- Euler and Euler Ancestral:
  - Both Euler variants execute pose image sequences in about 451 seconds.
- Lora Support:
  - Users can add a Lora to extend the model's capabilities while still handling substantial pose image sequences.
  - The system can manage sequences of more than 120 frames; `context_frames` determines GPU memory usage rather than limiting the sequence's length (sketched below).
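The sampler names above correspond to standard diffusion schedulers. As a rough, hypothetical mapping (an assumption about the underlying stack, not this repository's actual node code), they line up with `diffusers` scheduler classes like so:

```python
# Hypothetical mapping of the sampler names above onto diffusers
# scheduler classes; the project's real node code may differ.
from diffusers import (
    DDIMScheduler,
    DPMSolverMultistepScheduler,
    EulerAncestralDiscreteScheduler,
    EulerDiscreteScheduler,
    LCMScheduler,
)

def build_scheduler(name: str):
    """Return a scheduler instance for one of the sampler names above."""
    if name == "DDIM":
        return DDIMScheduler(beta_schedule="scaled_linear")
    if name == "DPM++ 2M Karras":
        # "Karras" refers to the Karras et al. noise (sigma) schedule.
        return DPMSolverMultistepScheduler(
            algorithm_type="dpmsolver++", use_karras_sigmas=True
        )
    if name == "LCM":
        return LCMScheduler()
    if name == "Euler":
        return EulerDiscreteScheduler()
    if name == "Euler Ancestral":
        return EulerAncestralDiscreteScheduler()
    raise ValueError(f"Unknown sampler: {name}")
```

For DDIM, such a scheduler would then be paired with `steps=20`, as in the timings listed above.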
The current workflow aligns closely with the original AnimateAnyone pipeline, translating a reference image and a pose sequence into video while maintaining efficiency and high-fidelity output. The project continues to diversify its features by incorporating advancements like the Modular Execution Engine.
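Regarding the 120+ frame capability noted above: long sequences can be denoised in overlapping, fixed-size windows, so VRAM cost follows `context_frames` rather than total length. Here is a minimal sketch of that sliding-window idea (illustrative only; the repository's actual windowing logic may differ):

```python
# Illustrative only: split a long pose sequence into overlapping
# fixed-size windows so VRAM scales with context_frames, not length.
def context_windows(total_frames: int, context_frames: int = 24, overlap: int = 4):
    """Yield overlapping [start, end) frame ranges covering the sequence."""
    stride = context_frames - overlap
    start = 0
    while True:
        end = min(start + context_frames, total_frames)
        yield (start, end)
        if end >= total_frames:
            break
        start += stride

# A 120-frame sequence with context_frames=24 becomes six overlapping
# 24-frame windows; each window fits the same GPU memory budget.
print(list(context_windows(120)))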
## Roadmap for Future Development
- StreamDiffusion Components:
  - Implementing Residual CFG (classifier-free guidance) as proposed in StreamDiffusion, which could potentially double the speed. So far the results with DDIM did not match expectations, but further examination with LCM looks promising (see the sketch after this list).
- Integration with Open-AnimateAnyone and AnimateAnyone:
  - Anticipated integration of these initiatives and their pre-trained models once released.
- Model Optimization with stable-fast:
  - Converting the models with stable-fast to boost performance, potentially improving speed by 2x.
- LCM Lora Training:
  - Training a dedicated Lora to speed up the denoising UNet, potentially by 5x.
- Quality Improvements:
  - Possibly training a new model on a superior dataset to enhance output quality, depending on evolving needs.
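To make the Residual CFG item concrete: standard classifier-free guidance runs the denoising UNet twice per step (one conditional and one unconditional pass), while StreamDiffusion's RCFG reuses an approximated unconditional residual instead, which is where the potential 2x speedup comes from. A heavily simplified sketch, assuming a diffusers-style UNet call signature (not this project's code):

```python
# Simplified sketch assuming a diffusers-style UNet call signature.
# Standard CFG: two UNet forward passes per denoising step.
def cfg_noise(unet, latents, t, cond_emb, uncond_emb, scale=7.5):
    eps_cond = unet(latents, t, encoder_hidden_states=cond_emb).sample
    eps_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Residual CFG (StreamDiffusion): reuse a cached/approximated
# unconditional residual, halving the UNet calls per step.
def rcfg_noise(unet, latents, t, cond_emb, eps_uncond_approx, scale=7.5):
    eps_cond = unet(latents, t, encoder_hidden_states=cond_emb).sample
    # Identical to standard CFG when eps_uncond_approx equals the true
    # eps_uncond: eps_u + s*(eps_c - eps_u) == eps_c + (s-1)*(eps_c - eps_u)
    return eps_cond + (scale - 1.0) * (eps_cond - eps_uncond_approx)
```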
## Installation Guide
To install and use this project, clone the repository into the ComfyUI directory and install the required Python packages. Use the specific pre-trained models provided in the repository, and download additional components, such as the clip image encoder and VAE (Variational Autoencoder), from external sources for full functionality.
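A typical install sequence looks like the following; the directory layout, repository URL, and requirements file name are assumptions based on standard ComfyUI custom-node conventions, so check the repository itself for the authoritative steps and model download locations:

```bash
# Run from your ComfyUI root directory (layout assumed, not verified).
cd custom_nodes
git clone https://github.com/MrForExample/ComfyUI-AnimateAnyone-Evolved.git
cd ComfyUI-AnimateAnyone-Evolved
pip install -r requirements.txt
# Pre-trained models, the clip image encoder, and the VAE are downloaded
# separately and placed where the repository's instructions specify.
```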
With its continuous improvements and adaptability, ComfyUI-AnimateAnyone-Evolved is paving the way for superior and more efficient video animation technology. The project remains under active development, moving toward even faster and higher-quality output.