DisCo: Disentangled Control for Realistic Human Dance Generation
Overview
DisCo is a research project focused on generating realistic human dance movements with modern generative AI. Developed by researchers from Nanyang Technological University, Microsoft Azure AI, and the University at Buffalo, DisCo is a flexible and efficient toolkit for generating human dances in both image and video form.
Key Features
- Versatility and Generalizability: DisCo requires no human-specific fine-tuning, so it can generate dance movements for a wide range of human subjects, unlike traditional methods restricted to a narrow domain such as fashion modeling.
- State-of-the-Art Results: The project achieves state-of-the-art performance on referring human dance generation, faithfully reproducing complex dance routines.
- Extensive Applications: The toolkit supports pre-training, fine-tuning, and optional human-specific fine-tuning, making it valuable for both research and real-world applications.
- User-Friendly Framework: DisCo offers an easy-to-follow structure with efficient training techniques, making it accessible to users and researchers alike.
Getting Started
To start using DisCo, users install the required software dependencies, prepare the datasets for pre-training and fine-tuning, and run the pre-trained models. The project provides detailed setup guidelines, so even those with limited technical expertise can deploy and experiment with the toolkit.
Installation Instructions
- Basic Software Setup: Python 3.8 is required, along with specific packages such as PyTorch and torchvision.
- Acceleration Tools: Optional tools such as DeepSpeed and xFormers can significantly speed up training and inference on large datasets.
Data Preparation
- Human Attribute Pre-training: DisCo pre-trains on a large dataset of approximately 700,000 images from sources such as COCO and DeepFashion2, filtered to cover diverse human attributes.
- Fine-tuning Dataset: Fine-tuning uses a focused TikTok dance dataset, which refines the outputs toward more temporally consistent and visually appealing dance sequences.
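To make the pre-training filter concrete: in COCO-format annotations, images with people can be identified through the "person" category. The sketch below is illustrative only, not DisCo's actual data pipeline, and it runs on a tiny in-memory example rather than the real annotation files.

```python
def filter_person_images(coco_annotations: dict) -> set:
    """Return ids of images containing at least one 'person' annotation.

    Assumes COCO-format data. Illustrative sketch, not DisCo's real code.
    """
    person_ids = {c["id"] for c in coco_annotations["categories"]
                  if c["name"] == "person"}
    return {a["image_id"] for a in coco_annotations["annotations"]
            if a["category_id"] in person_ids}

# Tiny in-memory example in COCO format
coco = {
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 11, "category_id": 2},
        {"image_id": 12, "category_id": 1},
    ],
}
print(sorted(filter_person_images(coco)))  # → [10, 12]
```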
Training and Fine-tuning
Users train models on the pre-processed data, then fine-tune with disentangled controls to achieve nuanced dance generation. Temporal modules are also incorporated to smooth transitions between frames.
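The core idea of disentangled control can be sketched as three condition signals, reference-human appearance, background, and pose, each encoded independently and only combined afterwards. Everything below (shapes, encoder form, keypoint count) is a placeholder to illustrate the concept, not DisCo's actual architecture, which uses learned networks rather than random projections.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # hypothetical shared embedding size

# Stand-in "encoders": fixed random linear projections, one per control
# signal. In a real model these would be learned networks.
W_fg = rng.standard_normal((D, 3 * 32 * 32))    # reference human foreground
W_bg = rng.standard_normal((D, 3 * 32 * 32))    # background
W_pose = rng.standard_normal((D, 2 * 18))       # 18 keypoints, (x, y) each

def encode_controls(foreground, background, pose):
    """Encode each condition independently, then concatenate.

    Because the three signals never mix before this point, each can be
    swapped at inference time: new subject, new scene, or new dance.
    """
    z_fg = W_fg @ foreground.ravel()
    z_bg = W_bg @ background.ravel()
    z_pose = W_pose @ pose.ravel()
    return np.concatenate([z_fg, z_bg, z_pose])  # shape (3 * D,)

fg = rng.standard_normal((3, 32, 32))
bg = rng.standard_normal((3, 32, 32))
pose = rng.standard_normal((2, 18))
cond = encode_controls(fg, bg, pose)
print(cond.shape)  # (192,)
```

Keeping the controls separate until concatenation is what allows, for example, generating the same dance for a new subject by replacing only the foreground input.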
Usage Cases
- For General Users: An online demo makes it straightforward to test and visualize generated dance movements.
- For Researchers: DisCo serves as a solid codebase for re-implementation and extension, opening up many avenues for further improvement.
Conclusion
The DisCo project sits at the forefront of AI-driven dance generation, offering a high degree of flexibility and realism in creating human dances. With its robust framework and comprehensive documentation, DisCo is a valuable resource for advancing research and exploring new directions in dance generation.