Introduction to the HumanVid Project
HumanVid is a pioneering project in human image animation, focusing on videos where both the camera and the subject move, closely mimicking real movie scenes. It was developed by researchers from the Chinese University of Hong Kong (CUHK) and the Shanghai AI Lab, and was accepted to the NeurIPS 2024 Datasets and Benchmarks (D&B) Track.
What is HumanVid?
HumanVid introduces a dataset that enables video diffusion models to generate videos capturing both camera and subject movement. In practice, this allows animations in which the camera moves through the scene, producing dynamic, lifelike videos rather than static, fixed-viewpoint ones. The project provides a foundation for training models that could change how animated scenes are created and interacted with, opening new possibilities in video production and virtual storytelling.
Key Offerings
- Dataset Collection: HumanVid includes a comprehensive dataset collected from both internet sources and synthetic video production using Unreal Engine. The dataset contains versatile camera movements and human poses, crucial for training models in dynamic human animation.
- Camera Parameters & Trajectory: Camera trajectories are recorded following established methods, covering the camera's translation and rotation in three-dimensional space as well as intrinsics such as focal length, ensuring high precision in recreating realistic camera movements.
- Human Pose Extraction: The project provides scripts and methodologies for extracting and visualizing human poses from videos, supplying rich data for model training. This involves converting 3D human body models into 2D pose representations usable for animation; see the projection sketch after this list.
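To make the camera parameters concrete: a per-frame camera is typically described by a rotation R, a translation t, and an intrinsic matrix K, and projecting 3D body joints through these parameters yields the 2D pose locations used for animation conditioning. The sketch below is a minimal illustration in plain NumPy; the function name and example values are assumptions for exposition, not HumanVid's actual API or data.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project Nx3 world-space points into pixel coordinates.

    R (3x3) and t (3,) are the world-to-camera rotation and translation;
    K (3x3) holds the intrinsics: focal lengths fx, fy and the
    principal point cx, cy.
    """
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> pixels

# Hypothetical example: one 3D joint two meters in front of the camera.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
joint = np.array([[0.1, -0.2, 0.0]])
print(project_points(joint, R, t, K))  # -> pixel location of the joint
```

Repeating this projection for every joint and every frame is what turns a 3D pose track plus a camera trajectory into the 2D skeleton sequences a pose-conditioned animation model consumes.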
Data Sources and Structure
The data used in HumanVid comes from two primary sources:
- Pexels Videos: Videos sourced from the internet; camera parameters are made available for each video, though the videos themselves are accessed via URLs.
- Unreal Engine Rendered Videos: Synthetic videos categorized by the backgrounds they use, either 3D scenes or HDRI images. The data is organized to make training and access straightforward; a loading sketch follows this list.
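As a rough illustration of how such an organization might be consumed in practice, the snippet below pairs each rendered clip with a per-frame camera file. The directory layout, file names, and CSV format here are hypothetical placeholders; consult the released dataset for the actual structure.

```python
from pathlib import Path
import csv

# Hypothetical layout -- check the released dataset for the real one.
ROOT = Path("humanvid_data")

def load_camera_track(camera_file: Path):
    """Read one per-frame camera file into a list of parameter dicts.

    Assumes a simple CSV with fx, fy, cx, cy and a 6-DoF pose per row;
    the actual HumanVid format may differ.
    """
    with camera_file.open() as f:
        return [
            {k: float(v) for k, v in row.items()}
            for row in csv.DictReader(f)
        ]

# Pair each synthetic clip with its camera track before training.
for video in sorted((ROOT / "unreal" / "hdri").glob("*.mp4")):
    cameras = load_camera_track(video.with_suffix(".csv"))
    print(video.name, f"{len(cameras)} frames of camera data")
```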
Innovations and Implications
HumanVid stands out by showing that models trained on videos with camera movement can still generate convincing static backgrounds, provided the camera annotations are accurate. This insight reduces the effort traditionally spent curating static-camera video collections. The results also suggest applications in fields such as virtual reality, game development, and more immersive filmmaking.
Accessibility and Future Directions
While the synthetic data has already been released, the HumanVid team is preparing to make the inference code, training code, and model checkpoints available, aiming to enhance accessibility and encourage innovation in human animation.
The project continues to evolve, pushing the boundaries of controllable human animation with moving cameras. For those interested in this work, the HumanVid team welcomes feedback and engagement from the broader research community.