Open-Sora-Plan - Video Generation Enhancement through Open Source Collaboration on Huawei Ascend

Introduction to Open-Sora Plan

The Open-Sora Plan is an open-source effort to reproduce Sora, the text-to-video model from OpenAI, which the project jokingly calls "ClosedAI". It was initiated by the Peking University and Tuzhan AIGC Joint Lab. The project relies on the open-source community to develop and improve its video generation tooling, and pull requests are actively encouraged.

Project Objectives

The primary goal of the Open-Sora Plan is to build a simple, scalable repository that can reproduce Sora. The project has made significant progress but acknowledges that a gap remains between its current results and that goal. It aims to close that gap through continuous, community-driven updates and iterations.

Supported Technology

The Open-Sora Plan supports complete training and inference on the Huawei Ascend AI computing system, an AI computing platform developed in China. According to the project, models trained on Huawei Ascend can generate video of a quality comparable to industry standards.

Community Engagement

This project is rooted in community engagement, inviting participants to contribute to its mission. Open-Sora Plan has a presence across multiple platforms, including Discord and WeChat, to facilitate communication among contributors and interested parties. A dedicated GitHub repository hosts the project's code, allowing anyone to fork, contribute, or simply watch the project's progress.

Recent Updates and Future Plans

Upcoming Features: The Open-Sora Plan will soon add support for large-model parallelization using Huawei's MindSpeed-MM suite. This will include distributed training strategies such as TP (Tensor Parallelism) and SP (Sequence Parallelism) to increase the model sizes and sequence lengths that can be trained.
Version 1.3.0: Released on October 16, 2024, this version introduced several new features, including WFVAE, a prompt refiner, advanced data filtering strategies, sparse attention, and a bucket training strategy. It supports generating 93-frame 480p video (93x480p) within 24 GB of VRAM.
Version 1.2.0: The image-to-video (I2V) model for this version launched on August 13, 2024, adding image-to-video generation to the Open-Sora Plan.
Version 1.1.0: Launched on May 27, 2024, this version enhanced video quality and length and was made entirely open source, showcasing the project's commitment to transparency and community involvement.
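
To make the parallelization strategies under Upcoming Features more concrete, here is a minimal sketch in plain Python. It is not MindSpeed-MM code and does not use its API; the function names are hypothetical, the "devices" are simulated with lists, and SP is read here as sequence parallelism. It only illustrates the idea that tensor parallelism shards a weight matrix across devices, while sequence parallelism shards the token sequence.

```python
def matmul(x, w):
    """Multiply x (n x k) by w (k x m); both are nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equal shards."""
    cols = len(w[0])
    assert cols % parts == 0, "columns must divide evenly (sketch assumption)"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x, w, parts):
    """Tensor parallelism (column split), simulated on one machine.

    Each simulated device multiplies the full input by its own weight
    shard; concatenating the partial outputs row by row gives x @ w.
    """
    shard_outputs = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((out[r] for out in shard_outputs), [])
            for r in range(len(x))]


def split_sequence(tokens, parts):
    """Sequence parallelism: each device holds one contiguous slice."""
    assert len(tokens) % parts == 0, "length must divide evenly (sketch assumption)"
    step = len(tokens) // parts
    return [tokens[i:i + step] for i in range(0, len(tokens), step)]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    # The sharded result matches the unsharded product.
    print(tensor_parallel_matmul(x, w, parts=2) == matmul(x, w))
    print(split_sequence(list(range(6)), parts=3))
```

In a real distributed setup each shard would live on a separate accelerator and the concatenation would be a collective communication step; the sketch leaves that out.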

Visual and Technical Highlights

The Open-Sora Plan's video generation builds on its CausalVideoVAE, which compresses video by up to 256 times while aiming to preserve quality. Because CausalVideoVAE uses causal convolution, in which each output frame depends only on the current and earlier frames, the same model can run inference on both images and videos.
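
The following is a minimal sketch of why causal convolution lets one model handle both images and videos, and of how a 256-times compression figure can arise. It is plain Python, not the project's implementation. Three things are assumptions made for illustration: padding the front of the clip by repeating the first frame, keeping the first frame as its own latent frame, and the split of 256 into a temporal factor of 4 and a spatial factor of 8 per side (4 x 8 x 8 = 256). All function names are hypothetical.

```python
def causal_conv1d(frames, kernel):
    """Causal temporal convolution over per-frame scalar values.

    Output t depends only on frames[0..t]. The front is padded by
    repeating the first frame (an assumption for this sketch), so a
    single image behaves like a one-frame video.
    """
    k = len(kernel)
    padded = [frames[0]] * (k - 1) + list(frames)
    return [sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(frames))]


def compression_ratio(t_factor=4, s_factor=8):
    """Overall ratio for assumed temporal and per-side spatial factors."""
    return t_factor * s_factor * s_factor


def latent_shape(num_frames, height, width, t_factor=4, s_factor=8):
    """Latent (frames, height, width) under the assumed factors.

    Assumes the first frame is kept as its own latent frame and the
    remaining frames are compressed in groups of t_factor.
    """
    latent_t = 1 + (num_frames - 1) // t_factor
    return latent_t, height // s_factor, width // s_factor


if __name__ == "__main__":
    kernel = [0.25, 0.25, 0.5]
    # A single image passes through unchanged in length: one frame in, one out.
    print(causal_conv1d([1.0], kernel))
    # Changing a later frame leaves earlier outputs untouched (causality).
    a = causal_conv1d([1.0, 2.0, 3.0], kernel)
    b = causal_conv1d([1.0, 2.0, 9.0], kernel)
    print(a[:2] == b[:2])
    print(compression_ratio())
    # 640 x 480 is a hypothetical frame size chosen for the example.
    print(latent_shape(93, 480, 640))
```

Under these assumptions a 93-frame clip maps to 1 + 92 / 4 = 24 latent frames, which is one way to see why frame counts of the form 4k + 1 are convenient for such a scheme.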

Conclusion

The Open-Sora Plan is both a technical project and a community-driven venture that aims to advance open-source video generation. Each release has added new capabilities, and the project continues to evolve with input from the global open-source community. Those interested in following or contributing to the project are encouraged to visit its GitHub repository for the latest news and updates.