# Video Generation

## Open-Sora-Plan
The Open-Sora Plan, spearheaded by Peking University and Rabbitpre's AIGC Lab, seeks to reproduce OpenAI's "Sora" with a focus on simplicity and scalability. The project supports Huawei Ascend hardware for both training and inference, achieving video quality comparable to commercial systems. It encourages contributions from the open-source community to close remaining gaps and improve the system continuously. Upcoming updates will add multi-modal support via MindSpeed-MM and enhanced distributed training capabilities.
## FakeSoraAPI
FakeSoraAPI is an efficient text-to-video API that can be deployed through Vercel or run locally. It integrates with the SoraWebui platform to turn text prompts into video quickly, and it offers developers a straightforward setup with simple installation and run commands, unlocking text-to-video experimentation without unnecessary complexity.
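As a rough sketch of how a client might call such an API (the route, port, and payload below are assumptions for illustration, not FakeSoraAPI's documented interface):

```python
import requests

# Hypothetical route, port, and payload: FakeSoraAPI mimics an
# OpenAI-style video endpoint, but check the repository for the
# exact interface before relying on this sketch.
API_BASE = "http://localhost:3000"  # assumed local deployment

resp = requests.post(
    f"{API_BASE}/v1/video/generations",  # assumed route
    json={"prompt": "a corgi surfing a small wave at sunset"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # expected to contain a URL to the generated video
```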
## motionagent
MotionAgent is a deep-learning application that turns scripts into videos, built on models from the open-source ModelScope community. It offers script creation with large language models such as Qwen-7B-Chat, high-resolution video production from images, and personalized background music. It targets Python 3.8, torch 2.0.1, and CUDA 11.7 on Ubuntu 20.04, and requires about 36 GB of GPU memory.
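Given the strict environment requirements, a small pre-flight check can save a failed run; this sketch only verifies the versions and memory figures quoted above:

```python
import torch

# Pre-flight check against the stated requirements
# (torch 2.0.1, CUDA 11.7, ~36 GB of GPU memory); adjust the
# thresholds if the upstream README changes.
assert torch.__version__.startswith("2.0.1"), torch.__version__
assert torch.version.cuda == "11.7", torch.version.cuda

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB")
assert total_gb >= 36, "MotionAgent needs roughly 36 GB of GPU memory"
```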
## MotionCtrl
MotionCtrl presents an innovative solution in video generation with its unified and adaptable motion controller, independently managing both camera and object movements. Featured at SIGGRAPH 2024, it integrates with technologies like AnimateDiff, SVD, and VideoCrafter. The model is equipped for training and inference, enabling customization of object paths with HandyTrajDrawer and utilizing datasets such as RealEstate10K and WebVid. Accessible Gradio demos and comprehensive code resources position MotionCtrl as a versatile tool in video motion control. Explore its capabilities on Hugging Face and GitHub.
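To illustrate the kind of camera condition involved, the sketch below builds a RealEstate10K-style pose sequence (one 3x4 [R|t] matrix per frame, flattened); the exact tensor layout MotionCtrl expects may differ, so treat this as illustrative only:

```python
import numpy as np

# A simple 16-frame dolly-in trajectory: identity rotation, camera
# translating along +z. RealEstate10K-style annotations store each
# camera pose as a 3x4 [R|t] matrix.
num_frames = 16
poses = []
for i in range(num_frames):
    rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.1 * i]])])
    poses.append(rt.flatten())  # 12 values per frame

camera_condition = np.stack(poses)
print(camera_condition.shape)  # (16, 12)
```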
## StoryDiffusion
StoryDiffusion uses consistent self-attention to generate character-consistent images and a motion predictor for long-range video production. It integrates with diverse diffusion models and uses user-supplied text prompts to guide layout arrangements. Its two-stage methodology first generates a set of consistent images and then interpolates between them for smooth video transitions, enabling extended videos. The project supplies resources for developing high-quality AI-generated visual content while endorsing responsible use in adherence to legal standards.
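A toy version of the idea behind consistent self-attention is sketched below: each frame's queries attend to keys and values pooled from every frame in the batch, which encourages shared content (a recurring character) to stay aligned. The paper samples a subset of reference tokens; this sketch shares all of them for brevity and is not the project's implementation:

```python
import torch

def consistent_self_attention(q, k, v):
    # Shapes: (frames, tokens, dim). Every frame attends to the
    # concatenated keys/values of all frames, so shared content
    # stays aligned across the generated images.
    b, n, d = k.shape
    k_all = k.reshape(1, b * n, d).expand(b, -1, -1)
    v_all = v.reshape(1, b * n, d).expand(b, -1, -1)
    attn = torch.softmax(q @ k_all.transpose(1, 2) / d**0.5, dim=-1)
    return attn @ v_all

q, k, v = (torch.randn(4, 77, 64) for _ in range(3))  # 4 story frames
print(consistent_self_attention(q, k, v).shape)  # torch.Size([4, 77, 64])
```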
## CogVideo
The CogVideoX series presents advanced, open-source models for video generation, enabling tasks such as text-to-video, video continuation, and image-to-video. The latest models, CogVideoX-5B and CogVideoX-5B-I2V, enhance video quality and visual effects, providing a flexible framework for GPU fine-tuning. Recent enhancements feature open-source access to key models, boosting inference efficiency and incorporating new prompt optimization tools. Supported by detailed technical documents and community interaction, the series offers innovative video generation capabilities, assisting both developers and researchers.
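CogVideoX is integrated into Hugging Face diffusers; a minimal text-to-video run looks roughly like this (verify model ids and arguments against the current diffusers documentation):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

video = pipe(
    prompt="a panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```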
## StreamingT2V
StreamingSVD utilizes an advanced autoregressive technique to enhance text-to-video and image-to-video generation, producing long, high-quality videos with temporal consistency. The transformation of SVD into a long video generator is achieved while aligning closely with input text or images and maintaining high frame-level quality. Capable of generating videos up to 2 minutes with rich motion dynamics, StreamingSVD is part of the StreamingT2V family and showcases adaptability through improvements in base models. This project, suitable for research, demands substantial VRAM and integrates industry-standard tools. Discover the technical documentation and explore advancements in long-video generation.
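The autoregressive recipe can be summarized in a few lines: each new chunk is generated conditioned on an overlap of trailing frames from the previous chunk, which is what preserves temporal consistency over minutes of video. The function below is a stand-in, not StreamingSVD's real interface:

```python
import torch

def generate_chunk(cond_frames, num_new):
    # Stand-in for the video diffusion model: returns `num_new`
    # frames "conditioned" on the trailing frames of the previous
    # chunk (random tensors here, purely illustrative).
    return torch.randn(num_new, 3, 64, 64)

overlap, chunk_len, num_chunks = 8, 24, 5
video = generate_chunk(None, chunk_len)           # first chunk
for _ in range(num_chunks - 1):
    new = generate_chunk(video[-overlap:], chunk_len - overlap)
    video = torch.cat([video, new], dim=0)        # append non-overlap frames
print(video.shape)  # torch.Size([88, 3, 64, 64])
```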
## VideoElevator
VideoElevator utilizes text-to-image diffusion models to enhance video generation quality without training. This plug-and-play method integrates diverse text-to-video and text-to-image models, focusing on temporal consistency and detailed precision. It offers a straightforward approach to downloading weights and running scripts for improved video generation, all with less than 11 GB VRAM. Discover how temporal motion refinement and spatial enhancement elevate video quality.
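Conceptually, the method alternates two roles inside one denoising loop, as in the sketch below; both helper functions are placeholders rather than the project's actual API:

```python
import torch

# At each denoising step, a T2V model refines temporal motion, then
# a T2I model elevates per-frame spatial detail. The random updates
# here only stand in for real model calls.
def temporal_motion_refine(latents, t):
    return latents - 0.01 * torch.randn_like(latents)  # stand-in T2V step

def spatial_quality_elevate(latents, t):
    return latents - 0.01 * torch.randn_like(latents)  # stand-in T2I step

latents = torch.randn(16, 4, 64, 64)  # 16 frames of SD-style latents
for t in reversed(range(50)):
    latents = temporal_motion_refine(latents, t)
    latents = spatial_quality_elevate(latents, t)
print(latents.shape)
```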
## Awesome-LLMs-meet-Multimodal-Generation
This repository offers a curated list of LLMs designed for multimodal generation and editing, encompassing visual and audio modalities. It serves as a resource for those researching image, video, 3D, and audio content creation and modification. Contributors are invited to add insights or suggest enhancements. The focus is on both LLM-based and alternative methods, with an emphasis on datasets and practical applications. The project also includes tips for paper searches and links to code repositories, fostering innovation within the multimodal AI community.
## infinite-zoom-automatic1111-webui
This extension for the AUTOMATIC1111 Stable Diffusion WebUI creates infinite-zoom videos using outpainting. Install it from its Git URL and customize the settings to generate unique zoom effects; detailed installation and usage guidance, along with tips for smooth transitions, helps ensure good results. Ideal for digital artists and creators, it also ships a Google Colab version and welcomes community contributions for ongoing development.
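The core of the zoom effect is simple to illustrate: between two keyframes, where the outer image is an outpainted, zoomed-out version of the inner one, intermediate frames come from progressively cropping into the outer image and resizing back up. A minimal PIL sketch (not the extension's code):

```python
from PIL import Image

def zoom_frames(outer: Image.Image, steps: int = 30):
    # Render in-between frames by cropping deeper into `outer` and
    # resizing back to full resolution (1x .. 2x zoom).
    w, h = outer.size
    for i in range(steps):
        zoom = 1.0 + i / steps
        cw, ch = int(w / zoom), int(h / zoom)
        left, top = (w - cw) // 2, (h - ch) // 2
        yield outer.crop((left, top, left + cw, top + ch)).resize((w, h))

frames = list(zoom_frames(Image.new("RGB", (512, 512), "navy")))
frames[0].save("zoom.gif", save_all=True, append_images=frames[1:], duration=40)
```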
## Open-Sora
Open-Sora aims to enhance video production by providing access to sophisticated video generation techniques. True to open-source values, it offers an easy-to-use platform, encouraging innovation and inclusivity in video production. The latest updates, including 3D-VAE and rectified flow in version 1.2, improve video quality and enable functionalities like text-to-video and diverse aspect ratios. Discover its potential through the Gradio demo and utilize its comprehensive processing pipeline for efficient video content creation.
## AnimateLCM
AnimateLCM uses decoupled consistency learning for efficient animation generation, distilling image-generation priors and motion priors separately. This approach improves training speed and output quality at far fewer sampling steps. The platform supports a range of video generation types, including text-to-video and image-to-video; its models, such as AnimateLCM-T2V and AnimateLCM-SVD, combine LoRA weights with motion modules and accommodate tools like ControlNet and IP-Adapter.
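The AnimateLCM-T2V weights are usable through diffusers' AnimateDiff pipeline; a few-step sampling sketch looks roughly like this (model ids and arguments should be checked against the current diffusers docs):

```python
import torch
from diffusers import AnimateDiffPipeline, LCMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "wangfuyun/AnimateLCM", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear"
)
pipe.load_lora_weights(
    "wangfuyun/AnimateLCM",
    weight_name="AnimateLCM_sd15_t2v_lora.safetensors",
    adapter_name="lcm-lora",
)
pipe.set_adapters(["lcm-lora"], [0.8])
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="a boat sailing on a calm lake, golden hour",
    num_frames=16,
    guidance_scale=2.0,
    num_inference_steps=6,  # LCM distillation enables few-step sampling
).frames[0]
export_to_gif(frames, "animatelcm.gif")
```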
## VideoCrafter
VideoCrafter is an open-source video creation toolbox built on advanced diffusion models, improving motion quality and concept composition. It supports text-to-video and image-to-video generation and delivers strong quality even with limited data. A Discord community is available for creative exchange.
## ControlNeXt
ControlNeXt introduces a highly efficient approach for generating controllable images and videos, optimizing parameter usage for enhanced speed and efficiency. The integration with LoRA allows for versatile style modifications with consistent results. Various models such as ControlNeXt-SDXL for images and ControlNeXt-SVDv2 for video showcase noticeable improvements in quality and execution. The ongoing development phase offers accessible demos and regular updates, ensuring a progressive user experience.
## stable-diffusion-videos
Discover a flexible tool for creating AI-generated videos using stable diffusion models, providing smooth transitions with customizable prompts. Easily installable and executable in environments like Colab, this tool allows for music-synced animations with adjustable parameters like seeds, guidance scale, and resolution. It supports both individual and batch processing, enabling personalized video outputs, and includes RealESRGAN upsampling for better image quality, making it suitable for developers and enthusiasts integrating AI video creation into their projects.
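The library's core is the `walk` API, which interpolates between prompt/seed pairs to produce smooth transitions; a typical invocation looks roughly like this (check the README for current arguments):

```python
import torch
from stable_diffusion_videos import StableDiffusionWalkPipeline

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

video_path = pipeline.walk(
    prompts=["a cat in a field", "a dog in a field"],
    seeds=[42, 1337],
    num_interpolation_steps=30,  # frames rendered between the two prompts
    height=512,
    width=512,
    output_dir="dreams",
    guidance_scale=8.5,
)
print(video_path)
```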
## SoraWebui
SoraWebui is an open-source tool for creating videos with OpenAI's Sora model, offering easy one-click deployment via Vercel. It supports text-based video generation, Google login, and planned Stripe payment functionality; because the Sora API is not yet publicly available, it pairs with FakeSoraAPI for development and testing, making it a suitable choice for developers looking for a straightforward deployment process.
## ComfyUI-AnimateAnyone-Evolved
This project provides a refined solution for converting image sequences into stylized videos, optimized for GPUs comparable to RTX 3080. It utilizes various samplers and schedulers like DDIM, DPM++ 2M Karras, LCM, and Euler for efficient video generation up to 120+ frames. The integration with ComfyUI ensures a modular workflow. Future enhancements focus on accelerating processing speeds through pre-trained models and techniques like RCFG and stable-fast conversion.
## Awesome-Video-Diffusion
Discover an array of diffusion models shaping video applications such as creation, editing, and restoration. This resource caters to researchers and developers keen on video technology advances. It covers video generation, controllable production, motion customization, and 3D/NeRF uses. Utilize open-source kits and models for quality enhancement, AI safety, and video restoration. Evaluate with established metrics to refine performance and analyze content. These models also present opportunities in fields like healthcare and biology.
## Vlogger
The project presents an innovative AI platform for generating detailed vlogs from user inputs, employing a Large Language Model in an oversight role. The approach divides vlog creation into distinct phases including scripting, acting, videography, and narration, utilizing tailored models to maintain narrative integrity and visual quality. Featuring the new ShowMaker model, it enhances the spatial-temporal alignment between script and visuals. Comprehensive evaluations demonstrate the platform's capability to produce coherent, extended vlogs, pushing forward zero-shot video generation benchmarks.
## Latte
The project presents an innovative approach to video generation using Latent Diffusion Transformers with PyTorch. It utilizes spatio-temporal token extraction and Transformer blocks for modeling video distribution in latent spaces, improving video quality on datasets such as FaceForensics and Taichi-HD. Including efficient model variants and extensions for text-to-video generation, the project achieves advanced performance benchmarks. The integration into diffusers also lowers GPU demands, facilitating access to efficient video creation infrastructures.
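The factorization behind this design is easy to sketch with einops: video latents are patchified into tokens, then spatial and temporal Transformer blocks alternate by regrouping the token axis. The blocks below are identity placeholders, purely to show the reshaping:

```python
import torch
from einops import rearrange

b, f, c, h, w = 2, 16, 4, 32, 32           # batch, frames, latent dims
patch = 2
tokens = rearrange(
    torch.randn(b, f, c, h, w),
    "b f c (h p1) (w p2) -> b f (h w) (c p1 p2)", p1=patch, p2=patch
)                                           # (b, f, 256 tokens, 16-dim)

spatial_block = torch.nn.Identity()         # would attend over tokens per frame
temporal_block = torch.nn.Identity()        # would attend over frames per token

x = rearrange(tokens, "b f n d -> (b f) n d")
x = spatial_block(x)                        # spatial attention: n is the sequence
x = rearrange(x, "(b f) n d -> (b n) f d", b=b, f=f)
x = temporal_block(x)                       # temporal attention: f is the sequence
x = rearrange(x, "(b n) f d -> b f n d", b=b)
print(x.shape)  # torch.Size([2, 16, 256, 16])
```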