# Diffusion Models
stable-diffusion-webui-colab
Discover varied WebUI options available on Google Colab, including DreamBooth and LoRA trainers. The repository supports ‘lite’, ‘stable’, and ‘nightly’ builds, each offering distinct features and updates. Access step-by-step installation guides and direct links to a range of diffusion checkpoints, from cyberpunk-anime styles to inpainting models, keeping the WebUI running efficiently with frequent updates.
Text-To-Video-Finetuning
Learn about the progress in video diffusion model finetuning through the use of LoRA. This resource offers valuable configuration examples for training, data preprocessing, and automatic captioning, making it suitable for researchers and developers. Achieve superior results with ModelScope integration and Torch 2.0, ensuring efficient memory usage. Discover community-supported models like Zeroscope and Potat1 to improve video generation precision and effectiveness.
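A minimal inference sketch for the ModelScope base model this finetuning repository builds on, run through the Diffusers library rather than the repository's own scripts; the Hub id `damo-vilab/text-to-video-ms-1.7b`, the prompt, and the step count are assumptions for illustration.

```python
# Hedged sketch: sampling the ModelScope text-to-video base model via Diffusers.
# Assumes a recent Diffusers release where `.frames` returns a batch of videos.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # assumed Hub id of the ModelScope base model
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM usage modest

video_frames = pipe("a corgi surfing a small wave", num_inference_steps=25).frames[0]
export_to_video(video_frames, "corgi.mp4")
```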
UniPC
UniPC provides a training-free framework for rapid sampling of diffusion models. With unified predictor-corrector components, it supports multiple orders and model types, enhancing sampling speed and quality, especially in stable-diffusion and other latent-space models. Integrated with Diffusers for easy implementation, UniPC facilitates efficient sampling and improved convergence in fewer steps, suitable for both noise and data prediction tasks.
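As a concrete illustration of the Diffusers integration mentioned above, the sketch below swaps a Stable Diffusion pipeline's default scheduler for UniPC; the model id and step count are illustrative.

```python
# Minimal UniPC sampling sketch with Diffusers.
import torch
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Replace the default scheduler with UniPC; it typically converges in 10-20 steps.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("an astronaut riding a horse on mars", num_inference_steps=15).images[0]
image.save("astronaut.png")
```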
MuseV
Explore MuseV, a framework for generating high-fidelity virtual human videos with limitless duration using Visual Conditioned Parallel Denoising. Compatible with Stable Diffusion, it supports various applications from image-to-video to video-to-video. Also, check out MuseTalk for lip synchronization and MusePose for creating videos driven by pose signals. Participate in this community-driven effort to advance virtual human technology.
dreamoving-project
DreaMoving is a video generation framework utilizing diffusion models to produce customizable human videos. It allows for the creation of high-quality, controllable videos that realistically capture human movements and expressions. This framework marks a significant step forward in video synthesis and provides demos on ModelScope and HuggingFace platforms. Designed for researchers and enthusiasts, DreaMoving explores AI's potential in generating realistic human imagery. Its advanced algorithm aims to innovate video content creation across various applications.
SiT
Scalable Interpolant Transformers (SiT) introduce advancements in flow and diffusion-based generative modeling. Built on Diffusion Transformers (DiT), SiT connects distributions with flexible design choices. This repository includes PyTorch models, pre-trained weights, and a sampling script, designed to perform well on the ImageNet 256x256 benchmark. It is suitable for professionals exploring generative model technologies.
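The core idea behind interpolant models like SiT can be sketched in a few lines: connect data and noise with a simple interpolant and regress the velocity along it. This is a conceptual sketch, not the repository's code; the linear interpolant and loss below are one common design choice.

```python
# Conceptual velocity-matching objective for a linear data-noise interpolant.
import torch

def velocity_matching_loss(model, x0):
    eps = torch.randn_like(x0)                      # noise endpoint
    t = torch.rand(x0.shape[0], device=x0.device)   # time uniformly sampled in [0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))        # broadcast over spatial dims
    xt = (1.0 - t_) * x0 + t_ * eps                 # linear interpolant x_t
    target = eps - x0                               # d x_t / d t for this interpolant
    pred = model(xt, t)                             # network predicts the velocity field
    return torch.mean((pred - target) ** 2)
```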
IDM-VTON
Explore the IDM-VTON project, which refines diffusion models to enhance virtual try-on experiences using datasets like VITON-HD and DressCode. The project offers resources such as demo models and training/inference codes, utilizing pre-trained components including the IP-Adapter for authentic fashion garment simulation. Suitable for developers in AI and fashion technology, this project provides complete guidelines for setting up and executing virtual try-on solutions.
Mix-of-Show
Explore Mix-of-Show's methods for customizing diffusion models with decentralized low-rank adaptation. The project simplifies the codebase and improves performance for both single- and multi-concept fusion, surpassing conventional LoRA. With updates such as Stable Diffusion XL compatibility, it covers a broad range of customization needs.
distill-sd
Examine the innovative knowledge-distilled variants of Stable Diffusion which provide improved speed and reduced size while preserving image integrity. Learn about the architecture that optimizes the process with advanced distillation methods, lowering VRAM use and increasing efficiency. Designed for those enhancing specific techniques via fine-tuning or LoRA training, these models, still in progress, represent an advancement in efficient image creation.
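Because the distilled variants keep the Stable Diffusion architecture and interface, they can be loaded as drop-in replacements; a hedged sketch, assuming the publicly released `segmind/small-sd` checkpoint as the model id.

```python
# Hedged sketch: using a knowledge-distilled SD checkpoint like any other SD model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/small-sd",  # assumed Hub id of a distilled checkpoint; swap in your own
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at dawn", num_inference_steps=25).images[0]
image.save("lighthouse.png")
```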
Awesome-Diffusion-Models
Explore a diverse array of resources and scholarly papers on Diffusion Models covering domains such as vision, audio, and natural language. This repository provides access to introductory materials, tutorials, and advanced research, aiding understanding of the theory, applications, and developments of Diffusion Models. It serves as a practical guide for researchers, students, and professionals who want to deepen their knowledge, featuring implementations, comprehensive surveys, and instructional content.
T-GATE
Explore a training-free method to improve text-to-image diffusion models using T-GATE. It utilizes temporally gated attention to streamline the inference process into semantics-planning and fidelity-improving phases, enhancing model speed by 10-50% without compromising image quality. T-GATE integrates easily with existing frameworks and is compatible with CNN-based U-Net, Transformer, and Consistency Models. It now supports PixArt-Sigma and StableVideoDiffusion models, providing a valuable option for efficient image generation in various applications.
SoraReview
This review offers an in-depth exploration of Sora, a text-to-video generative AI model by OpenAI. It covers the model's underlying technologies and its diverse applications in industries including film, education, and healthcare. It also addresses challenges such as video safety and unbiased content generation, and examines the potential future developments in AI video production. Learn how these innovations might enhance human-AI collaboration and boost productivity and creativity in video generation.
edm2
The official PyTorch code for the CVPR 2024 paper presents improvements in training dynamics of diffusion models for image synthesis. By addressing inefficiencies in the ADM diffusion model, the paper suggests network redesigns to maintain activation and weight balance without changing the overall structure. These optimizations improve FID scores from 2.41 to 1.81 on ImageNet-512, using deterministic sampling. A new method for post-training EMA parameter tuning is also introduced, enabling precise adjustments without extra training runs.
Generative-AI
This survey provides an in-depth review of the latest methodologies and categorizations in multimodal image synthesis and editing, exploring innovative advancements in visual AI-generated content. It covers neural rendering, diffusion, and GAN-based techniques, emphasizing influential studies and projects. This resource is suitable for researchers and professionals interested in understanding the impact of generative AI on image editing, serving as a guide to the latest technologies and applications.
Diffusion-Models-Papers-Survey-Taxonomy
This survey systematically categorizes key diffusion model research papers, as detailed in the paper accepted by ACM Computing Surveys. The repository highlights innovations in sampling acceleration, noise schedule optimization, and handling data with intricate structures. It covers extensive applications in fields like computer vision and natural language processing, focusing on advancements such as text-to-image generation and anomaly detection. Additionally, the survey explores the integration of diffusion models with other generative models and their impact on multi-modal learning.
ICCV2023-Papers-with-Code
Discover 2160 innovative papers and open-source projects presented at ICCV 2023. This collection covers a wide range of computer vision topics, including vision-language models, 3D object detection, and neural radiance fields. It is a useful reference for researchers and enthusiasts who want to keep up with the latest advancements. Dive into past CV conference papers and engage with academic groups for in-depth discussions. Access domain-specific resources and uncover progress in areas like self-supervised learning, image editing, and diffusion models.
BrushNet
BrushNet is a diffusion-based image inpainting model that integrates easily with pre-trained systems. Its dual-branch diffusion approach effectively addresses image completion tasks and enhances control by separating masked image features and noisy latent spaces. The model offers adaptable training and inference capabilities, with detailed deployment instructions. Recognized for its innovative design, BrushNet won top prizes at the CVPR 2024 GenAI Media Generation Challenge.
LLM-groundedDiffusion
Explore how LLMs enhance text-to-image diffusion by refining prompt understanding and improving image generation. The project is integrated into diffusers v0.24.0 and offers a self-hosted LLM option comparable to GPT-3.5, emphasizing modularity and potential for further AI research.
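The two-stage idea can be pictured as a small protocol: an LLM first turns the prompt into a layout of boxes and phrases, and a layout-grounded diffusion stage then renders an image that respects it. The sketch below is conceptual and hypothetical, not the project's API; both functions are placeholders.

```python
# Hypothetical two-stage sketch: LLM-proposed layout -> layout-grounded generation.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LayoutBox:
    phrase: str                              # object description, e.g. "a gray cat"
    box: Tuple[float, float, float, float]   # normalized (x0, y0, x1, y1)

def llm_propose_layout(prompt: str) -> List[LayoutBox]:
    # Stand-in for a call to GPT-3.5 or a self-hosted LLM that parses the prompt.
    return [
        LayoutBox("a gray cat", (0.05, 0.40, 0.45, 0.95)),
        LayoutBox("a potted plant", (0.55, 0.30, 0.95, 0.95)),
    ]

def layout_grounded_generate(prompt: str, layout: List[LayoutBox]):
    # Stand-in for a diffusion pipeline conditioned on per-box phrases.
    raise NotImplementedError("plug in a layout-conditioned diffusion pipeline here")

layout = llm_propose_layout("a gray cat to the left of a potted plant")
print(layout)  # the structured layout handed to the diffusion stage
```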
RAVE
Using pre-trained diffusion models, this tool provides quick zero-shot video editing based on text prompts, delivering high-quality results without extra training. Its noise shuffling technique keeps videos temporally consistent and efficiently manages any video length. This solution supports versatile edits, from local changes to shape transformations, across varied scenes with dynamic and complex activities.
visual_anagrams
The project delves into the creation of optical illusions using diffusion models, emphasizing visual anagrams that alter appearance through transformations such as rotation and color inversion. Utilizing DeepFloyd's pixel diffusion model, it avoids common artifacts of latent diffusion. The repository includes comprehensive guides and Colab demos for engaging users, offering both free and advanced versions to experiment with high-resolution images. Users can generate diverse illusions in various styles and subjects, employing techniques like flipping and jigsaw. This approach underscores the relationship between art and machine learning, serving as a valuable resource for developers and researchers in synthetic visual art and perception.
MultiBooth
Discover a pioneering two-phase process for generating multi-concept images from text with improved concept fidelity and reduced inference costs. Leveraging a multi-modal encoder and succinct cross-attention mapping, this method excels in efficiency and performance, surpassing benchmarks. Learn about the Stable Diffusion v1.5-based technique for premium image synthesis, with clear explanations of core technical terms for a wider audience.
diffusion-models-class
Explore the fundamentals of diffusion models with a free course that includes both theoretical lessons and practical applications. Utilize the Diffusers library to create images and audio, develop and adjust models, and design personalized pipelines. Connect with a global community via Discord and join practical projects. The course covers topics such as stable diffusion and conditional generation, offering essential skills for learners familiar with Python and deep learning. Additional learning resources and translations enhance the accessibility of the course.
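In the spirit of the course's first hands-on unit, generating samples from a pretrained pipeline takes only a few lines with Diffusers; the checkpoint id below is a public DDPM model chosen for illustration.

```python
# Minimal sketch: unconditional image generation with a pretrained DDPM pipeline.
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
image = pipe(num_inference_steps=50).images[0]  # fewer steps than the default 1000 for a quick preview
image.save("sample.png")
```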
Awesome-Text-to-3D
This project offers a detailed curation of Text-to-3D and Diffusion-to-3D methods inspired by awesome-NeRF, featuring numerous research papers on converting text to 3D models using techniques like Neural Radiance Fields (NeRF) and advanced diffusion models. It is regularly updated with new resources including project page links, video tutorials, and cited literature with code repositories, providing a valuable resource for exploring innovative 3D synthesis developments. Applications range from object generation to novel view synthesis, serving as an informative guide for researchers and enthusiasts.
Paint3D
Paint3D presents a framework for generating high-resolution UV texture maps for 3D meshes without embedded lighting, allowing for easy re-lighting and editing. It uses depth-aware 2D diffusion models for initial textures and UV Inpainting alongside UVHD models for refinement, ensuring semantic consistency. The tool supports text and image-based inputs, modernizing 3D object texturing.
Paint-by-Example
Utilize Paint by Example for precise image editing through an exemplar-guided framework built on diffusion models. This self-supervised approach blends source images with exemplars while preventing artifacts, ensuring high fidelity. With techniques such as arbitrary-shape masks and classifier-free guidance, it offers enhanced control over edits. The latest updates provide improved detail preservation in non-masked areas and new tools for quantitative analysis, positioning the solution as a leading choice in exemplar-based image editing.
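Paint by Example is also exposed in the Diffusers library as a dedicated pipeline; the sketch below assumes that integration, with illustrative file names for the source image, mask, and exemplar.

```python
# Hedged sketch: exemplar-guided editing via the Diffusers PaintByExamplePipeline.
import torch
from diffusers import PaintByExamplePipeline
from diffusers.utils import load_image

pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16
).to("cuda")

source = load_image("scene.png")        # image to edit
mask = load_image("mask.png")           # white where the edit should happen
exemplar = load_image("reference.png")  # object to paint into the masked region

result = pipe(image=source, mask_image=mask, example_image=exemplar).images[0]
result.save("edited.png")
```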
Awesome-Diffusion-Models-in-Medical-Imaging
Explore a curated collection of scholarly articles on diffusion models in medical imaging, featuring survey papers, challenges, and applications including anomaly detection and image restoration. This project compiles influential publications from conferences and journals like Medical Image Analysis and MICCAI 2023, serving as a valuable resource for professionals seeking the latest advancements in diffusion model applications.
Awesome-AIGC-3D
A well-organized resource featuring a comprehensive selection of seminal 3D generative papers, benchmarks, datasets, and implementations. It examines critical advancements in 3D AI content, focusing on methodologies such as text-to-3D generation and neural radiance fields. This collection underscores major contributions to the field, covering applications like scene synthesis and human avatar creation. Researchers and practitioners can engage with novel developments to enhance their understanding and application of these techniques. This platform is perfect for those looking to stay informed about emerging trends in 3D AI content creation.
DMD2
Discover how advanced techniques improve Distribution Matching Distillation (DMD) by eliminating regression loss and integrating GAN loss for faster image synthesis. This approach enhances training stability and efficiency through multi-step sampling, achieving notable FID scores of 1.28 on ImageNet-64x64 and 8.35 on COCO 2014. The improved method reduces inference costs and supports fast generation of high-quality megapixel images.
U-KAN
The U-KAN project utilizes Kolmogorov-Arnold Network (KAN) layers to refine medical image segmentation and generation, ensuring high accuracy with lower computational demands. By embedding KAN layers into the U-Net architecture, U-KAN shows superior benchmark performance and offers reliable noise prediction in diffusion models, thereby assisting generative applications in medical imaging. As of June 2024, the project includes model checkpoints, training records, and pre-trained models for swift implementation. Notable features highlight its enhanced precision, efficiency, and adaptability in medical imaging processes.
Awesome_Mamba
Discover advancements in Mamba models, highlighting the application of state space models across medical imaging and broader AI. Topics extend to image enhancement, video and natural language processing, multi-modal comprehension, and 3D recognition. Explore detailed surveys and architecture updates focused on efficient data analysis. Understand the tools driving innovation in computational imaging and AI healthcare, as featured in the accompanying survey of efficient models for medical imaging.
LaVie
LaVie is a text-to-video generation framework built on cascaded latent diffusion models. Part of the Vchitect system, it integrates Base T2V, Video Interpolation, and Video Super-Resolution modules for customizable video output. It includes pre-trained models like the LaVie base and Stable Diffusion, available on OpenXLab and Hugging Face Spaces. The framework offers diverse sampling methods and guidance scales to support creative video generation, and developers can follow step-by-step installation and inference tutorials. LaVie is open for academic research and commercial use, fostering a collaborative video creation technology community.
audio-ai-timeline
Explore a detailed compilation of recent advancements in AI models for audio generation. This repository highlights innovative projects like Mustango and Music ControlNet, providing resources including sample releases, research papers, and code links. A valuable tool for researchers and developers keen on cutting-edge audio technology and AI integration in sound production.
DeepCache
DeepCache offers a unique, training-free method to enhance diffusion model speeds, achieving up to 4.1x faster processing with minimal loss. By reusing high-level features, this approach optimizes model architecture and supports pipelines like Stable Diffusion and DDPM. Compatible with sampling algorithms such as DDIM and PLMS, it provides easy integration for both image and video applications. Recent updates introduce features like AsyncDiff for multi-GPU parallel inference and a simplified plug-and-play setup, ensuring efficient model optimization without added complexity.
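A hedged sketch of the plug-and-play setup around a Diffusers pipeline, following the helper the project documents; the cache parameters and model id are illustrative.

```python
# DeepCache wrapped around a Stable Diffusion pipeline.
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)  # reuse high-level U-Net features every 3 steps
helper.enable()

image = pipe("a cabin in a snowy forest at dusk").images[0]
helper.disable()  # restore the original, uncached forward pass
image.save("cabin.png")
```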