# Diffusion Model
CVPR2024-Papers-with-Code-Demo
The platform features a regularly updated selection of CVPR 2024 research papers with open-source code, serving as an important resource for computer vision professionals. Covering topics from image classification and object detection to advanced techniques like diffusion models and NeRF, it helps readers stay informed about the latest innovations. Community engagement through issue submissions and discussions is encouraged to promote collective progress in the field.
awesome-diffusion-model-in-rl
This repository compiles research papers that integrate diffusion models with reinforcement learning (RL) and is regularly updated to reflect the latest work in diffusion RL. It highlights advantages such as eliminating bootstrapping dependencies and mitigating the short-sighted behavior induced by reward discounting, while harnessing diffusion models' versatility across fields. Aimed at researchers and practitioners focused on diffusion-based policy optimization and planning in offline RL, it includes papers from leading conferences such as ICML, ICLR, and NeurIPS, offering insights into various experimental setups and best practices.
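Many of the collected works cast a policy as a conditional diffusion model over actions or trajectories. As a rough illustration of that idea (not the code of any listed paper), the sketch below samples an action by reversing a DDPM chain conditioned on the state; the network and noise schedule are toy placeholders.

```python
import torch
import torch.nn as nn

# Toy denoising network standing in for a learned diffusion policy.
# Names (ToyNoisePredictor, sample_action) are hypothetical, not from any listed paper.
class ToyNoisePredictor(nn.Module):
    def __init__(self, state_dim=4, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, t):
        # t is a normalized timestep in [0, 1], broadcast to the batch
        t_feat = t.expand(state.shape[0], 1)
        return self.net(torch.cat([state, noisy_action, t_feat], dim=-1))

@torch.no_grad()
def sample_action(model, state, action_dim=2, steps=50):
    """DDPM-style ancestral sampling of an action conditioned on the state."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    a = torch.randn(state.shape[0], action_dim)   # start from pure Gaussian noise
    for i in reversed(range(steps)):
        t = torch.tensor([[i / steps]])
        eps = model(state, a, t)                  # predicted noise
        a = (a - betas[i] / torch.sqrt(1 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            a = a + torch.sqrt(betas[i]) * torch.randn_like(a)
    return a

state = torch.randn(1, 4)
action = sample_action(ToyNoisePredictor(), state)
print(action.shape)  # torch.Size([1, 2])
```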
OmniTokenizer
OmniTokenizer is a model that efficiently tokenizes images and videos, delivering top-notch reconstruction across diverse datasets. It supports high-resolution and extended videos, integrates with language and diffusion models, and excels in visual generation. Available in VQVAE and VAE versions, it comes pretrained on extensive datasets for seamless integration. The project includes detailed setup, training, and evaluation guides, making it a valuable resource for researchers and developers in visual generation.
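For context on what "tokenizing" visuals means here, the snippet below shows a generic VQVAE-style quantization step that maps encoder features to discrete codebook indices. It is an illustrative stand-in, not OmniTokenizer's actual API; the codebook size and feature dimensions are made up.

```python
import torch

# Generic VQ quantization sketch: encoder features are assigned to their
# nearest codebook entries, producing integer token ids.
def vq_tokenize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """features: (B, N, D) encoder outputs; codebook: (K, D) learned embeddings.
    Returns (B, N) token ids via nearest-neighbour lookup."""
    dists = torch.cdist(features, codebook.unsqueeze(0).expand(features.shape[0], -1, -1))
    return dists.argmin(dim=-1)

feats = torch.randn(2, 256, 64)      # e.g. 16x16 patches per image, 64-dim features (assumed)
codebook = torch.randn(8192, 64)     # hypothetical codebook size
tokens = vq_tokenize(feats, codebook)
print(tokens.shape, tokens.dtype)    # torch.Size([2, 256]) torch.int64
```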
MOFA-Video
MOFA-Video animates static images by leveraging sparse-to-dense motion generation and flow-based adaptation within a video diffusion model. It supports diverse control signals, including trajectories and keypoint sequences. Developed by Tencent AI Lab and the University of Tokyo, the work was presented at ECCV 2024. The project provides both training and inference code, with comprehensive guides and demonstrations that make the conversion of static imagery to dynamic motion easy to try.
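To give a sense of how a sparse control signal can drive a dense motion field, the sketch below splats a few trajectory displacements into a dense flow map with Gaussian weights. It is a conceptual illustration under assumed shapes, not MOFA-Video's implementation.

```python
import torch

# Sparse-to-dense motion hint (illustrative): each sparse displacement is spread
# over the image with a Gaussian falloff, yielding a dense flow field that a
# video diffusion model could consume as a control signal.
def sparse_to_dense_flow(points, motions, h, w, sigma=20.0):
    """points: (N, 2) pixel coords (x, y); motions: (N, 2) displacement per point.
    Returns a dense (h, w, 2) flow field."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()                    # (h, w, 2)
    d2 = ((grid[None] - points[:, None, None, :]) ** 2).sum(-1)     # (N, h, w)
    weights = torch.exp(-d2 / (2 * sigma ** 2))                     # Gaussian falloff
    weights = weights / (weights.sum(0, keepdim=True) + 1e-6)       # normalize per pixel
    return (weights[..., None] * motions[:, None, None, :]).sum(0)  # (h, w, 2)

pts = torch.tensor([[64.0, 64.0], [192.0, 128.0]])
vecs = torch.tensor([[10.0, 0.0], [0.0, -5.0]])
dense = sparse_to_dense_flow(pts, vecs, h=256, w=256)
print(dense.shape)  # torch.Size([256, 256, 2])
```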
DiffTumor
This project investigates the generalizable synthesis of small abdominal tumors using AI-driven diffusion models, with initial training on pancreatic tumors to extend to the liver and kidney. Validated through radiologist evaluations and AI testing, the project offers a platform for developing tumor synthesis models, using data from AbdomenAtlas and MSD-Liver. Features include training options for Autoencoder and Diffusion Models, and refined Segmentation Models such as U-Net, nnU-Net, and Swin UNETR. Pre-trained model checkpoints and comprehensive installation guidance are provided, facilitating ease of use in research and enhancing abdominal tumor detection.
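The autoencoder-plus-diffusion recipe mentioned above boils down to a noise-prediction objective on encoded volumes. The self-contained toy below sketches that objective with placeholder networks and an arbitrary schedule; nothing here is DiffTumor's actual code or configuration.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the latent-diffusion training objective; the tiny stand-in
# networks below are placeholders, not DiffTumor's models.
class TinyEncoder(torch.nn.Module):
    def forward(self, x):                        # downsample a CT patch into "latents"
        return F.avg_pool3d(x, kernel_size=4)

class TinyDenoiser(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv3d(2, 1, kernel_size=3, padding=1)
    def forward(self, noisy_z, cond):            # condition on a downsampled organ mask
        return self.conv(torch.cat([noisy_z, cond], dim=1))

def training_loss(denoiser, z, cond, t, timesteps=1000):
    noise = torch.randn_like(z)
    alpha_bar = torch.cos(t / timesteps * torch.pi / 2) ** 2           # toy cosine schedule
    noisy_z = alpha_bar.sqrt() * z + (1 - alpha_bar).sqrt() * noise    # forward process
    return F.mse_loss(denoiser(noisy_z, cond), noise)                  # predict the noise

ct_patch   = torch.randn(1, 1, 32, 32, 32)       # fake CT sub-volume
organ_mask = torch.rand(1, 1, 32, 32, 32)
enc, den = TinyEncoder(), TinyDenoiser()
loss = training_loss(den, enc(ct_patch), enc(organ_mask), torch.tensor(500.0))
print(loss.item())
```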
DIVA
This project uses a post-training, self-supervised diffusion approach to enhance CLIP models. By integrating text-to-image generative feedback, it improves visual precision across benchmarks, boosting performance by 3-7% on MMVP-VLM. It retains CLIP's zero-shot ability across 29 classification benchmarks while serving as a new Visual Assistant for improved multimodal understanding.
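Conceptually, generative feedback means backpropagating a frozen diffusion model's denoising loss into the vision encoder being tuned. The toy sketch below illustrates that gradient flow with stand-in linear modules; it is not DIVA's code, and the shapes and schedule are arbitrary.

```python
import torch
import torch.nn.functional as F

# Stand-ins: a trainable "vision encoder" and a frozen "denoiser". Only the encoder
# receives gradients from the diffusion (noise-prediction) loss.
vision_encoder = torch.nn.Linear(3 * 32 * 32, 128)           # stands in for CLIP's ViT
frozen_denoiser = torch.nn.Linear(3 * 32 * 32 + 128 + 1, 3 * 32 * 32).requires_grad_(False)
opt = torch.optim.AdamW(vision_encoder.parameters(), lr=1e-5)

image = torch.randn(8, 3 * 32 * 32)                           # flattened toy images
t = torch.rand(8, 1)                                          # random diffusion timesteps
noise = torch.randn_like(image)
alpha_bar = torch.cos(t * torch.pi / 2) ** 2
noisy = alpha_bar.sqrt() * image + (1 - alpha_bar).sqrt() * noise

cond = vision_encoder(image)                                  # visual condition (trainable)
pred = frozen_denoiser(torch.cat([noisy, cond, t], dim=-1))   # frozen generative model
loss = F.mse_loss(pred, noise)                                # gradients flow only into the encoder
loss.backward()
opt.step()
print(loss.item())
```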
DDSP-SVC
DDSP-SVC provides a method for singing voice conversion using models such as rectified-flow and cascade diffusion, designed to run effectively on typical consumer hardware. Training time is short, comparable to RVC, and real-time voice conversion is possible with lower resource usage than similar projects, while vocoder enhancements ensure high synthesis quality. The platform supports multi-speaker models with detailed configuration, emphasizes the legal use of training data, and includes a comprehensive guide and visualization tools to improve user interaction.
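Rectified flow, one of the model families mentioned above, generates samples by integrating a learned velocity field from noise toward data. The generic loop below shows that with a placeholder `velocity_model`; it is not DDSP-SVC's network or feature pipeline.

```python
import torch

# Generic rectified-flow sampling: Euler integration of a learned velocity field
# from Gaussian noise (t = 0) toward the data distribution (t = 1).
@torch.no_grad()
def rectified_flow_sample(velocity_model, cond, shape, steps=20):
    x = torch.randn(shape)                        # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        v = velocity_model(x, t, cond)            # predicted velocity dx/dt
        x = x + v * dt                            # Euler step toward the data
    return x

# Toy stand-in velocity field; a real model would be conditioned on speaker and pitch features.
velocity_model = lambda x, t, cond: -x + cond
mel = rectified_flow_sample(velocity_model, cond=torch.zeros(1, 80, 100), shape=(1, 80, 100))
print(mel.shape)  # torch.Size([1, 80, 100])
```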
FreeU
FreeU is a method that enhances diffusion model sample quality without extra training, parameters, memory, or inference time. Suitable for developers focused on efficiency, the approach improves image and video generation through simple UNetModel updates. The project provides demos on Hugging Face and open-source repositories for practical use. Models like SD1.4, SD1.5, and SDXL benefit from the enhancement, and community insights offer comprehensive parameter-tuning guidance.
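The core FreeU idea, as described in the paper, is to amplify the UNet's backbone features while damping the low-frequency content of the skip-connection features via an FFT mask. The sketch below illustrates that adjustment on dummy feature maps; the scaling factors, channel handling, and mask radius are assumptions, not the official defaults.

```python
import torch

# Illustrative FreeU-style adjustment: strengthen the backbone branch and scale
# only the low spatial frequencies of the skip features.
def freeu_adjust(backbone_feat, skip_feat, b=1.2, s=0.9, radius=8):
    backbone_feat = backbone_feat * b                              # amplify backbone features
    fft = torch.fft.fftshift(torch.fft.fft2(skip_feat), dim=(-2, -1))
    h, w = skip_feat.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    low = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    mask = torch.where(low, torch.tensor(s), torch.tensor(1.0))    # scale only low frequencies
    skip_feat = torch.fft.ifft2(torch.fft.ifftshift(fft * mask, dim=(-2, -1))).real
    return backbone_feat, skip_feat

b_feat = torch.randn(1, 640, 32, 32)   # dummy decoder-stage feature maps
s_feat = torch.randn(1, 640, 32, 32)
b_out, s_out = freeu_adjust(b_feat, s_feat)
print(b_out.shape, s_out.shape)
```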
Kolors
Kolors improves text-to-image synthesis using advanced diffusion models, ensuring high visual quality and semantic accuracy in both English and Chinese. Leveraging billions of text-image pairs, it is proficient in detailed and complex designs. Recent updates enable features like virtual try-ons, pose control, and face identification, accessible via Hugging Face and GitHub. Its performance is validated by comprehensive evaluations. The Kolors suite includes user-friendly pipelines for diffusion models, inpainting, and LoRA training, offering a robust solution for photorealistic image generation.
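A minimal text-to-image call might look like the following, assuming the Hugging Face diffusers integration (KolorsPipeline) and the Kwai-Kolors/Kolors-diffusers checkpoint; verify the exact identifiers and recommended settings against the Kolors repository.

```python
import torch
from diffusers import KolorsPipeline  # assumes the diffusers Kolors integration is available

# Model id, dtype, and sampler settings below are assumptions based on common
# Hugging Face conventions; check the Kolors docs for the recommended values.
pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a photorealistic portrait of a red panda wearing a scholar's robe",  # Chinese prompts are also supported
    negative_prompt="",
    guidance_scale=5.0,
    num_inference_steps=50,
).images[0]
image.save("kolors_sample.png")
```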
Feedback Email: [email protected]