# Generative Models

## generative-models
SV4D from Stability AI provides video-to-4D diffusion for consistent novel-view video synthesis, building on SV3D's multi-view capabilities. The repository includes demos and guides for applying the models to real-world videos.
## Lumina-T2X
Lumina-T2X utilizes flow-based diffusion transformers to effectively convert text into various modalities, including images, videos, and music. It supports high-quality outputs with resolutions up to 2K, and accommodates multilingual prompts and emojis. Recent enhancements improve visual quality, offering new demos that highlight its versatility in vision-language tasks, targeting developers and researchers engaged in generative AI.
## AIGS
This survey paper explores the emerging field of AI-generated images as data sources, emphasizing methodologies and uses of synthetic visual data. It organizes the literature around generative models and neural rendering, with applications spanning 2D and 3D visual perception and medical data synthesis. Reviewing methods such as generative adversarial networks and diffusion models, the paper examines applications in image classification, segmentation, and self-supervised learning, and offers insights into the future potential of AI-generated content across industries.
## V-Express
Discover a sophisticated approach to portrait video generation that harmonizes weak and strong control signals such as text, audio, images, and poses. The technique uses conditional dropout during training so that weaker signals such as audio are progressively strengthened, enabling precise control over video synthesis. It is particularly suited to applications that mix signal types, improving both convergence and quality in portrait generation. The repository also covers memory-efficient extensions for longer videos and post-processing to reduce flickering. V-Express efficiently integrates diverse signal controls for high-quality video generation.
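Conditional dropout in this setting typically means randomly masking condition embeddings during training so the model cannot rely on the strong signals alone. The sketch below is a minimal, generic illustration of that idea in PyTorch; the embedding names, shapes, and dropout probability are assumptions, not V-Express's actual implementation.

```python
import torch

def drop_conditions(audio_emb, pose_emb, p_drop=0.1):
    """Generic conditional dropout: independently zero out each control
    signal per sample so the model also learns from the remaining ones.
    `audio_emb` and `pose_emb` are hypothetical (batch, dim) embeddings."""
    batch = audio_emb.shape[0]
    keep_audio = (torch.rand(batch, 1, device=audio_emb.device) > p_drop).float()
    keep_pose = (torch.rand(batch, 1, device=pose_emb.device) > p_drop).float()
    return audio_emb * keep_audio, pose_emb * keep_pose

# Applied only during training; at inference all conditions are kept.
audio, pose = torch.randn(8, 128), torch.randn(8, 64)
audio_c, pose_c = drop_conditions(audio, pose)
```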
## ai-notes
The repository compiles extensive notes on advanced AI, emphasizing generative technologies and large language models. Serving as foundational material for the LSpace newsletter, it covers topics like text generation, AI infrastructure, audio and code generation, and image synthesis. Featuring tools such as GPT-4, ChatGPT, and Stable Diffusion, the notes detail contemporary developments, aiding AI enthusiasts and professionals in keeping updated with AI innovation and application.
## ydata-synthetic
Learn about the transition from ydata-synthetic to ydata-sdk, offering enhanced privacy and performance in synthetic data generation. Explore how this open-source package simplifies the process using a single API for model optimization, leveraging GANs for dataset improvement, and ensuring privacy compliance and high-quality data outputs.
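As a rough sketch of the single-API workflow, the snippet below follows the documented ydata-sdk quickstart for tabular data; the CSV path is hypothetical, and running it requires a valid YData license key in the environment (details may differ across SDK versions).

```python
import pandas as pd
from ydata.sdk.synthesizers import RegularSynthesizer

df = pd.read_csv("census.csv")  # hypothetical tabular training data

synth = RegularSynthesizer()
synth.fit(df)                           # train the synthesizer on real data
sample = synth.sample(n_samples=1000)   # draw 1,000 synthetic rows
print(sample.head())
```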
## conditional-flow-matching
TorchCFM provides an efficient approach for training continuous normalizing flow models with Conditional Flow Matching, enhancing the speed of generative modeling and inference. This library reduces the performance gap between CNFs and diffusion models, supporting applications across various data types, such as image and tabular data generation. It includes resources for optimization in flow-based models with PyTorch and PyTorch Lightning, serving as a versatile tool for researchers and developers.
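A minimal training step with TorchCFM looks roughly like the following, adapted from the library's documented usage: pair source noise with data, ask the `ConditionalFlowMatcher` for a time, an interpolated location, and a target velocity, then regress a network onto that velocity. The toy data and tiny network here are placeholders.

```python
import torch
from torchcfm.conditional_flow_matching import ConditionalFlowMatcher

# Small velocity-field network over 2-D data plus a time input.
model = torch.nn.Sequential(
    torch.nn.Linear(2 + 1, 64), torch.nn.SELU(), torch.nn.Linear(64, 2)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
cfm = ConditionalFlowMatcher(sigma=0.1)

x1 = torch.randn(256, 2)   # stand-in for a batch of 2-D data samples
x0 = torch.randn_like(x1)  # source samples from a Gaussian prior

# t ~ U[0, 1], xt a point on the probability path, ut the target velocity.
t, xt, ut = cfm.sample_location_and_conditional_flow(x0, x1)
vt = model(torch.cat([xt, t[:, None]], dim=-1))

optimizer.zero_grad()
loss = ((vt - ut) ** 2).mean()  # flow-matching regression loss
loss.backward()
optimizer.step()
```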
## HEBO
HEBO (Heteroscedastic Evolutionary Bayesian Optimization) is a Bayesian optimization library from Huawei Noah's Ark Lab, designed for a wide range of applications. It won the NeurIPS 2020 Black-Box Optimization Challenge, efficiently optimizing complex black-box functions. The library offers a flexible framework that is straightforward to implement and integrate into existing workflows, making it a valuable resource for researchers and developers working on Bayesian optimization.
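The suggest/observe loop below follows HEBO's documented README usage; the one-dimensional objective is a toy placeholder.

```python
import numpy as np
import pandas as pd
from hebo.design_space.design_space import DesignSpace
from hebo.optimizers.hebo import HEBO

def obj(params: pd.DataFrame) -> np.ndarray:
    # Toy objective: minimize (x - 0.37)^2; HEBO minimizes by convention.
    return ((params["x"] - 0.37) ** 2).values.reshape(-1, 1)

space = DesignSpace().parse([{"name": "x", "type": "num", "lb": -3, "ub": 3}])
opt = HEBO(space)
for i in range(16):
    rec = opt.suggest(n_suggestions=4)  # batch of candidate configurations
    opt.observe(rec, obj(rec))          # report observed objective values
    print(f"iter {i}: best y = {opt.y.min():.4f}")
```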
## mmagic
The toolkit supports advanced generative AI for a variety of image and video editing tasks, powered by the OpenMMLab 2.0 framework. It integrates state-of-the-art models for text-to-image diffusion and 3D generation. Suited to AIGC research, it builds on deep learning technologies such as GANs and CNNs, and runs on Python 3.9+ and PyTorch 2.0+ for AI-driven creative workflows.
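Inference goes through a single entry point, `MMagicInferencer`; the example below mirrors the project's quickstart for a Stable Diffusion backend (prompt and output path are illustrative).

```python
from mmagic.apis import MMagicInferencer

# Text-to-image with a Stable Diffusion model, per the mmagic quickstart.
sd_inferencer = MMagicInferencer(model_name='stable_diffusion')
text_prompts = 'A panda is having dinner at KFC'
result_out_dir = 'output/sd_res.png'
sd_inferencer.infer(text=text_prompts, result_out_dir=result_out_dir)
```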
## scepter
SCEPTER is an open-source repository focused on generative training and inference, providing tools for image generation, transfer, and editing. It incorporates community approaches and Alibaba Tongyi Lab's proprietary methods, making it pivotal for AI-generated content research. Key features include a generative training framework, ease of implementing popular methods, and the SCEPTER Studio for interactive use. Recent updates add support for the FLUX framework, and introduce models like ACE for varied image editing and SCEdit for controllable synthesis, streamlining innovation in generative model development.
## clean-fid
The project offers a uniform approach to calculating Fréchet Inception Distance (FID) scores for the evaluation of generative models, ensuring consistency by addressing discrepancies in image processing across libraries. It enables the computation of FID, KID, and CLIP-FID scores, including legacy options. With readily available statistics for popular datasets such as CIFAR-10 and FFHQ, this tool facilitates efficient model evaluation, promoting transparency and reproducibility in GAN assessments.
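The library exposes these metrics through a small functional API; the calls below follow its documented usage (folder paths are placeholders).

```python
from cleanfid import fid

# Compare two folders of images with the "clean" resizing protocol.
score = fid.compute_fid("path/to/generated", "path/to/real")

# Or score against precomputed statistics for a reference dataset.
score_cifar = fid.compute_fid(
    "path/to/generated", dataset_name="cifar10",
    dataset_res=32, dataset_split="train",
)

# KID and CLIP-FID use the same interface.
kid = fid.compute_kid("path/to/generated", "path/to/real")
clip_fid = fid.compute_fid("path/to/generated", "path/to/real",
                           model_name="clip_vit_b_32")
```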
## LFM
Discover a framework that applies flow matching in the latent space of pretrained autoencoders to improve efficiency and scalability in image synthesis. This approach addresses the computational cost of diffusion models and supports efficient training under limited resources. Validated on datasets such as CelebA-HQ and ImageNet, it also includes a theoretical analysis in terms of the Wasserstein-2 distance between the latent and data distributions.
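The core idea can be sketched generically: encode images into latents, then train a velocity network with the usual flow-matching regression along straight-line paths. Everything below (`encoder`, `velocity_net`, `sigma_min`, flattened latents) is an illustrative stand-in, not the paper's exact code.

```python
import torch

def latent_fm_loss(encoder, velocity_net, images, sigma_min=1e-4):
    """Generic latent flow-matching loss; latents are assumed flattened
    to shape (batch, dim) for simplicity."""
    with torch.no_grad():
        z1 = encoder(images)                 # data mapped to latents
    z0 = torch.randn_like(z1)                # Gaussian source latents
    t = torch.rand(z1.shape[0], 1)           # uniform time in [0, 1]
    zt = (1 - (1 - sigma_min) * t) * z0 + t * z1  # straight-line path
    ut = z1 - (1 - sigma_min) * z0           # target conditional velocity
    vt = velocity_net(zt, t.squeeze(-1))     # predicted velocity field
    return ((vt - ut) ** 2).mean()
```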
## generate
Generate is a Python package providing access to top generative models across platforms like OpenAI and Azure. Its unified API supports diverse outputs including text, image, and speech generation. With options for synchronous and asynchronous processing, and built-in tools like Chainlit UI and rate control, it suits developers aiming for efficient model integrations. Generate emphasizes minimal dependencies and high-quality code, enhancing the development environment for generative AI projects.
## ML-from-scratch-seminar
This seminar, hosted by Harvard's Department of Neurobiology, gives graduate students and postdocs an opportunity to explore machine learning models by implementing them from scratch in plain Python. Each session combines theoretical discussion with practical coding over two evenings, helping participants understand an algorithm's dynamics and limitations. The program promotes an in-depth grasp of the underlying computations, ideal for those in neuroscience or computer science.
## Awesome-Evaluation-of-Visual-Generation
The repository acts as a detailed archive of methods for evaluating visual generation models and their outputs, including images and videos. It highlights crucial areas such as model performance, generated content analysis, and alignment with user inputs. It offers resources, metrics, and methodologies concerning latent representations, condition consistency, and overall quality assessments. Community contributions via issues or pull requests are welcomed to maintain its relevance. This serves as a guide for enhancing visual generation with insights and evaluation techniques.
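As one concrete example of the condition-consistency metrics the list covers, the snippet below computes CLIP-based text-image alignment with `torchmetrics` (which wraps a Hugging Face CLIP model); the random images stand in for real generations.

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# CLIP score: similarity between image and text embeddings, scaled to
# [0, 100]; higher means better alignment with the prompt condition.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
images = torch.randint(0, 255, (4, 3, 224, 224), dtype=torch.uint8)  # stand-ins
prompts = ["a red bicycle", "a snowy street", "a corgi", "a teapot"]
score = metric(images, prompts)
print(f"CLIP score: {score:.2f}")
```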
## awesome-ebm
Awesome EBM offers a curated collection of works on energy-based learning, including papers, workshops, and libraries. It serves as a key resource for researchers and practitioners working with energy-based models, covering areas such as data generation, density estimation, and adversarial robustness. Research papers are listed in reverse chronological order, drawn from conferences such as NeurIPS and ICML, giving a window into current advances. It also highlights applications in image and language modeling, acting as an informative guide for implementing energy-based techniques.
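For orientation, an energy-based model defines an unnormalized density p(x) ∝ exp(-E(x)), and samples are commonly drawn with Langevin dynamics. The sketch below is a generic illustration (the quadratic energy makes the target a standard Gaussian), not code from any listed repository.

```python
import torch

def langevin_sample(energy, x, n_steps=100, step_size=0.01):
    """Unadjusted Langevin dynamics: gradient descent on the energy
    plus Gaussian noise. `energy` is any differentiable callable."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad \
            + (step_size ** 0.5) * torch.randn_like(x)
    return x.detach()

# Quadratic energy E(x) = ||x||^2 / 2, i.e. a standard Gaussian target.
samples = langevin_sample(lambda x: 0.5 * (x ** 2).sum(dim=-1),
                          torch.randn(512, 2))
```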