# Image Generation

aidea
AIdea is a fully open-source, Flutter-based application that integrates major language and image models, including OpenAI's GPT-3.5 and GPT-4, Anthropic's Claude series, Google's Gemini Pro, and several domestic (Chinese) models. It offers image features such as text-to-image generation, super-resolution, and artistic QR code creation with models like SDXL 1.0. Both the client and the server source code are available for download.
TerraMours_Gpt_Web
TerraMours-Gpt-Web is a platform that provides user authentication, multilingual chat built on Semantic Kernel (SK), and multi-model image generation through ChatGPT and Stable Diffusion. Its admin interface offers data analytics, chat and image management, user administration, and system configuration. Built with Vue 3.0, TypeScript, Naive UI, and Vite, it supports AI models including GPT-3.5, GPT-4, and ChatGLM. The platform continues to add new models and system enhancements, making it suitable for anyone who wants a single AI application for both communication and creative tasks.
SkyPaint-AI-Diffusion
SkyPaint turns bilingual text prompts into modern-art-style images. It pairs an optimized text encoder with a diffusion model for accurate language understanding and text-to-image generation. Its text encoder is adapted from OpenAI's CLIP, so it accepts mixed-language prompts while remaining compatible with a range of diffusion models. Aimed at both creators and developers, it emphasizes easy cross-platform use and aims to broaden digital art creation.
T-GATE
T-GATE is a training-free method for speeding up text-to-image diffusion models. It uses temporally gated attention to split inference into a semantics-planning phase and a fidelity-improving phase: once the cross-attention output has converged, it is cached and reused, which makes models 10-50% faster without compromising image quality. T-GATE integrates easily with existing frameworks, works with CNN-based U-Nets, Transformers, and Consistency Models, and now also supports PixArt-Sigma and Stable Video Diffusion.
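
The gating idea can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration, not T-GATE's code: `GatedCrossAttention`, `gate_step`, and the identity-projection `cross_attention` are our own names, and the real method's handling of caching and classifier-free guidance differs in detail. Before the gate step, cross-attention is computed normally; afterwards the cached result is returned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(hidden, text_emb):
    """Single-head cross-attention with identity projections (illustrative only)."""
    scores = hidden @ text_emb.T / np.sqrt(hidden.shape[-1])
    return softmax(scores) @ text_emb


class GatedCrossAttention:
    """Hypothetical temporally gated cross-attention.

    Steps before `gate_step` (semantics-planning phase) compute cross-attention
    normally; from `gate_step` on (fidelity-improving phase) the cached output
    is reused, so the text-conditioning cost disappears for the remaining steps.
    """

    def __init__(self, gate_step):
        self.gate_step = gate_step
        self.cache = None
        self.calls = 0  # how many times cross-attention was actually computed

    def __call__(self, hidden, text_emb, step):
        if step < self.gate_step or self.cache is None:
            self.cache = cross_attention(hidden, text_emb)
            self.calls += 1
        return self.cache


rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))   # 8 text tokens, width 16
hidden = rng.normal(size=(4, 16))     # 4 image tokens, width 16
attn = GatedCrossAttention(gate_step=10)
for step in range(25):
    out = attn(hidden, text_emb, step)
    hidden = hidden + 0.01 * out      # stand-in for the rest of the denoiser
print(attn.calls)  # 10
```

With 25 steps and `gate_step=10`, cross-attention runs only 10 times; the remaining 15 steps reuse the cache.
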
Failed-ML
Examine real-world examples of machine learning failures in sectors such as recruitment and healthcare. Understand the challenges and limitations in AI development, including biases and inaccuracies in technologies like natural language processing and computer vision. Gain insights to avoid similar pitfalls in future AI projects.
RectifiedFlow
Rectified Flow is an approach to learning transport maps for efficient data generation: it learns an ODE whose trajectories follow paths that are as straight as possible between two distributions, which allows sampling in very few steps. Applied to Stable Diffusion, it is reported to improve sample diversity and FID scores. It applies to generative modeling and domain transfer, and Colab tutorials demonstrate it on datasets such as CIFAR10 and CelebA-HQ.
Ranni
Ranni is a text-to-image pipeline that pairs a large language model, which improves semantic understanding of the prompt, with a diffusion-based drawing model. Generation runs in two phases: an LLM-based planning component interprets the prompt, and the diffusion model then draws the image, which improves alignment with the text. Ranni was accepted as a CVPR 2024 oral paper, and the release includes model weights such as a LoRA-fine-tuned LLaMA-2-7B and a fully fine-tuned SD v2.1. Users can create images interactively through Gradio demos and apply successive edits to make targeted changes.
consistencydecoder
Consistency Decoder is an improved decoder for Stable Diffusion VAEs. It uses Consistency Models to decode latents into images, aiming for high fidelity and stable results across diverse applications. It is easy to install, integrates with existing Stable Diffusion pipelines, and runs on CUDA-enabled GPUs. The repository includes examples showing clearer images than the conventional GAN-trained decoder.
mmagic
Built on the OpenMMLab 2.0 framework, this toolkit supports generative AI for a wide range of image and video editing tasks. It integrates state-of-the-art models for text-to-image diffusion and 3D generation. Aimed at AIGC research, it supports efficient development with techniques such as GANs and CNNs, and it runs on Python 3.9+ and PyTorch 2.0+.
free-dall-e-proxy
This project provides a proxy service for free access to OpenAI's DALL·E 3 image generation, backed by bots on Telegram and Discord. It exposes a standard OpenAI-compatible endpoint, which simplifies API integration, and it can be deployed with Docker or run directly with Python. Setting up the service requires configuring agents on the Coze platform; image creation requests are then handled through a RESTful API. It suits developers who want to add AI-driven image generation to their applications.
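
Because the proxy exposes an OpenAI-style endpoint, a client can send a standard image generation request to it. The sketch below only builds (does not send) such a request with the standard library. The base URL, port, and API key are hypothetical placeholders, and whether the proxy checks the key or honors every body field is an assumption to verify against its README; the path and body fields follow OpenAI's public images API.

```python
import json
import urllib.request

# Hypothetical values: point these at wherever the proxy is actually deployed.
BASE_URL = "http://localhost:8000"
API_KEY = "sk-placeholder"


def build_image_request(prompt, size="1024x1024"):
    """Build (but do not send) an OpenAI-style image generation request."""
    body = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


req = build_image_request("a watercolor fox in the snow")
# To call the proxy for real: json.load(urllib.request.urlopen(req))
```

Keeping request construction separate from sending makes it easy to swap the base URL between the proxy and the official API.
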
openai-dotnet
Learn how to integrate the OpenAI REST API into .NET applications using the OpenAI .NET library. Step-by-step instructions cover installing the NuGet package, using the asynchronous APIs, and accessing the various feature namespaces. The library also supports advanced features such as chat completions, image generation, and audio transcription, catering to developers who want to bring AI into their projects.
stylegan2-pytorch
This project provides a complete PyTorch implementation of StyleGAN2 that lets you train generative adversarial networks directly from the command line. It is easy to set up, supports multiple GPUs, and includes data-efficient training techniques for generating high-quality synthetic images, such as cities and celebrity faces. It also offers model customization options and improvements such as attention mechanisms and top-k training for better GAN performance. It suits developers looking for a straightforward yet effective tool for AI-generated imagery.
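
Top-k training updates the generator using only the generated samples the discriminator rates most realistic, shrinking k as training progresses. The NumPy sketch below illustrates that idea with a non-saturating generator loss; it is not this repository's implementation, and the schedule parameters (`gamma=0.99`, `min_frac=0.5`, decay per step) are hypothetical defaults.

```python
import numpy as np


def top_k_generator_loss(fake_scores, k):
    """Non-saturating generator loss on the k fakes the discriminator rates highest.

    fake_scores: discriminator logits for generated images (higher = more realistic).
    Loss per sample is softplus(-D(G(z))).
    """
    top = np.sort(fake_scores)[-k:]            # keep the k highest scores
    return np.mean(np.logaddexp(0.0, -top))    # softplus(-x), computed stably


def top_k_schedule(batch_size, step, gamma=0.99, min_frac=0.5):
    """Hypothetical schedule: decay k geometrically, never below min_frac * batch."""
    k = int(batch_size * gamma ** step)
    return max(k, int(batch_size * min_frac))


scores = np.array([-2.0, 0.5, 3.0, -0.1, 1.2, -4.0, 2.2, 0.0])
k = top_k_schedule(len(scores), step=50)
print(k)  # 4
loss = top_k_generator_loss(scores, k)
```

Discarding the worst-rated fakes keeps the generator's gradient focused on samples that are already close to the data, which is the motivation for the technique.
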
gta
Learn how the Geometry-Aware Attention mechanism (GTA) improves multi-view transformers for applications such as image generation. Presented at ICLR 2024, the method offers a straightforward way to strengthen multi-view transformers and is also shown to be effective on 2D tasks. The repository includes experiment results and code for datasets such as CLEVR-TR, MSN-Hard, and ImageNet (with Diffusion Transformers, DiT), demonstrating GTA on both multi-view transformers and image Vision Transformers (ViTs).
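
The core idea is to transform each key and value by the relative geometry between the attending tokens before computing attention, so that only relative poses matter. The toy sketch below assumes each token's pose is a single 2D rotation angle, with the rotation matrix standing in for the group representation ρ(g); real multi-view settings use camera poses and larger representations, so treat this as an illustration of the principle rather than the paper's exact formulation or code.

```python
import numpy as np


def rot(theta):
    """2x2 rotation matrix, standing in for the group representation rho(g)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def geometry_aware_attention(q, k, v, thetas):
    """Toy GTA-style attention over n tokens with 2-D features.

    Token j's key and value are mapped into token i's frame by the relative
    transform rho(g_i) rho(g_j)^T = rho(g_i g_j^{-1}) before attending.
    """
    n = q.shape[0]
    out = np.zeros_like(v)
    for i in range(n):
        rel = [rot(thetas[i]) @ rot(thetas[j]).T for j in range(n)]
        logits = np.array([q[i] @ rel[j] @ k[j] for j in range(n)])
        w = np.exp(logits - logits.max())   # stable softmax over j
        w /= w.sum()
        out[i] = sum(w[j] * (rel[j] @ v[j]) for j in range(n))
    return out


rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 5, 2))
thetas = rng.uniform(0.0, 2.0 * np.pi, size=5)
out = geometry_aware_attention(q, k, v, thetas)
shifted = geometry_aware_attention(q, k, v, thetas + 1.0)
print(np.allclose(out, shifted))  # True: only relative poses matter
```

Rotating every token's pose by the same amount leaves the output unchanged, and with all poses equal the function reduces to ordinary softmax attention.
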
autoregressive-diffusion-pytorch
Discover Autoregressive Diffusion Pytorch, a library for generating images with autoregressive models without vector quantization: images are synthesized as sequences of continuous-valued tokens. It provides clear installation instructions and usage examples, and it supports both diffusion and flow-matching objectives. It is a flexible tool for researchers and developers working on image generation.
rcg
This self-supervised PyTorch framework achieves strong unconditional image generation at 256x256 resolution on ImageNet. It first generates a self-supervised representation and then generates pixels conditioned on it, closing the traditional gap between unconditional and class-conditional generation. Recent updates add improved FID evaluation with the ADM evaluation suite and new training scripts for DiT-XL with RCG. The framework trains efficiently on GPUs, provides pre-trained weights, and supports several pixel generators, including MAGE, DiT, ADM, and LDM. See the repository for detailed setup and evaluation instructions.
Awesome-Diffusion-Transformers
This extensive compilation covers diffusion transformers across fields such as text, speech, and video generation. It highlights key research, including text-driven motion generation and scalable image synthesis models, with an emphasis on methods such as transformer-based denoising, high-resolution image synthesis, and efficient training. Featuring works such as MotionDiffuse and scalable diffusion models, it is designed for researchers and practitioners and offers a comprehensive overview of innovations in diffusion transformers, together with accessible resources and recent research.
VAR
Explore Visual Autoregressive Modeling (VAR), which recasts autoregressive image generation as coarse-to-fine "next-scale prediction" instead of conventional next-token prediction and reports image quality that surpasses diffusion models. Built on transformers, it exhibits power-law scaling behavior and zero-shot generalization. Presented at NeurIPS 2024, VAR is a significant advance in scalable image generation, and a demo website lets you try its high-quality image generation interactively.
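
Next-scale prediction works on a stack of token maps of increasing resolution, where each scale encodes what the coarser scales have not yet explained. The NumPy sketch below shows that multi-scale residual decomposition with average pooling and nearest-neighbour upsampling. It omits quantization and the transformer entirely, and the scale list (1, 2, 4, 8, 16) is a simplification we chose so every scale divides the map size; VAR's actual tokenizer uses different scales and interpolation.

```python
import numpy as np


def downsample(x, s):
    """Average-pool a square H x H map to s x s (H must be divisible by s)."""
    f = x.shape[0] // s
    return x.reshape(s, f, s, f).mean(axis=(1, 3))


def upsample(x, h):
    """Nearest-neighbour upsampling of an s x s map back to h x h."""
    f = h // x.shape[0]
    return np.kron(x, np.ones((f, f)))


def multiscale_residuals(feature, scales):
    """Split a feature map into coarse-to-fine residual maps, one per scale.

    A VAR-style model would predict (quantized versions of) these maps one
    whole scale at a time, each conditioned on all coarser scales.
    """
    h = feature.shape[0]
    residual = feature.copy()
    maps = []
    for s in scales:
        m = downsample(residual, s)
        maps.append(m)
        residual = residual - upsample(m, h)
    return maps, residual


rng = np.random.default_rng(0)
feature = rng.normal(size=(16, 16))
scales = [1, 2, 4, 8, 16]
maps, residual = multiscale_residuals(feature, scales)
recon = sum(upsample(m, 16) for m in maps)
print(np.allclose(recon, feature))   # True
print(sum(s * s for s in scales))    # 341 tokens across all scales
```

Summing the upsampled maps reconstructs the feature exactly here because the finest scale equals the full resolution.
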
Awesome-Diffusion-Models
Explore a diverse collection of resources and papers on diffusion models across domains such as vision, audio, and natural language. The repository offers introductory materials, tutorials, and advanced research to help readers understand the theory, applications, and development of diffusion models. It serves as a practical guide for researchers, students, and professionals, with implementations, surveys, and instructional content.
SwiftOpenAI
SwiftOpenAI is a Swift SDK for the OpenAI API, designed for easy communication with models such as GPT-3 and GPT-4. Its guide covers installation, usage, and features such as secure API key handling, image manipulation, audio conversion, and chat. It suits developers who want to add AI to Swift applications, and the project welcomes community contributions. A demo app showcases model exploration, image edits, and content moderation.
Canvas
Canvas is an open-source native macOS app for generating and editing images with DALL·E 3 and DALL·E 2. Compatible with macOS 14.0 Sonoma, it offers features such as image variations and easy sharing. It requires an OpenAI API key and provides a straightforward interface without unnecessary complexity.
MultiBooth
Discover a two-phase method for generating images of multiple concepts from text, with improved concept fidelity and lower inference cost. Using a multi-modal encoder and a concise cross-attention mapping, it reports better efficiency and performance than existing methods on benchmarks. The technique builds on Stable Diffusion v1.5 for high-quality image synthesis.
gill
The GILL model generates and retrieves images while processing interleaved text and image inputs. The repository provides the model's code, pretrained weights, and detailed setup instructions for inference and training. Training uses Conceptual Captions, and extensive evaluation scripts are included for performance testing. A Gradio demo supports hands-on exploration for researchers and developers interested in multimodal language models.
MochiDiffusion
This project runs Stable Diffusion natively on macOS, using Apple's Core ML for optimized performance on Apple Silicon Macs. It generates images quickly with low memory use and works entirely offline. Key features include image-to-image generation, EXIF metadata insertion, and upscaling to high resolution, and it can load custom Core ML models. Its SwiftUI interface is easy to navigate, and all processing stays local for privacy. It suits users who want capable, efficient image generation on recent Macs.
BentoDiffusion
This guide shows how to deploy and self-host diffusion models with BentoML, focusing on Stable Diffusion models that generate images and video from text prompts. It walks through setting up the SDXL Turbo model on an NVIDIA GPU (at least 12 GB of VRAM), installing dependencies, and running the BentoML service locally. You can interact with the service through the Swagger UI or with cURL. For scalable deployments, it also covers deploying to BentoCloud. The repository supports other models such as ControlNet, Latent Consistency Model, and Stable Video Diffusion, for efficient deployment both locally and in the cloud.
async-openai
An unofficial Rust library for asynchronously accessing OpenAI APIs such as chat, audio, and embeddings, including types for the Realtime API. It supports Microsoft's Azure OpenAI and retries failed requests with exponential backoff. It offers an ergonomic builder pattern for constructing requests and supports image generation with customizable sizes.
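
Exponential backoff waits longer after each failed attempt, up to a cap, optionally with random jitter. The crate itself is written in Rust; the Python sketch below only illustrates the general retry pattern and does not reproduce async-openai's actual schedule or parameters. `RetryableError`, the delay parameters, and the injectable `sleep` are hypothetical choices.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. HTTP 429 rate limits)."""


def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, max_retries=6, jitter=None):
    """Yield retry delays in seconds that grow geometrically, capped at max_delay.

    jitter: optional callable that randomizes each delay, such as "full jitter"
    (a uniform draw in [0, delay]), so many clients do not retry in lockstep.
    """
    delay = base
    for _ in range(max_retries):
        yield jitter(delay) if jitter else delay
        delay = min(delay * factor, max_delay)


def call_with_retries(fn, delays, sleep=time.sleep):
    """Call fn(); after each RetryableError, wait for the next delay and retry."""
    for delay in delays:
        try:
            return fn()
        except RetryableError:
            sleep(delay)
    return fn()  # final attempt: any error now propagates to the caller


print(list(backoff_delays()))  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
jittered = list(backoff_delays(jitter=lambda d: random.uniform(0, d)))
```

Injecting `sleep` keeps the retry logic testable without real waiting; the cap prevents delays from growing without bound.
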
LLMGA
Discover LLMGA, a multimodal assistant for image generation and editing built on large language models. It produces more precise prompts, leading to detailed and interpretable results. A two-stage training process aligns the MLLM with Stable Diffusion models, and a reference-based restoration step harmonizes texture and brightness. It suits interactive design across various formats, supports plugin integration, and works in both English and Chinese. The project documents its models, datasets, and tools.
dalle2-in-python
This tutorial gives step-by-step guidance on using DALL·E 2 for AI image generation through a Python package. Setup involves installing the package, obtaining a bearer token, and then using its APIs to generate, download, and customize digital art. It covers creating images from text prompts, handling multiple image outputs, and using inpainting to modify images.
visual_anagrams
This project creates optical illusions with diffusion models, focusing on visual anagrams: images that change appearance under transformations such as rotation and color inversion. It uses DeepFloyd's pixel diffusion model (DeepFloyd IF), which avoids the artifacts common with latent diffusion. The repository includes detailed guides and Colab demos, with both a free version and a more advanced one for experimenting with high-resolution images. Users can generate illusions in many styles and subjects using transformations such as flips and jigsaw rearrangements. The work highlights the connection between art and machine learning and is a useful resource for developers and researchers in synthetic visual art and perception.
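
At each denoising step, the method predicts noise separately for every view (each with its own prompt), maps each estimate back through the inverse transformation, and averages them. The views are orthogonal transforms (rotations, flips, pixel permutations, negation), so Gaussian noise stays Gaussian under them. The NumPy sketch below shows only that combination step; `denoiser` is a hypothetical stand-in for a real diffusion model, and the view names are our own.

```python
import numpy as np

# Each view is a (forward, inverse) pair acting on a square H x W x C image array.
VIEWS = {
    "identity": (lambda x: x, lambda x: x),
    "rotate_180": (lambda x: np.rot90(x, 2), lambda x: np.rot90(x, -2)),
    "negate": (lambda x: -x, lambda x: -x),  # colour inversion for pixels in [-1, 1]
}


def combined_noise_estimate(x_t, t, views, prompts, denoiser):
    """Average per-view noise estimates after mapping each back to the base view.

    denoiser(image, t, prompt) -> predicted noise with the same shape as image.
    """
    estimates = []
    for (forward, inverse), prompt in zip(views, prompts):
        eps = denoiser(forward(x_t), t, prompt)   # predict noise in the transformed view
        estimates.append(inverse(eps))            # undo the transform on the estimate
    return np.mean(estimates, axis=0)


rng = np.random.default_rng(0)
x_t = rng.normal(size=(8, 8, 3))
views = [VIEWS["identity"], VIEWS["rotate_180"]]
prompts = ["an oil painting of a rabbit", "an oil painting of a duck"]
dummy = lambda img, t, prompt: img  # placeholder for a real diffusion model
eps = combined_noise_estimate(x_t, 500, views, prompts, dummy)
print(np.allclose(eps, x_t))  # True
```

The averaged estimate then drives an ordinary sampler step, so the final image must satisfy every prompt in its corresponding view.
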
awesome-assistant-api
Explore OpenAI's Assistants API through hands-on demos featuring GPT-4V and DALL·E 3. The demos can be run for free on Google Colab or in a local Jupyter notebook. Practical examples include image generation, voice chat, and PowerPoint slide creation with the Assistants API. Detailed documentation and references make it useful for anyone interested in AI technology and its applications.
stable-diffusion-nvidia-docker
This project deploys Stable Diffusion with Docker, enabling GPU-based image generation without any coding. It provides a Gradio UI, supports the Stable Diffusion 2.0 model, and offers img2img and image inpainting. Its data-parallel approach spreads work across multiple GPUs to speed up inference for art and design tasks, and installation is straightforward on Ubuntu and Windows.
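
In data parallelism, every GPU holds a full copy of the model and processes a different slice of the work. The sketch below splits a list of prompts round-robin across devices, runs the shards concurrently, and restores the original order. It is not the project's code: `generate_on` is a hypothetical stand-in for a Stable Diffusion pipeline already loaded on each device.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_data_parallel(prompts, devices, generate_on):
    """Split prompts across devices, run each shard concurrently, keep input order.

    generate_on(device, prompts) -> list of outputs, one per prompt.
    """
    n = len(devices)
    index_shards = [list(range(i, len(prompts), n)) for i in range(n)]  # round-robin
    results = [None] * len(prompts)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = {
            pool.submit(generate_on, dev, [prompts[j] for j in idx]): idx
            for dev, idx in zip(devices, index_shards)
        }
        for future, idx in futures.items():
            for j, output in zip(idx, future.result()):
                results[j] = output
    return results


fake = lambda device, shard: [f"{device}:{p}" for p in shard]  # stand-in for a GPU pipeline
result = generate_data_parallel(["a", "b", "c", "d", "e"], ["cuda:0", "cuda:1"], fake)
print(result)  # ['cuda:0:a', 'cuda:1:b', 'cuda:0:c', 'cuda:1:d', 'cuda:0:e']
```

Threads are a reasonable choice here on the assumption that the per-GPU work releases the GIL while the GPU computes; round-robin sharding keeps the load balanced when all prompts cost about the same.
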
BrushNet
BrushNet is a diffusion-based image inpainting model that integrates easily with pre-trained diffusion models. Its dual-branch design processes masked-image features separately from the noisy latent, which improves control over image completion. It supports flexible training and inference and comes with detailed deployment instructions. BrushNet won top prizes at the CVPR 2024 GenAI Media Generation Challenge.
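
As we understand the design (an assumption; check the paper and repository for specifics), the extra branch receives the noisy latent, the masked-image latent, and a downsampled mask concatenated along the channel axis. Its features are then added to the frozen U-Net through zero-initialized convolutions, so training starts from the unchanged pre-trained model. The NumPy sketch below illustrates only those two steps. The max-pool mask resizing, 4-channel latents, `scale` parameter, and all function names are hypothetical choices, not BrushNet's API.

```python
import numpy as np


def downsample_mask(mask, factor):
    """Max-pool a binary H x W mask (1 = region to fill) to latent resolution."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).max(axis=(1, 3))


def branch_input(noisy_latent, masked_image_latent, mask):
    """Stack the branch inputs along channels: noisy latent, masked-image latent, mask."""
    factor = mask.shape[0] // noisy_latent.shape[1]
    m = downsample_mask(mask, factor)[None].astype(noisy_latent.dtype)
    return np.concatenate([noisy_latent, masked_image_latent, m], axis=0)


class ZeroConv1x1:
    """1x1 convolution initialised to zero, so the branch adds nothing at first."""

    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))
        self.bias = np.zeros(channels)

    def __call__(self, x):  # x: (C, h, w)
        return np.einsum("oc,chw->ohw", self.weight, x) + self.bias[:, None, None]


def inject(unet_feature, branch_feature, zero_conv, scale=1.0):
    """Add branch features to a frozen U-Net feature map through a zero convolution."""
    return unet_feature + scale * zero_conv(branch_feature)


rng = np.random.default_rng(0)
noisy = rng.normal(size=(4, 8, 8))     # 4-channel latent for a 64x64 image
masked = rng.normal(size=(4, 8, 8))
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0               # square hole in the middle
x = branch_input(noisy, masked, mask)
print(x.shape)  # (9, 8, 8)

feat = rng.normal(size=(32, 8, 8))
branch_feat = rng.normal(size=(32, 8, 8))
out = inject(feat, branch_feat, ZeroConv1x1(32))
print(np.allclose(out, feat))  # True at initialisation
```

Zero initialization means the pre-trained U-Net's behavior is preserved at the start of training, and the branch's influence grows only as its convolution weights are learned.
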