# text-to-image

## DALLE2-pytorch
This project provides a PyTorch implementation of OpenAI's DALL-E 2, which advances text-to-image synthesis through diffusion networks. Its centerpiece is a diffusion prior network that predicts CLIP image embeddings from text, improving generation accuracy and diversity. Developed in collaboration with the LAION community, the repository supports AI researchers and developers in replicating and training the model, combining CLIP with a diffusion prior and a cascading DDPM decoder that uses pixel-shuffle upsamplers to generate high-quality images from text. Pre-trained models are available on Hugging Face, and the Discord community welcomes contributions.
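The pieces are exposed as composable PyTorch modules. A minimal sketch of how they fit together, loosely following the repository's README (hyperparameters here are illustrative, not recommended settings):

```python
import torch
from dalle2_pytorch import CLIP, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder, DALLE2

# CLIP provides the shared text/image embedding space (illustrative sizes).
clip = CLIP(
    dim_text=512, dim_image=512, dim_latent=512, num_text_tokens=49408,
    text_enc_depth=6, text_seq_len=256, text_heads=8,
    visual_enc_depth=6, visual_image_size=256, visual_patch_size=32, visual_heads=8,
)

# The diffusion prior maps CLIP text embeddings to predicted image embeddings.
prior_network = DiffusionPriorNetwork(dim=512, depth=6, dim_head=64, heads=8)
diffusion_prior = DiffusionPrior(net=prior_network, clip=clip, timesteps=100, cond_drop_prob=0.2)

# The decoder is a DDPM conditioned on the predicted image embedding.
unet = Unet(dim=128, image_embed_dim=512, cond_dim=128, channels=3, dim_mults=(1, 2, 4, 8))
decoder = Decoder(unet=unet, clip=clip, timesteps=100, image_cond_drop_prob=0.1, text_cond_drop_prob=0.5)

# After the prior and decoder are trained, DALLE2 chains them for sampling.
dalle2 = DALLE2(prior=diffusion_prior, decoder=decoder)
images = dalle2(['a butterfly trying to escape a tornado'], cond_scale=2.0)
```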
## Attend-and-Excite
Attend-and-Excite improves the faithfulness of text-to-image models through attention-based guidance, addressing cases where generated images omit subjects or misattribute properties from the prompt. By strengthening and refining cross-attention, the method ensures that images accurately reflect all subjects in the text. Its Generative Semantic Nursing (GSN) technique intervenes during denoising at inference time, improving the precision and reliability of results across diverse inputs.
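The method also ships in diffusers as StableDiffusionAttendAndExcitePipeline. A minimal sketch, where token_indices selects the prompt tokens whose attention GSN should strengthen:

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a playful kitten chasing a butterfly"
# 1-based indices of the subject tokens to "excite"; pipe.get_indices(prompt)
# prints the token-to-index mapping to choose from.
token_indices = [3, 6]

image = pipe(
    prompt=prompt,
    token_indices=token_indices,
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
image.save("attend_and_excite.png")
```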
## IP-Adapter
IP-Adapter integrates image prompts into text-to-image diffusion models with only 22M trainable parameters, matching or surpassing the performance of fully fine-tuned image-prompt models. The adapter generalizes to custom models derived from the same base, composes with existing controllable-generation tools, and supports multimodal generation when combined with text prompts. Recent releases add face-ID variants built on face-recognition embeddings and improved fidelity with SDXL, with demos and third-party integrations available for exploration, highlighting its versatility in AI image generation.
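In diffusers, the adapter loads on top of an existing pipeline. A sketch assuming the h94/IP-Adapter weight repository and an SD 1.5 base:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the lightweight adapter; the scale balances image vs. text conditioning.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

reference = load_image("reference.png")  # hypothetical local image prompt
image = pipe(
    prompt="best quality, high quality, wearing sunglasses",
    ip_adapter_image=reference,
    num_inference_steps=50,
).images[0]
image.save("ip_adapter.png")
```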
## Wuerstchen
Würstchen offers an innovative method for training text-conditional models, using a highly compressed latent space to achieve 42x compression. Detailed in the ICLR 2024 paper, this architecture employs multi-stage compression for fast and cost-effective text-to-image generation. Integrated with the diffusers library, Würstchen is easily accessible for implementation and testing through notebooks and scripts, providing a robust solution for researchers and developers involved in large-scale text-to-image diffusion models.
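Through the diffusers integration, the combined prior-plus-decoder pipeline is a few lines. A sketch assuming the warp-ai/wuerstchen checkpoint:

```python
import torch
from diffusers import AutoPipelineForText2Image

# The combined pipeline chains Würstchen's text-conditional prior (Stage C)
# with the latent decoder stages (B and A).
pipe = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="an astronaut riding a horse, photorealistic",
    height=1024,
    width=1024,
    prior_guidance_scale=4.0,  # classifier-free guidance for the prior stage
).images[0]
image.save("wuerstchen.png")
```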
## MultiDiffusion
MultiDiffusion harnesses a pre-trained text-to-image diffusion model for controlled image generation without additional training. Its optimization fuses several diffusion generation processes under a shared set of parameters and constraints, yielding high-quality images that follow user specifications such as aspect ratio and spatial guidance. Integrated with the Diffusers library, it adapts rapidly to new generation tasks without lengthy re-training. Demonstrations are available via Gradio and Hugging Face.
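MultiDiffusion's panorama mode is available in diffusers as StableDiffusionPanoramaPipeline. A minimal sketch generating a wide-aspect image:

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

# MultiDiffusion fuses overlapping diffusion windows into one coherent canvas,
# so the output width is not limited to the base model's 512 pixels.
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```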
## dalle-playground
Explore text-to-image generation with this playground, which originally hosted DALL-E Mini and now features Stable Diffusion V2 behind an updated, easier-to-use interface. It integrates smoothly with Google Colab for quick setups and supports local development in environments such as Windows WSL2 and Docker Compose. A straightforward setup process makes it suitable for developers and creatives interested in advanced AI image generation.
## MM-Interleaved
MM-Interleaved is an end-to-end generative model for interleaved image-text data, featuring a multi-modal feature synchronizer that captures fine-grained, high-resolution image details. It supports tasks such as visual storytelling, visual question answering, and text-to-image generation, and performs strongly across multiple benchmarks in both zero-shot and fine-tuned settings. Pretrained models are available for adaptation to a range of applications.
## Kolors
Kolors improves text-to-image synthesis using advanced diffusion models, delivering high visual quality and semantic accuracy in both English and Chinese. Trained on billions of text-image pairs, it handles detailed and complex designs. Recent updates add features such as virtual try-on, pose control, and identity-preserving (FaceID) generation, accessible via Hugging Face and GitHub, and its performance is validated by comprehensive evaluations. The Kolors suite includes user-friendly pipelines for diffusion inference, inpainting, and LoRA training, offering a robust solution for photorealistic image generation.
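With the diffusers KolorsPipeline, generation looks like the following sketch (assuming the Kwai-Kolors/Kolors-diffusers checkpoint; the bilingual prompt below is illustrative):

```python
import torch
from diffusers import KolorsPipeline

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Prompts can be written in Chinese or English.
image = pipe(
    prompt='一张瓢虫的照片，微距，变焦，高质量，电影，拿着一个牌子，写着"可图"',
    negative_prompt="",
    guidance_scale=5.0,
    num_inference_steps=50,
).images[0]
image.save("kolors.png")
```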
## stable-diffusion-2-gui
Discover the image generation features of Stable Diffusion 2.1 through an accessible web interface. This Gradio-based application utilizes Hugging Face Diffusers to support text-to-image, image-to-image, inpainting, upscaling, and depth-to-image workflows. Engage with the project community on Discord for further insights and assistance.
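The GUI is a thin layer over diffusers pipelines. A minimal sketch of the underlying text-to-image call (the model id and scheduler shown are the standard Stable Diffusion 2.1 setup, not necessarily the app's exact defaults):

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
# A fast multistep solver keeps interactive generation responsive.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a professional photograph of an astronaut riding a horse",
    num_inference_steps=25,
    guidance_scale=9.0,
).images[0]
image.save("sd21.png")
```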
## InstructCV
InstructCV applies advances in text-to-image diffusion to standard computer vision tasks such as segmentation and classification, exposing them through a natural-language interface by recasting each task as a text-guided image generation problem, so outputs like segmentation masks are produced as images. Instruction tuning over diverse datasets improves task performance, yielding a unified, instruction-guided vision learner.
## imagen-pytorch
imagen-pytorch is a PyTorch implementation of Google's Imagen, an architecturally simpler approach to efficient text-to-image generation, with supporting tooling from Hugging Face. Key features include dynamic thresholding (clipping) of sample values, noise-level conditioning, and multi-GPU training support.
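The API composes U-Nets into a cascade, trains one stage at a time, then samples with classifier-free guidance. A sketch loosely following the repository's README, with illustrative hyperparameters:

```python
import torch
from imagen_pytorch import Unet, Imagen

# Base and super-resolution U-Nets for a two-stage cascade (illustrative sizes).
unet1 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
             num_resnet_blocks=3, layer_attns=(False, True, True, True))
unet2 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
             num_resnet_blocks=(2, 4, 8, 8), layer_attns=(False, False, False, True))

# Imagen wires the cascade together with noise-level conditioning and
# dynamic thresholding of sample values.
imagen = Imagen(unets=(unet1, unet2), image_sizes=(64, 256),
                timesteps=1000, cond_drop_prob=0.1).cuda()

# One training step per unet; raw texts are embedded with T5 internally.
images = torch.randn(4, 3, 256, 256).cuda()
loss = imagen(images, texts=['a whale breaching at sunset'] * 4, unet_number=1)
loss.backward()

# Sampling runs the full cascade with classifier-free guidance.
samples = imagen.sample(texts=['a whale breaching at sunset'], cond_scale=3.0)
```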
## InstanceDiffusion
InstanceDiffusion adds precise, instance-level control to text-to-image diffusion models, significantly improving generation quality, with a 2.0× higher AP50 for box inputs than prior state of the art. It supports varied location inputs, such as points and masks. Recent updates add ComfyUI integration and flash-attention support, reducing memory usage. The model is thoroughly evaluated on datasets such as MSCOCO, making it well suited to research and academic exploration.
## TokenFlow
TokenFlow achieves high-quality, text-consistent video editing using diffusion models without additional training. By propagating diffusion features through inter-frame correspondences, it ensures both spatial and dynamic consistency. The framework supports both localized and global edits, enabling semi-transparent effects like smoke and fire. Compatible with existing text-to-image editing methods, TokenFlow delivers state-of-the-art results across various real-world videos.
## custom-diffusion
Custom Diffusion enables efficient fine-tuning of text-to-image models such as Stable Diffusion. The approach teaches a model new concepts by updating only a small set of key parameters (the cross-attention key and value projections), producing unique, multi-concept images with minimal storage overhead. Newly released datasets and fast training are available, and the method is now integrated into diffusers for training and inference.
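For the diffusers integration, inference follows the usual pattern of loading the lightweight fine-tuned attention weights plus the learned modifier token. A sketch assuming a completed Custom Diffusion training run (paths are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load the small fine-tuned cross-attention weights and the learned
# modifier token produced by training (paths illustrative).
pipe.unet.load_attn_procs("path-to-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a garden",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("custom_diffusion_cat.png")
```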
## Rodel.Agent
Discover a feature-rich Windows application that integrates chat, text-to-image, text-to-speech, and translation features. Supporting popular AI services, the tool aims for a first-class desktop AI experience. Building it requires Visual Studio 2022 and .NET 8, and its modular console programs allow custom configurations. Comprehensive documentation guides users through setup and development, enabling full use of its diverse AI functionality.
## Lumina-mGPT
Lumina-mGPT is a suite of multimodal autoregressive models specializing in flexible, photorealistic text-to-image generation. Built on the xllmx module and extensive training resources, it handles complex multimodal tasks, and local Gradio demos showcase both image generation and image understanding. The open-source project serves research and practical applications alike, with a growing family of model configurations.
## VisCPM
VisCPM is a series of open-source bilingual multimodal models covering dialogue (VisCPM-Chat) and image generation (VisCPM-Paint). Built on the robust CPM-Bee language model with advanced visual encoders and decoders, it excels at bilingual processing, with particularly strong Chinese-language capabilities. Well suited to research, VisCPM continues to evolve with features such as low-resource inference and web deployment.
## diffusiondb
Discover DiffusionDB, a comprehensive dataset of 14 million images generated via Stable Diffusion using real user prompts. Ideal for research on prompt interactions, deepfake detection, and AI tool development, with subsets catering to different storage needs. Effortlessly access images and metadata online using various loading methods.
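With the Hugging Face datasets library, a subset can be pulled in a few lines; '2m_random_1k' below is one of the documented configurations:

```python
from datasets import load_dataset

# '2m_random_1k' is a small documented subset; larger configurations
# (e.g. '2m_first_5k' or 'large_random_1m') trade download size for coverage.
dataset = load_dataset("poloclub/diffusiondb", "2m_random_1k")

sample = dataset["train"][0]
print(sample["prompt"])                 # the real user prompt
print(sample["cfg"], sample["step"])    # generation hyperparameters
sample["image"].save("diffusiondb_sample.png")  # a PIL image
```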