# Text-to-Image

## DALLE-pytorch

An implementation of OpenAI's DALL-E in PyTorch, providing text-to-image generation with options for scalability and customization, including pretrained VAE models and adjustable attention mechanisms. It integrates CLIP to rank generated images and supports memory- and compute-saving training options such as reversible networks and sparse attention.
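
A minimal sketch of the training interface, following the repository's documented usage (the hyperparameters here are illustrative, and argument names may differ across versions):

```python
import torch
from dalle_pytorch import DiscreteVAE, DALLE

# Discrete VAE that tokenizes images into a codebook (trained separately).
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64,
)

# DALL-E transformer over concatenated text + image tokens.
dalle = DALLE(
    dim = 1024,
    vae = vae,               # image sequence length is inferred from the VAE
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 12,
    heads = 16,
)

text = torch.randint(0, 10000, (4, 256))   # dummy token ids
images = torch.randn(4, 3, 256, 256)       # dummy images

loss = dalle(text, images, return_loss = True)
loss.backward()

# After training, autoregressively decode image tokens from a text prompt.
samples = dalle.generate_images(text[:1])
```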

## BingGPT-Discord-Bot

A Discord bot for interacting with Microsoft's Bing Chat, which is powered by GPT-4. The setup guide covers Python 3.12+, a Microsoft account, and optional Docker. The bot adds slash commands such as /ask for conversation and /imagine for text-to-image generation (the /imagine command currently has known issues), and the documentation details environment-variable configuration, a browser extension used to verify Bing access, and installation via either Docker or Python.
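
For orientation, this is how such a slash command is typically registered with discord.py; it is an illustrative sketch, not the bot's actual code, and `query_bing_chat` is a hypothetical helper standing in for the Bing Chat client:

```python
import os
import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

@tree.command(name="ask", description="Ask Bing Chat a question")
async def ask(interaction: discord.Interaction, prompt: str):
    await interaction.response.defer()       # Bing Chat replies can take several seconds
    answer = await query_bing_chat(prompt)   # hypothetical helper wrapping the Bing Chat API
    await interaction.followup.send(answer)

@client.event
async def on_ready():
    await tree.sync()                        # publish the slash commands to Discord

client.run(os.environ["DISCORD_BOT_TOKEN"])  # token read from an environment variable
```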

## gill

The GILL model generates and retrieves images interleaved with text. The repository provides the model code, pretrained weights, and setup instructions for both inference and training, using Conceptual Captions as training data and shipping evaluation scripts for performance testing. A Gradio demo makes hands-on exploration easy for researchers and developers working on multimodal language models.
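
Inference roughly follows the pattern below; the function names are taken from the repository's README and may have changed, so treat this as a sketch:

```python
from gill import models

# Load the pretrained checkpoint (downloaded per the repo's setup instructions).
model = models.load_gill('checkpoints/gill_opt/')

# Given an interleaved prompt, the model decides whether to retrieve an
# existing image or generate a new one, returning text and image outputs.
prompts = ['A drawing of a small red house in a snowy field.']
outputs = model.generate_for_images_and_texts(prompts, num_words=32)
```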

## CVPR2024-Papers-with-Code-Demo

A regularly updated collection of CVPR 2024 papers with open-source code, serving as a practical resource for computer vision professionals. It spans topics from image classification and object detection to newer directions such as diffusion models and NeRF, helping readers keep up with the latest work. Community engagement through issue submissions and discussions is encouraged to promote collective progress in the field.

## Lumina-T2X

Lumina-T2X uses flow-based diffusion transformers to convert text into several modalities, including images, videos, and music. It produces high-quality output at resolutions up to 2K and accepts multilingual prompts and emojis. Recent releases improve visual quality and add demos that showcase its range of vision-language tasks, targeting developers and researchers working in generative AI.
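
A rough sketch assuming the Lumina-Next diffusers pipeline; the class name and checkpoint id here are assumptions based on recent diffusers releases:

```python
import torch
from diffusers import LuminaText2ImgPipeline  # assumed class name for Lumina-Next

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers",   # assumed checkpoint id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a mountain monastery above the clouds at sunrise, highly detailed",
    num_inference_steps=30,
).images[0]
image.save("lumina.png")
```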

## lora

Learn how Low-Rank Adaptation (LoRA) speeds up fine-tuning of Stable Diffusion models, dramatically reducing checkpoint size for easy sharing while sometimes matching or surpassing full fine-tuning in quality. The technique is compatible with diffusers, supports inpainting, and provides pipelines that apply low-rank updates to the CLIP text encoder, the UNet, and token embeddings, along with straightforward checkpoint merging. Project updates and a web demo on Hugging Face Spaces round out its role in text-to-image diffusion fine-tuning.
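
The core mechanism is simple: freeze the pretrained weight and learn a low-rank residual on top of it. A schematic PyTorch layer (not this project's code) illustrates the idea:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), where A and B have rank r << min(dims)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project down
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project up
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)           # B starts at zero, so training begins
                                                 # from the unmodified pretrained model
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(x)) * self.scale
```

Only `down` and `up` are trained, which is why LoRA checkpoints stay small and can later be merged into the base weights.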

## LLM-groundedDiffusion

Explore how LLMs improve text-to-image diffusion: an LLM first parses the prompt into a spatial layout, which then grounds the diffusion model's image generation. The project is integrated into diffusers v0.24.0 as a community pipeline and offers a self-hosted model comparable to GPT-3.5, with a modular design that leaves room for further AI research.
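
A sketch of the diffusers route described above; the checkpoint id, pipeline name, and call parameters follow the community pipeline's example and should be treated as assumptions:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",                       # assumed checkpoint id
    custom_pipeline="llm_grounded_diffusion",  # community pipeline (diffusers v0.24.0+)
    torch_dtype=torch.float16,
).to("cuda")

# Stage 1 (normally produced by an LLM): phrase/box pairs describing the layout.
phrases = ["a gray cat", "a red ball"]
boxes = [[0.10, 0.45, 0.45, 0.90], [0.55, 0.60, 0.85, 0.90]]  # normalized xyxy

# Stage 2: layout-grounded generation conditioned on the boxes.
image = pipe(
    prompt="a gray cat playing with a red ball in a park",
    phrases=phrases,
    boxes=boxes,
    num_inference_steps=50,
).images[0]
```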

## MultiBooth

Discover a two-phase method for generating multi-concept images from text with improved concept fidelity and reduced inference cost. Built on Stable Diffusion v1.5, it combines a multi-modal encoder with a concise cross-attention mapping, outperforming baseline approaches in both efficiency and quality, and the documentation explains the core technical terms for a wider audience.

## CustomNet

Explore CustomNet, a framework that builds 3D view synthesis into text-to-image models for precise object customization. It needs no test-time optimization, preserving object identity while varying viewpoints and backgrounds, and relies on a dedicated dataset-construction pipeline to cope with real-world complexity.

## flux

The Flux project by Black Forest Labs delivers state-of-the-art image generation with latent rectified flow transformers, supporting both text-to-image and image-to-image workflows. It emphasizes easy local setup and API connectivity, partnering with platforms such as Replicate, FAL, Mystic, and Together for broader access. Models like 'FLUX.1 [pro]' and 'FLUX1.1 [pro]' are available via the API, while others such as 'FLUX.1 [schnell]' can be downloaded from HuggingFace. Streamlit and Gradio demos provide interactive experiences, integration with the diffusers library keeps resource use efficient, and detailed API documentation is available at docs.bfl.ml.
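
With the diffusers integration, running the open 'FLUX.1 [schnell]' checkpoint looks like this (a minimal sketch following the diffusers documentation):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # trades speed for a much smaller VRAM footprint

image = pipe(
    "a misty forest at dawn, light rays between the trees",
    guidance_scale=0.0,           # schnell is guidance-distilled, so CFG is disabled
    num_inference_steps=4,        # the timestep-distilled model needs very few steps
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-schnell.png")
```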

## CogView

CogView is a 4-billion-parameter transformer for general text-to-image generation. The release includes code and demos, along with the PB-relax and Sandwich-LN techniques for stable transformer training. CogView primarily accepts Chinese text input (translating English prompts into Chinese is recommended), and it provides pretrained models, inference and super-resolution features, and detailed setup instructions for various environments, supporting both single- and multi-node training for complex AI tasks.
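
Sandwich-LN, one of those stabilization techniques, wraps each transformer sublayer in LayerNorms on both sides so residual activations stay bounded; schematically (not the project's code):

```python
import torch
import torch.nn as nn

class SandwichSublayer(nn.Module):
    """Residual sublayer with Sandwich-LN (CogView): LayerNorm is applied both
    before the sublayer (as in Pre-LN) and again after it, just before the
    residual addition, keeping activation magnitudes bounded during training."""

    def __init__(self, dim: int, sublayer: nn.Module):
        super().__init__()
        self.pre_ln = nn.LayerNorm(dim)
        self.post_ln = nn.LayerNorm(dim)
        self.sublayer = sublayer  # e.g. self-attention or a feed-forward block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.post_ln(self.sublayer(self.pre_ln(x)))
```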