Introduction to CLIP Interrogator
The CLIP Interrogator is an innovative tool designed for those looking to craft the perfect prompts to generate new images similar to an existing one. It acts like a bridge between your creative ideas and image creation, providing nuanced suggestions to inspire art generation using text-to-image models.
How to Run CLIP Interrogator
The CLIP Interrogator is accessible in multiple ways, providing flexible options for users based on their preferences:
- Stable Diffusion Web UI Extension: run the CLIP Interrogator directly from the Stable Diffusion Web UI's graphical interface.
- Platforms for Execution: Version 2 of the CLIP Interrogator can be run on several hosted platforms, each offering a different way to interact with the tool:
  - Colab: run the tool with a click on "Open In Colab" for a hands-on experience.
  - HuggingFace: access the tool from their Spaces via "Open in Spaces".
  - Replicate: use the "Replicate" platform for streamlined operation.
  - Lambda: another hosted option, available via "Lambda".
Moreover, for those interested in comparing different models, Version 1 remains available on Colab.
About the Tool
The foundation of the CLIP Interrogator is built upon OpenAI's CLIP and Salesforce's BLIP: BLIP generates a base caption for the image, and CLIP ranks collections of artists, mediums, and stylistic terms by how well they match it. The combined result is an optimized text prompt, well suited to creating striking artwork via models like Stable Diffusion on platforms like DreamStudio.
Using CLIP Interrogator as a Library
For developers and tech enthusiasts, the CLIP Interrogator can be integrated as a library within Python environments. Here's a simple guide:
- Set up a virtual environment:

```bash
python3 -m venv ci_env
source ci_env/bin/activate    # Linux/macOS
.\ci_env\Scripts\activate     # Windows
```
- Install with pip, using the CUDA-enabled torch build for GPU performance:

```bash
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117
pip install clip-interrogator==0.5.4
```
- Use it in a script:

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# load the image and ensure it is in RGB mode
image = Image.open(image_path).convert('RGB')

# ViT-L-14/openai is the CLIP model that matches Stable Diffusion 1.x
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))
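Beyond interrogate, the library also ships faster variants. The sketch below assumes the ci and image objects from the step above:

```python
# quicker result that skips part of the flavor-ranking pass
print(ci.interrogate_fast(image))

# prompt in the original "classic" format
print(ci.interrogate_classic(image))
```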
Configuration Options
The CLIP Interrogator is highly configurable through a Config object that tailors its operation (a combined example follows this list):
- clip_model_name: Choose which OpenCLIP pretrained model to use.
- cache_path: Specify where to save precomputed text embeddings.
- download_cache: when True, downloads precomputed text embeddings from HuggingFace rather than computing them locally.
- chunk_size: batch size for CLIP processing; lower it on systems with less VRAM.
- quiet: Set to True to suppress progress bars or text output.
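As an illustration, here is a minimal sketch combining these options; the values shown are illustrative choices, not the library's defaults:

```python
from clip_interrogator import Config, Interrogator

config = Config(
    clip_model_name="ViT-L-14/openai",  # OpenCLIP pretrained model to load
    cache_path="./ci_cache",            # where precomputed text embeddings are saved
    download_cache=True,                # fetch precomputed embeddings from HuggingFace
    chunk_size=1024,                    # smaller batches for GPUs with less VRAM
    quiet=True,                         # suppress progress bars and text output
)
ci = Interrogator(config)
```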
For systems with limited VRAM, config.apply_low_vram_defaults() can be invoked to reduce memory usage, at some cost to speed and quality; a short sketch follows.
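A minimal sketch of the low-VRAM path, applying the defaults before the Interrogator loads its models:

```python
from clip_interrogator import Config, Interrogator

config = Config(clip_model_name="ViT-L-14/openai")
config.apply_low_vram_defaults()  # e.g. smaller chunk size and other memory-saving settings
ci = Interrogator(config)         # models load with the reduced-memory configuration
```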
Advanced Usage
Since version 0.6.0, images can also be ranked against custom lists of terms, tailoring prompt generation to a personalized vocabulary:
```python
from clip_interrogator import Config, Interrogator, LabelTable, load_list
from PIL import Image

# skip loading the caption model; only CLIP ranking is needed here
ci = Interrogator(Config(blip_model_type=None))

image = Image.open(image_path).convert('RGB')

# build a ranking table from a custom term list (one term per line)
table = LabelTable(load_list('terms.txt'), 'terms', ci)

# rank the image's CLIP features against the table and take the best match
best_match = table.rank(ci.image_to_features(image), top_count=1)[0]
print(best_match)
```
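To retrieve several candidates rather than a single best match, raise top_count; rank returns the top labels in order. A short sketch continuing the example above:

```python
# five best-matching terms, strongest match first
top_matches = table.rank(ci.image_to_features(image), top_count=5)
for term in top_matches:
    print(term)
```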
With its powerful capabilities, the CLIP Interrogator is a valuable tool for artists, developers, and AI enthusiasts eager to push the boundaries of AI-generated imagery.