awesome-openai-vision-api-experiments - Creative Experiments with OpenAI Vision API

Overview of the "Awesome OpenAI Vision API Experiments" Project

The "Awesome OpenAI Vision API Experiments" project is a treasure trove for anyone interested in exploring the capabilities of the OpenAI Vision API. This collaborative repository functions as a central hub for diverse experiments, ranging from straightforward image classifications to sophisticated zero-shot learning models. It is designed to accommodate both newcomers and seasoned experts who wish to delve into the world of visual AI, allowing them to share insights and advance the technology further.

For those interested in conducting experiments, obtaining an API key is crucial. You can acquire one through the OpenAI platform.

Limitations

While the Vision API is powerful, it does come with certain limitations, such as:

A cap of 100 API requests per API key per day.
Inability to perform object detection or image segmentation natively. However, these limitations can be addressed by integrating GPT-4V technology with foundational models like GroundingDINO or SAM (Segment Anything Model). An example showcasing this integration is available here, along with further insights in a dedicated blog post.

Experiments

The project houses various experiments, each with its unique focus and contributions from different authors. Some notable experiments include:

WebcamGPT: An innovative approach to interacting with video streams. More details and resources are available on GitHub and Hugging Face.
HotDogGPT: A basic application for image classification designed to determine whether an image contains a hot dog. Resources are accessible on GitHub and Hugging Face.
Zero-Shot Image Classifier with GPT-4V: Developed by @capjamesg, this experiment focuses on classification without needing prior examples. Check out the details on GitHub.
Zero-Shot Object Detection with GroundingDINO + GPT-4V: By combining GroundingDINO and GPT-4V, this experiment explores object detection capabilities. Find more information on GitHub and Hugging Face.
Automated Voiceover for NBA Games: Another creative application developed by @SkalskiP, with resources on GitHub and a demonstration on Google Colab.

Must-Read Papers and Blogs

For those eager to deepen their understanding, several papers and blog posts offer further insights into the experiments and technologies at play. Notable papers include "Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V" and "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)." Additionally, blogs such as "How CLIP and GPT-4V Compare for Classification" and "DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model" provide practical insights and findings.

Contribution

The project welcomes contributions from enthusiasts looking to enhance the repository's offerings. Whether one wishes to add new experiments or suggest improvements, the team encourages active participation through issues and pull requests. An extensive contribution guide is available to guide new contributors through the process.

This project is a beacon for innovation in visual AI, inviting everyone to experiment, learn, and contribute to the development of groundbreaking AI technologies.