visionscript - Streamline Vision Tasks Using Python-Based VisionScript

VisionScript: A Simplified Approach to Computer Vision

VisionScript is a programming language, built on Python, for expressing common computer vision tasks in a few lines of code. Its syntax keeps tasks such as object detection, classification, and segmentation short and readable.

Overview

VisionScript was created to provide a straightforward way to handle one-off computer vision tasks. It is not intended to replace general-purpose programming languages for complex vision work. Instead, it offers an easy entry point for beginners who want to explore tasks such as image classification and segmentation.

Getting Started with VisionScript

Installation

To start using VisionScript, simply install it via pip:

pip install visionscript

After installation, you can launch the VisionScript REPL by typing:

visionscript

This opens an interactive session where you can input commands.

Running Scripts

VisionScript scripts, saved in .vic files, can be run using:

visionscript ./your_file.vic
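
If you would rather drive VisionScript from Python, a script file can be written and launched with the standard library. The following is a minimal sketch, not part of VisionScript itself: the file name `detect_people.vic` and the helpers `write_script` and `build_command` are hypothetical, and the only command-line form assumed is the `visionscript <file>` invocation shown above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The three-line detection program from the Quickstart Guide.
SCRIPT = 'Load["./photo.jpg"]\nDetect["person"]\nSay[]\n'


def build_command(script_path):
    """Return the argument list for `visionscript <file>` (form shown above)."""
    return ["visionscript", str(script_path)]


def write_script(directory, name="detect_people.vic"):
    """Write SCRIPT to a .vic file (hypothetical name) and return its path."""
    path = Path(directory) / name
    path.write_text(SCRIPT, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vic = write_script(tmp)
        cmd = build_command(vic)
        print("command:", " ".join(cmd))
        # Only launch the interpreter if it is actually installed.
        if shutil.which("visionscript"):
            subprocess.run(cmd, check=False)
```

The `shutil.which` check keeps the sketch safe to run on a machine where VisionScript has not been installed yet.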

Using VisionScript in Notebooks

VisionScript also supports an interactive web notebook interface. To access it, run:

visionscript --notebook

This opens a temporary notebook in your browser, where you can experiment with VisionScript code.

Quickstart Guide

VisionScript allows you to accomplish tasks with minimal lines of code. Here's a quick overview of what you can do:

Object Detection: Find people in an image (the same steps can be applied across the images in a directory). The example below works on a single photo:
```
Load["./photo.jpg"]
Detect["person"]
Say[]
```

Replace Objects in an Image: Substitute detected objects with an emoji as shown here:

```
Load["./abbey.jpg"]
Size[]
Say[]
Detect["person"]
Replace["emoji.png"]
Save["./abbey2.jpg"]
```
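
Conceptually, replacing detected objects means pasting a resized overlay into each detection box. The sketch below illustrates that idea in plain Python on a toy character grid. It is an assumption about the general technique, not VisionScript's implementation: the helpers `resize_nearest` and `replace_regions`, the box format `(x0, y0, x1, y1)`, and the sample data are all made up for illustration.

```python
def resize_nearest(pixels, new_w, new_h):
    """Nearest-neighbour resize of a 2D grid (list of rows)."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def replace_regions(image, boxes, overlay):
    """Return a copy of image with each (x0, y0, x1, y1) box filled by overlay."""
    out = [row[:] for row in image]
    for x0, y0, x1, y1 in boxes:
        # Stretch the overlay to the size of the box, then paste it row by row.
        patch = resize_nearest(overlay, x1 - x0, y1 - y0)
        for dy, patch_row in enumerate(patch):
            out[y0 + dy][x0:x1] = patch_row
    return out


# Toy data: a 4x6 "image" of dots, one made-up detection box, a 1x1 overlay.
image = [["."] * 6 for _ in range(4)]
boxes = [(1, 1, 4, 3)]  # hypothetical detector output
overlay = [["E"]]

for row in replace_regions(image, boxes, overlay):
    print("".join(row))
```

With real images, the grid cells would be pixel values and the boxes would come from the detector, but the paste step has the same shape.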

Image Classification: Classify images into categories like 'apple' or 'banana':
```
Load["./photo.jpg"]
Classify["apple", "banana"]
```

Key Features and Inspirations

VisionScript's syntax is influenced by Python and the Wolfram Language, using a simple, line-by-line execution method. A unique aspect of VisionScript is lexical inference, which eliminates the need to explicitly declare variables. For example:

```
Load["./photo.jpg"]
Size[]
Say[]
```

In the above, Size[] and Say[] operate on the most recently loaded image without requiring additional parameters.
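
One way to picture this behaviour is an interpreter that keeps a single implicit "most recent result" and lets each call read or update it. The toy Python interpreter below is a sketch under that assumption. It is not VisionScript's actual implementation: the class `TinyInterpreter`, its `do_*` methods, and the dictionary standing in for image files are hypothetical.

```python
import re

# Matches one call such as Load["./photo.jpg"] or Say[]
CALL = re.compile(r'^(\w+)\[(.*)\]$')


class TinyInterpreter:
    """Toy interpreter: each call reads and/or updates one implicit value."""

    def __init__(self, images):
        self.images = images  # stand-in for the file system: path -> (w, h)
        self.last = None      # the implicit "most recent result"
        self.output = []      # everything Say[] has reported so far

    def run(self, source):
        for line in source.strip().splitlines():
            if not line.strip():
                continue  # skip blank lines
            match = CALL.match(line.strip())
            if match is None:
                raise SyntaxError(f"cannot parse line: {line!r}")
            name, raw_args = match.groups()
            args = re.findall(r'"([^"]*)"', raw_args)
            getattr(self, "do_" + name.lower())(*args)
        return self.output

    def do_load(self, path):
        self.last = {"path": path, "size": self.images[path]}

    def do_size(self):
        # No argument: operates on whatever was produced last.
        self.last = self.last["size"]

    def do_say(self):
        self.output.append(str(self.last))
        print(self.last)


program = '''
Load["./photo.jpg"]
Size[]
Say[]
'''
interp = TinyInterpreter({"./photo.jpg": (640, 480)})
interp.run(program)
```

Because every call works on `self.last`, the program never names a variable, which mirrors how `Size[]` and `Say[]` need no parameters in the example above.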

Supported Models

VisionScript provides simple interfaces to powerful models such as:

CLIP by OpenAI for classification
Ultralytics YOLOv8 for object detection and segmentation
FastSAM and GroundedSAM for segmentation
BLIP for caption generation
ViT for training classification models
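
To give a feel for what a CLIP-style classifier does with labels such as 'apple' and 'banana', the sketch below scores one image embedding against one embedding per label using cosine similarity and picks the closest label. This shows only the general zero-shot idea. How VisionScript calls CLIP internally is not covered here, the functions `cosine` and `classify` are hypothetical, and the three-dimensional vectors are invented for illustration (real embeddings come from the model and are much larger).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify(image_vec, label_vecs):
    """Return (best_label, scores), where scores maps label -> similarity."""
    scores = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    return max(scores, key=scores.get), scores


# Made-up embeddings, for illustration only.
label_vecs = {"apple": [0.9, 0.1, 0.0], "banana": [0.1, 0.9, 0.2]}
image_vec = [0.8, 0.2, 0.1]

best, scores = classify(image_vec, label_vecs)
print(best)
```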

Development and Contribution

VisionScript welcomes contributions that add features or fix bugs. To work on VisionScript locally, start by cloning the repository, then set up your environment from the cloned directory:

git clone https://github.com/capjamesg/VisionScript

Licensing

VisionScript is open-source and available under the MIT license.

For more detailed documentation and examples, visit VisionScript's website. Whether you are a beginner curious about computer vision or a developer who needs a quick solution for a simple task, VisionScript offers a compact way to try out detection, classification, and segmentation.