VisionScript: A Simplified Approach to Computer Vision
VisionScript is a programming language designed to make common computer vision tasks simpler and faster to write. Built on Python, it offers a concise syntax for tasks such as object detection, classification, and segmentation.
Overview
VisionScript was created to provide a straightforward way to handle one-off computer vision tasks. It is not intended to replace general-purpose programming languages for complex vision work; rather, it offers an easy entry point for beginners who want to explore tasks such as image classification and segmentation.
Getting Started with VisionScript
Installation
To start using VisionScript, simply install it via pip:
pip install visionscript
After installation, you can launch the VisionScript REPL by typing:
visionscript
This opens an interactive session where you can input commands.
Running Scripts
VisionScript scripts, saved in .vic files, can be run using:
visionscript ./your_file.vic
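When several scripts need to be run in one go, the command above can be wrapped in a few lines of Python. The following is a minimal sketch, not part of VisionScript itself: the helper names `build_command` and `run_all` are hypothetical, and the only thing taken from the documentation is the `visionscript ./your_file.vic` invocation.

```python
import subprocess
from pathlib import Path


def build_command(script):
    # Mirrors the documented invocation: visionscript ./your_file.vic
    return ["visionscript", str(script)]


def run_all(directory, runner=subprocess.run):
    """Run every .vic file in `directory` in sorted order; return the names run."""
    ran = []
    for script in sorted(Path(directory).glob("*.vic")):
        # With check=True, subprocess.run raises CalledProcessError
        # if the command exits with a non-zero status.
        runner(build_command(script), check=True)
        ran.append(script.name)
    return ran
```

Calling `run_all("./scripts")` would run each `.vic` file in that (hypothetical) folder. The `runner` parameter exists only so the helper can be exercised without VisionScript installed.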
Using VisionScript in Notebooks
VisionScript also supports an interactive web notebook interface. To access it, run:
visionscript --notebook
This opens a temporary notebook in your browser, where you can experiment with VisionScript code.
Quickstart Guide
VisionScript allows you to accomplish tasks with minimal lines of code. Here's a quick overview of what you can do:
- Object Detection: Find people in an image or across multiple images in a directory by simply using:
Load["./photo.jpg"]
Detect["person"]
Say[]
- Replace Objects in an Image: Substitute detected objects with an emoji as shown here:
Load["./abbey.jpg"]
Size[]
Say[]
Detect["person"]
Replace["emoji.png"]
Save["./abbey2.jpg"]
- Image Classification: Classify images into categories like 'apple' or 'banana':
Load["./photo.jpg"]
Classify["apple", "banana"]
Key Features and Inspirations
VisionScript's syntax is influenced by Python and the Wolfram Language, using a simple, line-by-line execution method. A unique aspect of VisionScript is lexical inference, which eliminates the need to explicitly declare variables. For example:
Load["./photo.jpg"]
Size[]
Say[]
In the above, Size[] and Say[] operate on the most recently loaded image without requiring additional parameters.
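One way to picture this behaviour is an interpreter that remembers the last image and the last result, so that later statements can pick them up without being told. The Python sketch below is a toy model under that assumption, not VisionScript's actual implementation: the `ToyInterpreter` class, the regex-based `parse` helper, and the fixed 640x480 fake image are all invented for illustration.

```python
import re

# Matches one statement of the form Name[...] as in the examples above.
STATEMENT = re.compile(r'(\w+)\[(.*)\]')
STRING_ARG = re.compile(r'"([^"]*)"')


def parse(line):
    """Split 'Load["./photo.jpg"]' into ('Load', ['./photo.jpg'])."""
    match = STATEMENT.fullmatch(line.strip())
    if match is None:
        raise SyntaxError(f"not a statement: {line!r}")
    name, raw_args = match.groups()
    return name, STRING_ARG.findall(raw_args)


class ToyInterpreter:
    def __init__(self):
        self.image = None   # most recently loaded image
        self.last = None    # result of the most recent statement
        self.output = []    # everything Say[] has emitted so far

    def run(self, program):
        for line in program.strip().splitlines():
            name, args = parse(line)
            # Dispatch Load -> do_load, Size -> do_size, and so on.
            getattr(self, "do_" + name.lower())(*args)
        return self.output

    def do_load(self, path):
        # Stand-in for decoding a file: a fake record with a fixed size.
        self.image = {"path": path, "width": 640, "height": 480}
        self.last = self.image

    def do_size(self):
        # No argument needed: the image comes from interpreter state.
        self.last = (self.image["width"], self.image["height"])

    def do_say(self):
        # Likewise, Say[] reports whatever the previous statement produced.
        self.output.append(str(self.last))


program = '''
Load["./photo.jpg"]
Size[]
Say[]
'''
print(ToyInterpreter().run(program))  # ['(640, 480)']
```

The point of the sketch is only that state carried between lines replaces explicit variables; the real language may track and resolve that state differently.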
Supported Models
VisionScript provides simple interfaces to powerful models such as:
- CLIP by OpenAI for classification
- Ultralytics YOLOv8 for object detection and segmentation
- FastSAM and GroundedSAM for segmentation
- BLIP for caption generation
- ViT for training classification models
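Of the models listed above, CLIP's role in classification is the easiest to illustrate. CLIP-style zero-shot classification embeds the image and each candidate label as vectors, then picks the label whose vector is most similar to the image's. The Python sketch below shows only that scoring step, using made-up three-dimensional vectors; real embeddings would come from CLIP's image and text encoders, and the function names and the `scale` value of 100 are assumptions for illustration, not VisionScript's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(image_vec, label_vecs, scale=100.0):
    """Return (best_label, probabilities) from scaled cosine similarities."""
    labels = list(label_vecs)
    logits = [scale * cosine(image_vec, label_vecs[name]) for name in labels]
    # Softmax, shifted by the largest logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Made-up embeddings standing in for CLIP encoder outputs.
image_vec = [0.9, 0.1, 0.2]
label_vecs = {"apple": [1.0, 0.0, 0.1], "banana": [0.0, 1.0, 0.1]}

best, probs = zero_shot_classify(image_vec, label_vecs)
print(best)  # apple
```

Because the labels are supplied at call time, no training on 'apple' or 'banana' images is needed, which is what makes a one-line statement such as Classify["apple", "banana"] possible.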
Development and Contribution
VisionScript welcomes contributions that add features or fix bugs. To work on VisionScript locally, clone the repository and set up your environment, starting with:
git clone https://github.com/capjamesg/VisionScript
Licensing
VisionScript is open-source and available under the MIT license.
For more detailed documentation and examples, visit VisionScript's website. Whether you are a beginner curious about computer vision or a developer looking for a quick solution to a simple task, VisionScript offers a user-friendly way to explore and apply vision models.