VoTT - Enhance Image and Video Annotation with an Open-Source Tool

VoTT (Visual Object Tagging Tool)

Overview

The Visual Object Tagging Tool (VoTT) is an open-source tool for annotating and labeling image and video assets. It is a React and Redux web application written in TypeScript, and the project was bootstrapped with Create React App. VoTT supports machine learning workflows by letting users label their data and manage the resulting labels in one place.

Key Features

VoTT's main features for working with large sets of image or video data in machine learning projects include:

Labeling capabilities for individual images or video frames.
Import and export of data to and from local and cloud storage providers, such as the local file system and Azure Blob Storage.

Facilitation of Machine Learning

A main advantage of VoTT is that it streamlines the data-preparation stage of a machine learning pipeline. Labeling assets in a single tool helps users prepare organized datasets for training models and makes the process less time-consuming.

Getting Started with VoTT

VoTT can be run as either a native application or a web application. The options for getting started are:

Download and Installation: Users can download platform-specific installer packages for Windows, Linux, or macOS (OS X) from GitHub Releases.
Running from Source: Users who prefer to run VoTT from source can set it up using Node.js and npm.
Web Application: VoTT is also available as a standalone web application that runs in modern web browsers.

Transition from V1 to V2

VoTT V2 is a major upgrade and refactor of the original version. It adopts more modern development practices and technologies to improve extensibility and maintainability. Projects created in V1 can be converted to the V2 format, so existing work carries over to the new version.

Using VoTT

Creating Connections

Connections in VoTT define where the assets to be labeled come from (the source) and where labeled data is exported to (the destination). A connection can be reused across multiple projects, which keeps storage settings organized in one place.

Setting Up Projects

A project in VoTT holds its configuration, its connections to the data source and export destination, and metadata such as the tagging schema. Setting these up before labeling begins keeps the labeling workflow consistent.
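As an illustration of how these pieces relate, the following Python sketch models a project that references a source connection, a target connection, and a list of tags. All class names, field names, and provider labels here are hypothetical and chosen for illustration; they are not VoTT's actual project schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """Hypothetical stand-in for a VoTT connection (not the real schema)."""
    name: str
    provider: str  # illustrative label, e.g. "localFileSystem" or "azureBlobStorage"
    options: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """Hypothetical stand-in for a VoTT project (not the real schema)."""
    name: str
    source: Connection  # where assets to label are read from
    target: Connection  # where labeled data is written
    tags: List[str] = field(default_factory=list)


# Connections are created once and can be shared by several projects.
local = Connection("local-images", "localFileSystem", {"folder": "/data/images"})
cloud = Connection("labels-store", "azureBlobStorage", {"container": "labels"})

project = Project("traffic-signs", source=local, target=cloud, tags=["stop", "yield"])
other = Project("pedestrians", source=local, target=cloud, tags=["person"])
```

Both projects point at the same two connection objects, which mirrors the idea that a connection is configured once and then reused.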

Labeling Process

VoTT supports labeling both images and videos. Users can:

Draw and tag regions within images.
Work with videos using intuitive frame navigation and tagging tools.
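To make frame-based video labeling concrete, here is a minimal Python sketch that lists the timestamps a labeler would step through when a video is sampled at a fixed number of frames per second. The function names, their parameters, and the fixed-rate sampling model are assumptions for illustration, not VoTT's implementation.

```python
from typing import List


def frame_timestamps(duration_s: float, frames_per_second: float) -> List[float]:
    """Return timestamps (in seconds) of frames sampled at a fixed rate.

    Hypothetical helper: models stepping through a video one sampled
    frame at a time, starting at t = 0.
    """
    if duration_s < 0 or frames_per_second <= 0:
        raise ValueError("duration must be >= 0 and rate must be > 0")
    step = 1.0 / frames_per_second
    # Small epsilon guards against products like 28.999999999999996.
    count = int(duration_s * frames_per_second + 1e-9) + 1
    # Multiply rather than accumulate to avoid floating-point drift.
    return [round(i * step, 6) for i in range(count)]


def next_frame(current_s: float, frames_per_second: float) -> float:
    """Timestamp of the sampled frame that follows `current_s`."""
    step = 1.0 / frames_per_second
    index = int(round(current_s / step))
    return round((index + 1) * step, 6)
```

For example, a two-second clip sampled at two frames per second yields five frames to tag, at 0.0, 0.5, 1.0, 1.5, and 2.0 seconds.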

Exporting Labeled Data

Once assets are labeled, the data can be exported in several formats for use with machine learning platforms, including formats for Azure Custom Vision Service, Microsoft Cognitive Toolkit, and TensorFlow. Users can choose the format that best fits their project requirements.
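As a sketch of how exported labels might be consumed downstream, the following Python example flattens a JSON export into rows of image name, box corners, and label. The JSON layout used here (an `assets` mapping whose entries hold an `asset` name and a list of `regions`, each with `tags` and a `boundingBox` of `left`, `top`, `width`, `height`) is an assumption for illustration; check it against an actual export from your version of VoTT before relying on it.

```python
import json
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float, float, str]


def flatten_export(export: Dict) -> List[Row]:
    """Flatten an assumed VoTT-style JSON export into
    (image, xmin, ymin, xmax, ymax, label) rows."""
    rows: List[Row] = []
    for entry in export.get("assets", {}).values():
        image = entry["asset"]["name"]
        for region in entry.get("regions", []):
            box = region["boundingBox"]
            xmin, ymin = box["left"], box["top"]
            xmax, ymax = xmin + box["width"], ymin + box["height"]
            # A region may carry several tags; emit one row per tag.
            for tag in region.get("tags", []):
                rows.append((image, xmin, ymin, xmax, ymax, tag))
    return rows


# Made-up sample data in the assumed layout.
sample = json.loads("""
{
  "assets": {
    "a1": {
      "asset": {"name": "street.jpg"},
      "regions": [
        {"tags": ["car", "parked"],
         "boundingBox": {"left": 10, "top": 20, "width": 100, "height": 50}}
      ]
    }
  }
}
""")

rows = flatten_export(sample)
```

Rows in this shape can be written out with Python's `csv` module or passed to a training pipeline that expects corner-format bounding boxes.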

Optimizing Efficiency with Shortcuts

VoTT includes keyboard shortcuts and mouse controls that speed up labeling tasks. Users can quickly apply and manage tags, select tools, and navigate between assets.

VoTT Development and Contribution

Initial development was carried out by the Commercial Software Engineering group at Microsoft in Israel, with further contributions from their team in Redmond, Washington. VoTT is an open-source project, and the community can contribute in a number of ways to help it improve and adapt to emerging needs.

VoTT adheres to the Microsoft Open Source Code of Conduct, fostering a cooperative community of developers and users passionate about improving machine learning tools.