SSD-TensorFlow: An Overview
SSD-TensorFlow is a TensorFlow re-implementation of the Single Shot MultiBox Detector (SSD), an object detection architecture introduced in a research paper and originally implemented in the Caffe deep learning framework. This project ports that Caffe code to TensorFlow, focusing on VGG-based SSD networks with input sizes of 300 and 512.
The project, inspired by the TF-Slim models repository, is organized into three main parts:
- Datasets: interfaces to popular datasets such as Pascal VOC and COCO, with utilities to convert them into TF-Records.
- Networks: definitions of the SSD networks, together with the encoding and decoding of bounding boxes.
- Pre-processing: pre-processing and data-augmentation routines, inspired by existing VGG and Inception implementations.
Minimal Example of SSD
A minimal working example of the SSD TensorFlow pipeline is available in the SSD Notebook. This pipeline works in two primary steps:
- Running the SSD network on an image.
- Post-processing the SSD output with common algorithms such as top-k filtering and non-maximum suppression (NMS).
The notebook also showcases successful detections on sample images. To run the example, unzip the checkpoint archive and start a Jupyter notebook session.
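For instance, assuming the VGG-300 checkpoint archive sits in the checkpoints directory (the file and notebook names below match the repository's layout; adjust them to your copy), the steps look roughly like this:

```bash
# Unpack the pre-trained SSD checkpoint distributed with the project.
cd checkpoints
unzip ssd_300_vgg.ckpt.zip
cd ..

# Launch Jupyter and open the demo notebook.
jupyter notebook notebooks/ssd_notebook.ipynb
```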
Supported Datasets
Currently, the SSD-TensorFlow project supports the Pascal VOC 2007 and 2012 datasets. These must be converted into TF-Records before they can be used for training; the conversion produces a collection of TF-Records files rather than a single one, which eases data shuffling during training.
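The conversion is handled by the tf_convert_data.py script; a typical invocation, with illustrative paths, looks like this:

```bash
# Convert a Pascal VOC directory (Annotations/, JPEGImages/, ...) into TF-Records.
DATASET_DIR=./VOC2007/trainval/
OUTPUT_DIR=./tfrecords
python tf_convert_data.py \
    --dataset_name=pascalvoc \
    --dataset_dir=${DATASET_DIR} \
    --output_name=voc_2007_train \
    --output_dir=${OUTPUT_DIR}
```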
Evaluation on Pascal VOC 2007
For evaluating SSD models on Pascal VOC 2007, several pre-trained models are provided together with their performance metrics. These models differ in their training data and, consequently, in their mean Average Precision (mAP) scores. Users can reproduce the reported numbers by downloading the checkpoints and running the evaluation script, which computes mAP following the Pascal VOC 2007 and 2012 guidelines.
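As a sketch, evaluating one of the released VGG-300 checkpoints on the VOC 2007 test split could look as follows (the checkpoint name mirrors the released Caffe-converted model; all paths are illustrative):

```bash
# Evaluate a pre-trained SSD-300 model on the Pascal VOC 2007 test split.
EVAL_DIR=./logs/
DATASET_DIR=./tfrecords
CHECKPOINT_PATH=./checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1
```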
Additionally, users can convert Caffe SSD checkpoints into TensorFlow checkpoints for further experimentation or testing.
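The repository's caffe_to_tensorflow.py script performs this conversion, taking the original .caffemodel file and writing a TensorFlow checkpoint; a typical call, with illustrative paths, is:

```bash
# Convert an original Caffe SSD checkpoint into a TensorFlow checkpoint.
CAFFE_MODEL=./VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel
python caffe_to_tensorflow.py \
    --model_name=ssd_300_vgg \
    --num_classes=21 \
    --caffemodel_path=${CAFFE_MODEL}
```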
Training Networks
SSD networks are trained with the train_ssd_network.py script. The script is flexible: its flags let users choose the dataset, the optimizer, and other hyper-parameters to tailor the training process.
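A minimal training run, sketched here with illustrative values for the main flags, might look like:

```bash
# Train SSD-300 (VGG backbone) on Pascal VOC 2012 TF-Records.
DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
python train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2012 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=32
```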
Fine-tuning Existing SSD Checkpoints
The project supports fine-tuning pre-trained SSD networks: users can start from an existing SSD checkpoint, such as VGG-300 or VGG-512, and refine the model further. The training script offers ample customization, including data-augmentation settings and network-specific parameters, as in the example below.
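Fine-tuning uses the same train_ssd_network.py script, with --checkpoint_path pointing at the existing SSD checkpoint; for example (values illustrative):

```bash
# Fine-tune from an existing SSD-300 checkpoint.
DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --learning_rate=0.001 \
    --batch_size=32
```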
Moreover, training and evaluation can run concurrently on a machine with sufficient GPU resources, allowing continuous performance monitoring of newly saved checkpoints.
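A second job can then score checkpoints as the training job writes them. The sketch below assumes the evaluation script accepts --wait_for_checkpoints and --max_num_batches flags as described in the repository's documentation; treat both the flag names and the values as assumptions to verify against your copy, and note that the evaluation job's GPU memory usage may need to be constrained so both jobs fit on one device:

```bash
# Run evaluation in parallel with training, scoring each new checkpoint.
EVAL_DIR=${TRAIN_DIR}/eval
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${TRAIN_DIR} \
    --wait_for_checkpoints=True \
    --batch_size=1 \
    --max_num_batches=500
```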
Building New SSD Models from ImageNet
Users interested in developing new SSD models can start from a standard architecture such as VGG, ResNet, or Inception and add multibox layers on top. In this setting, the weights of the pre-trained backbone are loaded from an ImageNet checkpoint while the new SSD components are initialized randomly. The script allows gradual training: initially only the new SSD components are trained, followed by fine-tuning of the whole network, as sketched below.
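As a sketch, starting from an ImageNet VGG-16 checkpoint (as released with the TF-Slim models), one maps the checkpoint's variable scope onto the SSD model, excludes the new SSD layers from restoration, and marks only those layers as trainable for the first stage. The scope names below follow the repository's VGG-300 naming; verify them against your copy of the code:

```bash
# Stage 1: restore only the VGG-16 backbone; train just the new SSD layers.
DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/vgg_16.ckpt
NEW_LAYERS=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box
python train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --checkpoint_model_scope=vgg_16 \
    --checkpoint_exclude_scopes=${NEW_LAYERS} \
    --trainable_scopes=${NEW_LAYERS} \
    --learning_rate=0.001 \
    --batch_size=32
```

Once the new layers have converged, a second run without --trainable_scopes fine-tunes the whole network.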
Overall, the repository and its guidelines show how to work with pre-trained weights and provide a practical path for constructing and refining object detection models with SSD-TensorFlow.