SC_Depth Project Overview
The SC_Depth project focuses on self-supervised learning of monocular depth from video using PyTorch Lightning. It consists of three versions: SC-DepthV1, SC-DepthV2, and SC-DepthV3, each building upon the last to enhance accuracy and performance in depth estimation from video footage.
SC-DepthV1
SC-DepthV1 is aimed at predicting depth from video in a scale-consistent manner over time. The key innovations in this version include:
- Geometry Consistency Loss: This ensures that the depth predictions remain consistent in scale over multiple frames, an essential factor for applications like ORB-SLAM2, where scale is critical.
- Self-discovered Mask: This technique detects and down-weights dynamic regions and occlusions during training to boost accuracy.
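The two ideas above are closely linked: the per-pixel depth inconsistency both drives the geometry consistency loss and defines the self-discovered mask. A minimal numpy sketch, assuming two already-aligned depth maps (the warped depth of frame 1 and the interpolated depth of frame 2) and the inconsistency ratio |D1 - D2| / (D1 + D2) described in the SC-DepthV1 paper; the function names are illustrative, not the repository's API:

```python
import numpy as np

def geometry_consistency(d_warped, d_interp, eps=1e-7):
    """Per-pixel depth inconsistency between the warped depth of frame 1
    and the interpolated depth of frame 2 (both H x W arrays).

    Returns the geometry consistency loss (scalar) and the
    self-discovered mask (H x W), which gives low weight to
    inconsistent pixels (dynamic objects, occlusions)."""
    diff = np.abs(d_warped - d_interp) / (d_warped + d_interp + eps)
    loss = diff.mean()   # geometry consistency loss
    mask = 1.0 - diff    # self-discovered mask
    return loss, mask

# Perfectly consistent depths -> zero loss, full-weight mask
d = np.full((4, 4), 2.0)
loss, mask = geometry_consistency(d, d)
```

In training, the mask multiplies the photometric loss, so pixels that violate geometric consistency contribute less to the gradient.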
Results are demonstrated in a video showcasing depth estimation as both a point cloud and a color map, indicating successful implementation of the above techniques.
SC-DepthV2
SC-DepthV2 addresses the challenges of unsupervised monocular depth estimation in indoor scenes due to large rotational motions captured by hand-held cameras. The significant contribution in this version includes:
- Auto-Rectify Network (ARN): This network compensates for the large rotational motion between consecutive video frames. Integrating the ARN into SC-DepthV1 and training it jointly with the self-supervised losses yields clear performance gains.
This version focuses on improving depth estimation accuracy in indoor environments, where large rotational motion from hand-held cameras makes self-supervised learning difficult.
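The geometric fact the ARN exploits is that a pure camera rotation induces a homography between two views, independent of scene depth: H = K R K^-1, where K is the camera intrinsics and R the relative rotation. Warping one frame by this homography removes the rotational component, leaving a near-translational pair that is easier for self-supervised depth learning. A numpy sketch of that warp, as an illustration of the principle rather than the network itself:

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R (3x3)
    under intrinsics K: H = K @ R @ inv(K)."""
    return K @ R @ np.linalg.inv(K)

def warp_point(H, x, y):
    """Apply homography H to pixel (x, y) in homogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Identity rotation -> identity homography: pixels stay put
H = rotation_homography(K, np.eye(3))
x2, y2 = warp_point(H, 100.0, 50.0)
```

In SC-DepthV2 the rectifying warp is predicted by the ARN from the image pair itself and trained end-to-end with the self-supervised losses, rather than computed from a known rotation as above.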
SC-DepthV3
SC-DepthV3 targets accurate depth estimation in dynamic scenes, which contain significant object motion and occlusions. Main improvements in this version include:
- Robust Learning Framework: This framework addresses shortcomings of previous self-supervised methods, which perform poorly with dynamic objects and occlusions.
- External Depth Estimation Network: A pretrained external network provides single-image depth priors, which are used to guide the depth learning process more effectively.
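A pseudo-depth prior from a pretrained single-image network (e.g. LeReS or DPT) is scale-ambiguous, so it must be aligned to the prediction before it can supervise it. A minimal sketch of one common alignment strategy, median scaling followed by an L1 penalty; this illustrates the idea of a depth prior in general, not the exact losses used in SC-DepthV3, and the function name is hypothetical:

```python
import numpy as np

def pseudo_depth_prior_loss(pred, pseudo, eps=1e-7):
    """Align scale-ambiguous pseudo depth to the predicted depth via a
    median ratio, then penalize the remaining disagreement (L1)."""
    scale = np.median(pred) / (np.median(pseudo) + eps)
    aligned = pseudo * scale
    return np.abs(pred - aligned).mean()

pred = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
# Same depth structure at a different scale -> near-zero loss
loss = pseudo_depth_prior_loss(pred, pred * 5.0)
```

Because only the relative structure of the pseudo depth is trusted, this kind of prior can regularize regions where photometric self-supervision fails, such as moving objects and low-texture areas.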
Evaluation across various challenging datasets (static and dynamic) validates the effectiveness of SC-DepthV3 in predicting sharp and accurate depth maps.
Installation and Usage
The project provides comprehensive instructions on how to set up the necessary environment and datasets for training and testing models. This includes steps to create a Python environment, install dependencies, and organize datasets with specific structure requirements.
Training and Testing
The repository includes scripts for training models on various datasets like KITTI and NYU, and evaluating accuracy on full images. Pseudocode and commands are also available for training on custom data, along with guidelines for generating pseudo-depth data using advanced models like LeReS or DPT.
Pretrained Models
Pretrained models are accessible for versions V1, V2, and V3, trained on various datasets. Users can download and utilize these models for inference or continue their training.
Evaluation and Demos
Additional scripts are provided for evaluating model performance on dynamic versus static regions and running demos to visualize results on new data inputs. These resources aid developers in assessing the robustness and accuracy of the depth estimations in different scenarios.
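Demo visualization typically means mapping the predicted depth into an image. A small numpy sketch of the normalization step, assuming a raw depth map as input; a real demo would additionally apply a color map (e.g. matplotlib's 'magma') rather than grayscale:

```python
import numpy as np

def depth_to_gray(depth):
    """Normalize a depth map to [0, 255] uint8 for visualization."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-7)
    return (d * 255).astype(np.uint8)

img = depth_to_gray(np.array([[0.0, 1.0],
                              [2.0, 4.0]]))
```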
References and Research Papers
The project is supported by extensive academic research, with published papers available for each version, illustrating the scientific underpinnings and contributions of SC_Depth to the field of computer vision.
SC_Depth demonstrates significant advancements in monocular depth estimation from video, achieving milestones in accuracy and applicability in dynamic environments.