head-pose-estimation - Real-time Human Head Pose Estimation Using ONNX and OpenCV

Head Pose Estimation

The head pose estimation project provides real-time tracking of a human head's pose using ONNX Runtime and OpenCV. It detects a person's head in a video file or webcam feed and estimates its orientation.

How it Works

The process of head pose estimation is divided into three main steps:

Face Detection: A face detector locates the human face in the image and returns its bounding box. The box is then adjusted to a square so that it suits the following steps.
Facial Landmark Detection: A pre-trained deep learning model takes the square face image and outputs 68 facial landmarks, such as the corners of the eyes, the tip of the nose, and the outline of the jaw.
Pose Estimation: From the 68 facial landmarks, the head pose is computed with a Perspective-n-Point (PnP) algorithm, which yields the rotation and translation of the head relative to the camera.
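
As a rough illustration of steps 1 and 3, the sketch below squares a bounding box and then solves for the pose with OpenCV's cv2.solvePnP. It is not the project's actual code: the names square_box and solve_head_pose, the (left, top, right, bottom) box layout, and the camera approximation (focal length equal to the image width, no lens distortion) are assumptions made for this example, and the 3D reference points must come from a face model that you supply.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def square_box(box: Box) -> Box:
    """Expand the shorter side of a box so it becomes square, keeping its center."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    diff = abs(height - width)
    # Split the padding between both sides; the extra pixel of an odd
    # difference goes to the right/bottom edge.
    near, far = diff // 2, diff - diff // 2
    if width < height:
        left, right = left - near, right + far
    else:
        top, bottom = top - near, bottom + far
    return left, top, right, bottom


def solve_head_pose(landmarks_2d, model_points_3d, image_size):
    """Estimate rotation and translation vectors with OpenCV's solvePnP.

    landmarks_2d: (68, 2) pixel coordinates from the landmark detector.
    model_points_3d: (68, 3) reference points of a generic 3D face model.
    image_size: (width, height) of the frame.
    """
    # Imported here so square_box stays usable without NumPy/OpenCV installed.
    import cv2
    import numpy as np

    width, height = image_size
    # Assumption: an uncalibrated pinhole camera whose focal length equals the
    # image width and whose principal point is the image center.
    camera_matrix = np.array(
        [[width, 0, width / 2], [0, width, height / 2], [0, 0, 1]],
        dtype=np.float64,
    )
    dist_coeffs = np.zeros((4, 1))  # assumption: no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(landmarks_2d, dtype=np.float64),
        camera_matrix,
        dist_coeffs,
    )
    if not ok:
        raise RuntimeError("solvePnP failed to find a pose")
    return rvec, tvec
```

The squared box may extend beyond the image border, so a real pipeline would need to clamp or shift it before cropping. solve_head_pose returns a rotation vector and a translation vector; cv2.Rodrigues converts the former into a 3x3 rotation matrix.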

Getting Started

Follow these steps to get the project running on your local machine for development and testing:

Prerequisites

The project was tested in the following environment; a similar setup is recommended:

Ubuntu 22.04 as the operating system.
ONNX Runtime 1.17.1 and OpenCV 4.5.4 as the required frameworks.
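
To compare a local environment against the tested versions, a small check like the following may help. This is only a sketch: it assumes the libraries were installed from the PyPI packages onnxruntime and opencv-python (variants such as opencv-python-headless would need to be added to the mapping), and the helper names are made up for this example.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in the prerequisites; the PyPI package names are assumptions.
TESTED = {"onnxruntime": "1.17.1", "opencv-python": "4.5.4"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(
    tested: Dict[str, str] = TESTED,
) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version, whether it matches the tested one)."""
    report = {}
    for package, wanted in tested.items():
        found = installed_version(package)
        # Prefix match, because OpenCV wheels carry an extra build number.
        report[package] = (found, found is not None and found.startswith(wanted))
    return report
```

Calling compare_with_tested() returns a dictionary that can be printed before running the project; a mismatch does not necessarily mean the project will fail, only that the setup differs from the tested one.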

Installing

Begin by cloning the project repository:

git clone https://github.com/yinguobing/head-pose-estimation.git

Install necessary dependencies using pip:

pip install -r requirements.txt

The pre-trained models are stored in the assets directory and can be downloaded using Git Large File Storage (LFS):

git lfs pull

Alternatively, they can be downloaded manually from the release page.

Running

To run the program, provide either a video file or a webcam index as input. If no input is specified, the system's built-in webcam is used by default.
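
The input selection described above could be implemented with argparse roughly as follows. This is a minimal sketch rather than the project's actual main.py: the function name parse_source, the default webcam index of 0, and the rule that a video file takes precedence over a webcam index are assumptions.

```python
import argparse
from typing import List, Optional, Union


def parse_source(argv: Optional[List[str]] = None) -> Union[str, int]:
    """Return a video path if --video is given, otherwise a webcam index."""
    parser = argparse.ArgumentParser(description="Head pose estimation input")
    parser.add_argument("--video", type=str, default=None,
                        help="path to a video file")
    parser.add_argument("--cam", type=int, default=0,
                        help="webcam index (assumed default: 0)")
    args = parser.parse_args(argv)
    # Assumption: a video file takes precedence over the webcam index.
    return args.video if args.video is not None else args.cam
```

The returned value can be passed directly to cv2.VideoCapture, which accepts either a file name or a device index.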

Using a Video File

For video files supported by OpenCV, like mp4 or avi, use the following command:

python3 main.py --video /path/to/video.mp4

Using a Webcam

To access and use a webcam, provide its index. For instance, use the following command for the primary webcam:

python3 main.py --cam 0

Retrain the Model

For users interested in retraining the model, tutorials are available, and the training code is hosted on GitHub.

Note: A PyTorch version of the training code will be available soon.

Licensing

This project is licensed under the MIT License. See the LICENSE file for details.

Additional Information

The face detection component is implemented using the SCRFD model from InsightFace.
The pre-trained model was trained on several public datasets, each with its own licensing terms, which should be reviewed separately.

Acknowledgments

Training relied on public datasets including 300-W, 300-VW, LFPW, HELEN, AFW, and IBUG. The 3D face model comes from OpenFace. The face detector is SCRFD from InsightFace.

This project was created by Yin Guobing.