Head Pose Estimation
The head pose estimation project tracks and estimates the orientation of a person's head in real time using ONNX Runtime and OpenCV. It works on both video files and webcam feeds.
How it Works
The process of head pose estimation is divided into three main steps:
- Face Detection: A face detector locates the human face in the image and returns its bounding box. The box is then adjusted to a square so that it suits the later steps.
- Facial Landmark Detection: A pre-trained deep learning model takes the square face image and outputs 68 facial landmarks, such as the corners of the eyes, the tip of the nose, and the outline of the jaw.
- Pose Estimation: The head pose is computed from the 68 facial landmarks with a Perspective-n-Point (PnP) algorithm.
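As an illustration of the box adjustment in the first step, here is a minimal sketch that turns a rectangular detection into a square. It assumes the box is given as corner coordinates and that the shorter side grows to match the longer one around the same center. The function name `square_box` is hypothetical, and the project's own adjustment may differ in its details.

```python
def square_box(x1, y1, x2, y2):
    """Return a square box with the same center as the input box.

    Assumption for illustration: the box is (x1, y1, x2, y2) in pixels and
    the shorter side is enlarged to the length of the longer side.
    """
    width = x2 - x1
    height = y2 - y1
    side = max(width, height)

    # Keep the center fixed so the face stays in the middle of the crop.
    center_x = (x1 + x2) / 2
    center_y = (y1 + y2) / 2
    half = side / 2

    return (center_x - half, center_y - half, center_x + half, center_y + half)


# A 100 x 200 box becomes a 200 x 200 box around the same center.
print(square_box(100, 50, 200, 250))
```

A square box can extend past the image border, so a caller would typically round the coordinates and clip or pad the crop before passing it to the landmark model.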
Getting Started
Below are the steps to get the project operational on your local machine for development and testing:
Prerequisites
The project was tested in the following environment, so a similar setup is recommended:
- Ubuntu 22.04 as the operating system.
- ONNX Runtime 1.17.1 and OpenCV 4.5.4 as the required frameworks.
Installing
Begin by cloning the project repository:
git clone https://github.com/yinguobing/head-pose-estimation.git
Install necessary dependencies using pip:
pip install -r requirements.txt
The pre-trained models are stored in the assets directory and can be downloaded using Git Large File Storage (LFS):
git lfs pull
Alternatively, they can be downloaded manually from the release page.
Running
To execute the program, you need to provide either a video file or a webcam index as input. If no input is specified, the default setting will use the system's built-in webcam.
Using a Video File
For video files supported by OpenCV, like mp4 or avi, use the following command:
python3 main.py --video /path/to/video.mp4
Using a Webcam
To access and use a webcam, provide its index. For instance, use the following command for the primary webcam:
python3 main.py --cam 0
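The source selection described above can be sketched with `argparse`. The `--video` and `--cam` flags come from the commands shown; the helper name `parse_source`, the default index of 0, and the rule that a video path wins when both flags are given are assumptions for illustration, not a copy of the project's main.py.

```python
import argparse


def parse_source(argv=None):
    """Return a video path (str) or a webcam index (int) from CLI arguments."""
    parser = argparse.ArgumentParser(description="Select the input source.")
    parser.add_argument("--video", type=str, default=None,
                        help="Path to a video file.")
    parser.add_argument("--cam", type=int, default=0,
                        help="Webcam index (assumed default: 0).")
    args = parser.parse_args(argv)

    # Assumption: a video path, when given, takes priority over the webcam.
    return args.video if args.video is not None else args.cam


# Passing explicit argument lists keeps the example independent of sys.argv.
print(parse_source(["--video", "/path/to/video.mp4"]))
print(parse_source(["--cam", "1"]))
print(parse_source([]))
```

Either return value can be handed to OpenCV's `cv2.VideoCapture`, which accepts a file name or a device index.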
Retrain the Model
For users interested in retraining the model, the author provides tutorials, and the training code is available on GitHub.
Note: A PyTorch version of the training code will be available soon.
Licensing
This project operates under the MIT License. Full license details can be found in the project's LICENSE file.
Additional Information
- The face detection component is implemented using the SCRFD model from InsightFace.
- The pre-trained model was trained on several public datasets, each with its own licensing terms, which should be reviewed separately.
Acknowledgments
The model was trained on public datasets including 300-W, 300-VW, LFPW, HELEN, AFW, and IBUG. The 3D face model comes from OpenFace. The face detector is SCRFD by InsightFace.
This project was created by Yin Guobing.