YoloV8 TensorRT CPP: A Comprehensive Introduction
Overview
YoloV8 TensorRT CPP is a C++ implementation of YoloV8 inference built on TensorRT, designed to make deep learning inference fast and efficient. The project supports object detection, semantic segmentation, and body pose estimation. Under the hood it uses the TensorRT C++ API to run optimized models on the GPU, and it builds on the companion tensorrt-cpp-api project, which plays a central role in the inference pipeline.
Key Features
- Efficient Inference Engine: Built on TensorRT, which optimizes the network for the target GPU to deliver fast, low-latency inference.
- Wide Application Support: Handles object detection, semantic segmentation, and body pose estimation, making it useful across a range of use cases.
- CUDA and OpenCV Integration: Relies on CUDA for GPU acceleration and a CUDA-enabled build of OpenCV; primarily tested on Ubuntu 20.04 and 22.04.
Getting Started
To begin using YoloV8 TensorRT CPP, several prerequisites are necessary:
- Operating System: Compatible with Ubuntu 20.04 and 22.04 (no Windows support at this time).
- CUDA and cuDNN: Required for GPU acceleration; CUDA 12.0 or higher and cuDNN 8 or higher are recommended.
- OpenCV: Must be installed with CUDA support; version 4.8 or above is suggested.
- TensorRT: Version 10.0 or higher is required. Once installed, set its path in CMakeLists.txt. (A small environment-check sketch follows this list.)
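As a quick sanity check before building, you can query the toolchain versions from a few lines of C++. This sketch is illustrative and not part of the repository; it only assumes the standard CUDA runtime, TensorRT, and OpenCV headers are installed.

```cpp
// Illustrative environment check (not part of the repository): prints the
// CUDA runtime, TensorRT, and OpenCV versions, plus the number of
// CUDA-capable devices that OpenCV can see.
#include <cuda_runtime.h>
#include <NvInfer.h>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <iostream>

int main() {
    int cudaVersion = 0;
    cudaRuntimeGetVersion(&cudaVersion);              // e.g. 12020 for CUDA 12.2
    std::cout << "CUDA runtime: " << cudaVersion << "\n";
    std::cout << "TensorRT:     " << getInferLibVersion() << "\n";  // version encoded as one integer
    std::cout << "OpenCV:       " << CV_VERSION
              << " (CUDA devices: " << cv::cuda::getCudaEnabledDeviceCount() << ")\n";
    return 0;
}
```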
Installation and Setup
- Clone the Repository: Run git clone https://github.com/cyrusbehr/YOLOv8-TensorRT-CPP --recursive to pull the repository. The --recursive flag is crucial for pulling the necessary submodules.
- Model Conversion: Convert PyTorch models to the ONNX format by downloading the desired model from the official YoloV8 repository and then running the provided conversion script.
- Build the Project: Create a build directory and compile the project with CMake:

```
mkdir build
cd build
cmake ..
make -j
```
Running the Program
Executing any of the project executables for the first time triggers TensorRT to build an optimized engine file, a process that can take several minutes. After the initial run, subsequent executions load the cached engine file and start much faster.
- Benchmarking: Execute the benchmarking script to measure inference performance.
- Inference Tasks: Run inference on images, videos, or a live webcam feed using the provided executables, such as detect_object_image for static images; a minimal usage sketch follows this list.
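The snippet below is a minimal sketch of single-image inference, assuming a wrapper class along the lines of the repository's YoloV8 class (a constructor taking the ONNX model path plus a config struct, and detectObjects/drawObjectLabels helpers). Check the repository headers for the exact names and signatures.

```cpp
// Minimal sketch of single-image inference; class and method names are
// assumptions based on the repository's wrapper and may differ slightly.
#include <opencv2/opencv.hpp>
#include "yolov8.h"   // assumed header name from the repository

int main() {
    YoloV8Config config;                       // default precision, thresholds, etc.
    YoloV8 yoloV8("yolov8n.onnx", config);     // first run builds and caches the TensorRT engine

    cv::Mat img = cv::imread("input.jpg");
    if (img.empty()) return 1;

    const auto objects = yoloV8.detectObjects(img);   // run inference
    yoloV8.drawObjectLabels(img, objects);            // overlay boxes / masks / keypoints
    cv::imwrite("output.jpg", img);
    return 0;
}
```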
Advanced Features: INT8 Inference
To maximize throughput, INT8 precision can be used, trading a small amount of accuracy for additional speed. This requires representative calibration data and is enabled with specific command-line arguments; you may need to reduce the calibration batch size to fit within GPU memory.
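As a hypothetical illustration, enabling INT8 from code might look like the following, assuming the config struct exposes a precision enum, a calibration-data directory, and a calibration batch size; the actual option names live in the repository (and can also be supplied as command-line arguments to the sample executables).

```cpp
#include "yolov8.h"   // assumed header name, as above

int main() {
    YoloV8Config config;
    config.precision = Precision::INT8;            // assumed enum value
    config.calibrationDataDirectory = "val2017/";  // folder of representative images
    config.calibrationBatchSize = 64;              // lower this if GPU memory is tight

    // Engine construction now runs INT8 calibration, which takes noticeably
    // longer than an FP16/FP32 build but is cached for subsequent runs.
    YoloV8 yoloV8("yolov8n.onnx", config);
    return 0;
}
```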
Benchmarking Performance
Benchmarking has been conducted on various models and precisions, with substantial improvements in inference times. Notably, FP16 precision offers a significant speed advantage over FP32, with INT8 precision further optimizing performance but with a slight trade-off in accuracy.
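If you want to reproduce timings on your own hardware, a simple wall-clock benchmark can be written around the same hypothetical wrapper; warm-up iterations keep one-time setup costs (engine deserialization, first-run allocations) out of the measurement.

```cpp
// Simple wall-clock benchmark sketch, using the assumed YoloV8 wrapper above.
#include <chrono>
#include <iostream>
#include <opencv2/opencv.hpp>
#include "yolov8.h"  // assumed header name

int main() {
    YoloV8Config config;
    YoloV8 yoloV8("yolov8n.onnx", config);
    cv::Mat img = cv::imread("input.jpg");

    // Warm-up iterations so the reported numbers exclude one-time setup cost.
    for (int i = 0; i < 10; ++i) yoloV8.detectObjects(img);

    constexpr int iterations = 100;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) yoloV8.detectObjects(img);
    const auto end = std::chrono::steady_clock::now();

    const double ms =
        std::chrono::duration<double, std::milli>(end - start).count() / iterations;
    std::cout << "Average inference time: " << ms << " ms\n";
    return 0;
}
```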
Debugging and Further Development
To debug issues, increase the log level in engine.cpp to obtain more verbose output. Ongoing development focuses on reducing post-processing time by moving it into CUDA kernels.
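For reference, this is roughly what a severity-filtered TensorRT logger looks like; the project's engine.cpp defines its own logger, but the mechanism is the same, and lowering the threshold (for example to kVERBOSE) produces more detailed output.

```cpp
// Illustrative TensorRT logger with an adjustable severity threshold.
#include <NvInfer.h>
#include <iostream>

class Logger : public nvinfer1::ILogger {
public:
    explicit Logger(Severity threshold) : threshold_(threshold) {}

    void log(Severity severity, const char* msg) noexcept override {
        // Print only messages at least as severe as the configured threshold
        // (lower enum values are more severe in TensorRT).
        if (severity <= threshold_) {
            std::cout << "[TensorRT] " << msg << std::endl;
        }
    }

private:
    Severity threshold_;
};

// Usage: Logger logger(nvinfer1::ILogger::Severity::kVERBOSE);
//        auto builder = nvinfer1::createInferBuilder(logger);
```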
Acknowledgment
The project's progress is driven by a community of contributors who keep it maintained and improving. Contributions and feedback are welcome, and users can show their support by starring the repository on GitHub.
Conclusion
YoloV8 TensorRT CPP pairs a high-performance inference stack with a practical, well-documented codebase, advancing real-time machine learning inference. Its development ecosystem and thorough documentation make it both a powerful and an accessible tool for developers seeking to get the most out of YoloV8 through TensorRT and C++.