YOLOv8-TensorRT
YOLOv8-TensorRT is a project designed to accelerate the YOLOv8 model with TensorRT, enabling faster and more efficient object detection by leveraging GPU capabilities. Below is an overview of how to get started with the project, its requirements, and its usage.
Setting Up the Environment
To make the most out of YOLOv8-TensorRT, one needs to prepare the necessary environment. Here is a step-by-step guide:
1. Install CUDA: Download and install CUDA from its official website. Version 11.4 or higher is recommended for optimal performance.
2. Install TensorRT: Download and install TensorRT from its official website. Version 8.4 or higher is recommended.
3. Python Requirements: Set up the Python environment by installing the required packages:
pip install -r requirements.txt
4. Ultralytics Package: Install the ultralytics package, which is used for ONNX export and for building engines through the TensorRT API:
pip install ultralytics
5. Model Weights: Prepare your PyTorch model weights, such as yolov8s.pt or yolov8s-seg.pt.
Important Note: Always aim to use the latest versions of CUDA and TensorRT to ensure the best performance. If you must use older versions, consult the project's documentation and support channels for known fixes.
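Before moving on, it can help to confirm that the toolchain is visible from Python. Below is a minimal sanity check, assuming torch, tensorrt, and the ultralytics package are all installed; note that loading a known model name through ultralytics will download the weights automatically if they are not present locally.
import torch
import tensorrt as trt
from ultralytics import YOLO

# Confirm that the GPU stack is visible from Python.
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("TensorRT version:", trt.__version__)

# Loading by name fetches yolov8s.pt automatically if it is missing.
model = YOLO("yolov8s.pt")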
Basic Usage
For users wanting to deploy their model:
- If you acquire your ONNX file from the original ultralytics repository, you need to build the engine yourself; only the C++ inference code is then required to deserialize the engine and run inference. This is documented further in the Normal.md file.
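For orientation, deserializing an engine with the TensorRT Python API looks like the sketch below (the C++ API follows the same runtime/engine pattern). This is a minimal sketch, assuming a TensorRT 8.x install and an engine file named yolov8s.engine; listing the bindings is a quick way to confirm the engine matches what your inference code expects.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Read the serialized engine and deserialize it through a Runtime.
with open("yolov8s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# List the engine's I/O bindings and their shapes.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), engine.get_binding_shape(i), kind)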
Working with ONNX Models
Exporting End-to-End ONNX with NMS
Users can export ONNX models that include post-processing steps such as bbox decoding and NMS by using the ultralytics API with the following command:
python3 export-det.py \
--weights yolov8s.pt \
--iou-thres 0.65 \
--conf-thres 0.25 \
--topk 100 \
--opset 11 \
--sim \
--input-shape 1 3 640 640 \
--device cuda:0
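After exporting, a quick structural check of the resulting file can catch problems before engine building. A minimal sketch, assuming the onnx package is installed and the export wrote yolov8s.onnx:
import onnx

# Load the exported model and run ONNX's structural validator.
model = onnx.load("yolov8s.onnx")
onnx.checker.check_model(model)

# The graph outputs should include the post-processed results
# (detections after NMS), not just raw feature maps.
print([o.name for o in model.graph.output])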
Setting Up TensorRT Engines
There are two main ways to build a TensorRT engine from an ONNX model:
- Using the TensorRT ONNX Python API (a conceptual sketch of this flow follows the list):
python3 build.py \
--weights yolov8s.onnx \
--iou-thres 0.65 \
--conf-thres 0.25 \
--topk 100 \
--fp16 \
--device cuda:0
- Using the trtexec tool:
/usr/src/tensorrt/bin/trtexec \
--onnx=yolov8s.onnx \
--saveEngine=yolov8s.engine \
--fp16
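Conceptually, the Python-API route parses the ONNX graph into a TensorRT network and serializes an engine from it. The sketch below shows that core flow, assuming TensorRT 8.x; build.py layers project-specific options (such as the thresholds above) on top of it.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flag)
parser = trt.OnnxParser(network, logger)

# Parse the exported ONNX graph into a TensorRT network.
with open("yolov8s.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # omit for an FP32 build

# Build and save the serialized engine.
engine_bytes = builder.build_serialized_network(network, config)
with open("yolov8s.engine", "wb") as f:
    f.write(engine_bytes)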
Performing Inference
Python Script Inference
Run image inference with the infer-det.py script:
python3 infer-det.py \
--engine yolov8s.engine \
--imgs data \
--show \
--out-dir outputs \
--device cuda:0
C++ Inference
For C++ inference, set the required library paths in CMakeLists.txt and adjust the relevant values in main.cpp before building the project. A typical build sequence is shown below.
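For reference, an out-of-source CMake build usually looks like the following; the exact source directory and any TensorRT/CUDA path hints passed to cmake depend on the repository layout and your local install.
# Run from the directory containing CMakeLists.txt.
mkdir build && cd build
cmake ..
make -j$(nproc)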
Additional Features
- TensorRT Segment, Pose, Classification, and OBB Deployment: each of these is covered in its respective markdown file.
- DeepStream and Jetson Deployment: Explore methods for deployment on these platforms.
Profiling and Non-PyTorch Inference
Users can also profile their TensorRT engines, or avoid PyTorch entirely by running inference with either cuda-python or pycuda, although performance may not be as good.
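As an illustration, a PyTorch-free inference pass with pycuda follows. This is a minimal sketch, assuming a TensorRT 8.x engine whose first binding is the input; binding order, shapes, and preprocessing must be checked against your own engine.
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov8s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host and device buffers for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.empty(trt.volume(shape), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Replace this dummy input with a properly preprocessed image.
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)

# Copy in, execute, and copy the outputs back.
stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings, stream.handle)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()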
Conclusion
YOLOv8-TensorRT significantly elevates the capabilities of object detection models through GPU acceleration. With proper setup and understanding, users can achieve very fast and efficient model inference.