YOLOv8-TensorRT
YOLOv8-TensorRT is a project designed to accelerate the YOLOv8 model with TensorRT, enabling faster and more efficient object detection by leveraging GPU capabilities. Below is an overview of how to get started with the project, its requirements, and its usage.
Setting Up the Environment
To make the most out of YOLOv8-TensorRT, one needs to prepare the necessary environment. Here is a step-by-step guide:
1. Install CUDA: Download and install CUDA from its official website. Version 11.4 or higher is recommended for optimal performance.
2. Install TensorRT: Download and install TensorRT from its official website. Version 8.4 or higher is recommended.
3. Python Requirements: Set up the Python environment by installing the required packages:
pip install -r requirements.txt
4. Ultralytics Package: Install the ultralytics package, which is used for ONNX export and for building engines through the TensorRT API:
pip install ultralytics
5. Model Weights: Prepare your PyTorch model weights, such as yolov8s.pt or yolov8s-seg.pt.
Important Note: Always aim to use the latest versions of CUDA and TensorRT to ensure the best performance. If you must use older versions, consult the project's documentation and support channels for known fixes.
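Before moving on, it can help to confirm that the toolchain is visible from Python. Below is a minimal sanity check, assuming torch, tensorrt, and the ultralytics package are all installed; note that loading a known model name through ultralytics will download the weights automatically if they are not present locally.
import torch
import tensorrt as trt
from ultralytics import YOLO

# Confirm that the GPU stack is visible from Python.
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("TensorRT version:", trt.__version__)

# Loading by name fetches yolov8s.pt automatically if it is missing.
model = YOLO("yolov8s.pt")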
Basic Usage
For users wanting to deploy their model:
- If you acquire your ONNX file from the original ultralytics repository, you need to build the engine yourself; only the C++ inference code is then required to deserialize the engine and run inference. This is documented further in the Normal.md file.
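For orientation, deserializing an engine with the TensorRT Python API looks like the sketch below (the C++ API follows the same runtime/engine pattern). This is a minimal sketch, assuming a TensorRT 8.x install and an engine file named yolov8s.engine; listing the bindings is a quick way to confirm the engine matches what your inference code expects.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Read the serialized engine and deserialize it through a Runtime.
with open("yolov8s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# List the engine's I/O bindings and their shapes.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), engine.get_binding_shape(i), kind)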
Working with ONNX Models
Exporting End-to-End ONNX with NMS
Users can export ONNX models that include post-processing steps such as bbox decoding and NMS by using the ultralytics API with the following command:
python3 export-det.py \
--weights yolov8s.pt \
--iou-thres 0.65 \
--conf-thres 0.25 \
--topk 100 \
--opset 11 \
--sim \
--input-shape 1 3 640 640 \
--device cuda:0
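After exporting, a quick structural check of the resulting file can catch problems before engine building. A minimal sketch, assuming the onnx package is installed and the export wrote yolov8s.onnx:
import onnx

# Load the exported model and run ONNX's structural validator.
model = onnx.load("yolov8s.onnx")
onnx.checker.check_model(model)

# The graph outputs should include the post-processed results
# (detections after NMS), not just raw feature maps.
print([o.name for o in model.graph.output])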
Setting Up TensorRT Engines
There are two main ways to build a TensorRT engine from an ONNX model:
- Using the TensorRT ONNX Python API (a conceptual sketch of this flow follows the list):
python3 build.py \
--weights yolov8s.onnx \
--iou-thres 0.65 \
--conf-thres 0.25 \
--topk 100 \
--fp16 \
--device cuda:0
- Using the trtexec tool:
/usr/src/tensorrt/bin/trtexec \
--onnx=yolov8s.onnx \
--saveEngine=yolov8s.engine \
--fp16
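Conceptually, the Python-API route parses the ONNX graph into a TensorRT network and serializes an engine from it. The sketch below shows that core flow, assuming TensorRT 8.x; build.py layers project-specific options (such as the thresholds above) on top of it.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flag)
parser = trt.OnnxParser(network, logger)

# Parse the exported ONNX graph into a TensorRT network.
with open("yolov8s.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # omit for an FP32 build

# Build and save the serialized engine.
engine_bytes = builder.build_serialized_network(network, config)
with open("yolov8s.engine", "wb") as f:
    f.write(engine_bytes)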
Performing Inference
Python Script Inference
Run image inference with the infer-det.py script:
python3 infer-det.py \
--engine yolov8s.engine \
--imgs data \
--show \
--out-dir outputs \
--device cuda:0
C++ Inference
For C++ inference, set the required library paths in CMakeLists.txt and adjust the relevant values in main.cpp before building the project. A typical build sequence is shown below.
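For reference, an out-of-source CMake build usually looks like the following; the exact source directory and any TensorRT/CUDA path hints passed to cmake depend on the repository layout and your local install.
# Run from the directory containing CMakeLists.txt.
mkdir build && cd build
cmake ..
make -j$(nproc)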
Additional Features
- TensorRT Segment, Pose, Classification, and OBB Deployment: each of these is covered in its respective markdown file.
- DeepStream and Jetson Deployment: Explore methods for deployment on these platforms.
Profiling and Non-PyTorch Inference
Users can also profile their TensorRT engines, or avoid PyTorch entirely by running inference with either cuda-python or pycuda, although performance may not be as good.
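As an illustration, a PyTorch-free inference pass with pycuda follows. This is a minimal sketch, assuming a TensorRT 8.x engine whose first binding is the input; binding order, shapes, and preprocessing must be checked against your own engine.
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov8s.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host and device buffers for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.empty(trt.volume(shape), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Replace this dummy input with a properly preprocessed image.
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)

# Copy in, execute, and copy the outputs back.
stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings, stream.handle)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()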
Conclusion
YOLOv8-TensorRT significantly elevates the capabilities of object detection models through GPU acceleration. With proper setup and understanding, users can achieve very fast and efficient model inference.