Overview of the ONNX-TensorRT Project
ONNX-TensorRT plays a crucial role in leveraging NVIDIA's TensorRT to execute ONNX models efficiently. Essentially, it acts as a bridge: it parses ONNX models into TensorRT networks so they can be optimized and executed by TensorRT, NVIDIA's high-performance deep learning inference engine.
Supported TensorRT Versions
The project is actively developed against the latest TensorRT release, currently version 10.5. TensorRT provides advanced features such as dynamic shapes, which allow a single engine to handle inputs of varying sizes without being rebuilt, provided the engine was built with suitable optimization profiles. For users working with earlier versions of TensorRT, the project maintains release branches corresponding to those versions.
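As a rough illustration of how dynamic shapes are declared at engine-build time, the sketch below uses the TensorRT Python API with an optimization profile; the model path, the input tensor name ("input"), and the chosen shape ranges are placeholder assumptions and must match your own model.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()  # explicit-batch network (the default in TensorRT 10)
parser = trt.OnnxParser(network, logger)

# Parse an ONNX model that has a dynamic batch dimension, e.g. input shape [-1, 3, 224, 224].
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

# Declare the range of input shapes the engine must support via an optimization profile.
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)

# Build a serialized engine that accepts any batch size within the declared range.
engine_bytes = builder.build_serialized_network(network, config)
```

The resulting serialized engine can then be deserialized and run with any input shape inside the declared minimum/maximum range, without rebuilding.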
Supported Operators
ONNX-TensorRT supports a range of ONNX operators, which are the building blocks of models. To understand which specific operators are supported, users can refer to the operator support matrix provided in the project's documentation.
Installation Guide
Dependencies
To use ONNX-TensorRT, a few dependencies need to be met:
- Protobuf version 3.0.x or higher (the data interchange format used to serialize ONNX models).
- TensorRT 10.5 and its open-source libraries: the TensorRT release is available from NVIDIA's developer website, and the open-source components are available from the main TensorRT repository on GitHub.
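Once these are in place, a quick sanity check from Python can confirm which versions are visible to your environment (a minimal sketch; it assumes the TensorRT Python bindings and protobuf are installed in the same Python environment):

```python
import tensorrt as trt
import google.protobuf

# Print the versions that will actually be used at runtime.
print("TensorRT:", trt.__version__)
print("protobuf:", google.protobuf.__version__)
```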
Building the ONNX-TensorRT Library
Building ONNX-TensorRT is most easily done through Docker; instructions for setting up a suitable build container are available in the main TensorRT repository. After cloning the repository, the library is configured and compiled with CMake in a separate build directory, and the system's library search path (for example, LD_LIBRARY_PATH on Linux) should be updated to point to the newly built libraries.
Performance Configuration: InstanceNormalization
The project offers flexibility in performance tuning through two implementations of the InstanceNormalization operator. By default, the native TensorRT implementation is used, but users interested in experimenting with a different setup can switch to a plugin implementation, which comes with certain compatibility caveats.
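The exact switch depends on the TensorRT version in use. As a rough sketch, recent TensorRT releases expose a parser flag for this purpose; the flag name and the clear_flag call below are assumptions based on recent APIs, so consult the API reference for your version before relying on them.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)

# NATIVE_INSTANCENORM selects the native TensorRT implementation (the default here);
# clearing it asks the parser to use the plugin implementation instead.
parser.clear_flag(trt.OnnxParserFlag.NATIVE_INSTANCENORM)
```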
Utilizing Tools
ONNX-TensorRT provides tools to help users check compatibility and performance of ONNX models:
- C++ users can utilize the trtexec utility, which provides a straightforward command-line interface for converting ONNX models into TensorRT engines.
- Python users have access to the polygraphy tool, which offers similar functionality and ease of use in a Python environment; a short sketch follows this list.
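As a rough sketch of the Python route, the snippet below uses Polygraphy's documented TensorRT backend to build an engine from an ONNX file and run inference; the model path, input tensor name, and input shape are placeholder assumptions. (Polygraphy also ships an equivalent command line, e.g. polygraphy run model.onnx --trt.)

```python
import numpy as np
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner

# Lazily build a TensorRT engine from an ONNX model on disk.
build_engine = EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))  # placeholder path

# TrtRunner handles engine activation, buffer allocation, and inference.
with TrtRunner(build_engine) as runner:
    input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
    outputs = runner.infer(feed_dict={"input": input_data})         # placeholder tensor name
    print({name: arr.shape for name, arr in outputs.items()})
```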
Python Modules
Python bindings expose the ONNX-TensorRT parser to Python. The modules are installed from the provided .whl files and allow Python users to integrate ONNX models with TensorRT seamlessly.
Running Models with ONNX-TensorRT
Python Example
In Python, using ONNX-TensorRT is straightforward: load the ONNX model, prepare it for execution on a CUDA-enabled device, and run input data through it to obtain results, as in the sketch below.
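The following sketch follows the backend API described in the project's documentation; the model path and the input shape are placeholder assumptions and should match your own model.

```python
import numpy as np
import onnx
import onnx_tensorrt.backend as backend

# Load the ONNX model and prepare it for execution on the first CUDA device.
model = onnx.load("/path/to/model.onnx")  # placeholder path
engine = backend.prepare(model, device="CUDA:0")

# Run random input data through the engine; the shape must match the model's input.
input_data = np.random.random(size=(1, 3, 224, 224)).astype(np.float32)
output_data = engine.run(input_data)[0]
print(output_data.shape)
```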
C++ Library Usage
For C++ developers, ONNX-TensorRT provides the libnvonnxparser library, whose C++ API is declared in the NvOnnxParser.h header, allowing for direct integration into C++ projects.
Testing and Validation
After building or installing ONNX-TensorRT, it is important to verify that it works by running the bundled tests. The project ships ONNX backend tests (onnx_backend_test.py) that validate real pretrained models as well as a broader set of operator scenarios; they can be run for general validation or in verbose mode for more detailed feedback.
Access to Pre-trained Models
For users in need of models to test or deploy, there is a rich repository of pre-trained models available in ONNX format at the ONNX Model Zoo. These serve as excellent resources for exploring ONNX-TensorRT's capabilities.
In summary, ONNX-TensorRT is an advanced toolset that seamlessly merges the flexibility of the ONNX model format with the accelerated performance of TensorRT, making it an invaluable resource for developers aiming to optimize deep learning model deployment.