Introduction to Ortex
Ortex is a powerful tool designed to work with ONNX models, which are widely used in the field of machine learning. It acts as a bridge to the ONNX Runtime and makes deploying ONNX models straightforward, allowing them to run efficiently and concurrently in a variety of settings, including distributed clusters. Ortex simplifies the management and implementation of models by providing both a streamlined deployment system and a storage-only Nx tensor backend for easier manipulation of data.
Key Features
- ONNX Model-Compatible: Ortex seamlessly loads and facilitates fast inference of ONNX models, a standardized format that can be exported from popular machine learning libraries like PyTorch and TensorFlow.
- Multi-Backend Support: It takes advantage of different computational backends such as CUDA, TensorRT, Core ML, and ARM Compute Library, ensuring versatile performance on various platforms.
- Nx.Serving Integration: Ortex leverages the Nx.Serving framework for model deployment, enabling users to easily integrate models within their applications' supervision trees (see the sketch after this list). This feature is particularly useful for running models in production environments.
- Dynamic Input Handling: When inspecting models, Ortex accommodates dynamic input sizes; axes marked with `nil` represent variable dimensions (see the batch-size sketch after the examples below).
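Because Ortex.Serving plugs into Nx.Serving, a model can be placed under an application's supervision tree using the standard Nx.Serving child specification. Below is a minimal sketch; the MyApp.Serving name, the model path, and the batch settings are illustrative choices, not part of Ortex's API:

```elixir
# A minimal sketch of supervising an Ortex model via the standard
# Nx.Serving child spec. Names and paths are placeholders.
model = Ortex.load("./models/resnet50.onnx")

children = [
  {Nx.Serving,
   serving: Nx.Serving.new(Ortex.Serving, model),
   name: MyApp.Serving,
   batch_size: 4,
   batch_timeout: 100}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Callers anywhere in the application route requests to the named process:
batch = Nx.Batch.stack([{Nx.broadcast(0.0, {3, 224, 224})}])
{result} = Nx.Serving.batched_run(MyApp.Serving, batch)
```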
Practical Examples
To illustrate the capabilities of Ortex, consider these examples:
- Model Loading and Running: Ortex allows users to load a model file, such as `resnet50.onnx`, and inspect its input and output specifications:

  ```elixir
  iex> model = Ortex.load("./models/resnet50.onnx")
  #Ortex.Model<
    inputs: [{"input", "Float32", [nil, 3, 224, 224]}]
    outputs: [{"output", "Float32", [nil, 1000]}]>
  ```
- Inference Execution: Execute the model to get results, such as identifying the maximum value index from the model's output:

  ```elixir
  iex> {output} = Ortex.run(model, Nx.broadcast(0.0, {1, 3, 224, 224}))
  iex> output |> Nx.backend_transfer() |> Nx.argmax
  #Nx.Tensor<
    s64
    499
  >
  ```
- Serving a Model: Ortex streamlines the process of deploying a model within a serving environment:

  ```elixir
  iex> serving = Nx.Serving.new(Ortex.Serving, model)
  iex> batch = Nx.Batch.stack([{Nx.broadcast(0.0, {3, 224, 224})}])
  iex> {result} = Nx.Serving.run(serving, batch)
  iex> result |> Nx.backend_transfer() |> Nx.argmax(axis: 1)
  #Nx.Tensor<
    s64[1]
    [499]
  >
  ```
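Returning to the dynamic input handling noted above, the `nil` leading axis in the inspected model means any batch size is accepted at inference time. A small sketch reusing the model loaded in the first example; the `{8, 1000}` shape follows from the model's declared output:

```elixir
# The nil leading axis is dynamic, so a batch of 8 is as valid as a batch of 1.
iex> {output} = Ortex.run(model, Nx.broadcast(0.0, {8, 3, 224, 224}))
iex> Nx.shape(output)
{8, 1000}
```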
Installation
To start working with Ortex, users must first include it in their project dependencies. The following lines can be added to the `mix.exs` file:
```elixir
def deps do
  [
    {:ortex, "~> 0.1.9"}
  ]
end
```
Moreover, Rust must be installed for the package to compile successfully, since Ortex builds its native ONNX Runtime bindings during compilation.
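For a quick experiment outside a full Mix project, for example in a script or Livebook, Ortex can also be pulled in with Mix.install. A minimal smoke test follows, assuming an illustrative model path (Rust is still required for compilation):

```elixir
# Fetches and compiles Ortex on first run (requires a Rust toolchain).
Mix.install([{:ortex, "~> 0.1.9"}])

# Load a model and run a dummy input through it; the path is illustrative.
model = Ortex.load("./models/resnet50.onnx")
{output} = Ortex.run(model, Nx.broadcast(0.0, {1, 3, 224, 224}))

output
|> Nx.backend_transfer()
|> Nx.argmax()
|> IO.inspect()
```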
By abstracting the complexities of deploying and managing ONNX models, Ortex empowers developers and machine learning practitioners to put trained models to work with ease, streamlining their workflows and bringing efficient model inference to their applications.