Triton Inference Server
Triton Inference Server is open-source inference serving software that streamlines deploying AI models at scale. It supports multiple deep learning and machine learning frameworks, including TensorFlow and PyTorch, and runs models on NVIDIA GPUs, ARM CPUs, or AWS Inferentia. Designed for real-time, batched, and streaming inference, the server provides features such as dynamic batching, sequence batching, ensemble models, and custom backends. It also exposes detailed metrics for performance monitoring and serves requests over both HTTP/REST and gRPC. As part of NVIDIA AI Enterprise, Triton supports production data science workflows and AI deployment.
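As a minimal sketch of how a client sends an inference request to a running Triton instance over HTTP/REST, the snippet below uses the official `tritonclient` Python package (installable via `pip install tritonclient[http]`). The model name (`resnet50`) and the tensor names and shapes (`input__0`, `output__0`) are placeholders; in practice they must match the deployed model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server's HTTP/REST endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor; the name, shape, and dtype must match the
# model's config.pbtxt ("input__0" here is a placeholder).
input_tensor = httpclient.InferInput("input__0", [1, 3, 224, 224], "FP32")
input_tensor.set_data_from_numpy(
    np.random.rand(1, 3, 224, 224).astype(np.float32)
)

# Request a specific output tensor by name (also model-dependent).
output_tensor = httpclient.InferRequestedOutput("output__0")

# Run inference; on the server side, Triton may transparently group this
# request with others via dynamic batching before executing the model.
response = client.infer(
    model_name="resnet50",
    inputs=[input_tensor],
    outputs=[output_tensor],
)

print(response.as_numpy("output__0").shape)
```

The gRPC path is nearly identical: swap in `tritonclient.grpc` and point the client at the gRPC endpoint (port 8001 by default).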