# ONNX Runtime
## onnxruntime
ONNX Runtime accelerates machine learning inference and training across platforms. It runs models exported from frameworks such as PyTorch and TensorFlow, as well as classical ML libraries like scikit-learn and XGBoost, applying hardware-specific optimizations. For training, it can noticeably cut time on multi-node NVIDIA GPUs with minimal changes to existing PyTorch scripts. Compatible with all major operating systems, ONNX Runtime improves performance while cutting costs.
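As a minimal sketch, inference from Python looks like this, assuming a local `model.onnx` file with a single float32 image input (the path and shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Create a session; providers controls which execution provider is used.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Feed a dummy input matching the model's expected input name and shape.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy})  # None = return all outputs
print(outputs[0].shape)
```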
## Olive
Olive is a hardware-aware model optimization tool that composes leading techniques in model compression, optimization, and compilation, simplifying the engineering work of targeting specific hardware. It accommodates multiple vendor-specific toolchains for cloud and edge deployments, and its flexible framework makes it easy to plug in new industry innovations while respecting constraints such as accuracy and latency. Recent releases add enhancements for AI applications and DirectML performance.
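A hedged sketch of driving a workflow from Python, assuming Olive's `olive.workflows.run` entry point; the config keys below are illustrative and have changed across Olive releases, so treat them as a shape rather than a recipe:

```python
from olive.workflows import run as olive_run

# Illustrative config: convert a Hugging Face model to ONNX, then apply
# ONNX Runtime transformer-graph optimizations. Keys vary by Olive version.
config = {
    "input_model": {"type": "HfModel", "model_path": "Intel/bert-base-uncased-mrpc"},
    "passes": {
        "conversion": {"type": "OnnxConversion"},
        "optimization": {"type": "OrtTransformersOptimization"},
    },
}

olive_run(config)  # runs the passes and writes the optimized model to disk
```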
## DirectML
DirectML is a hardware-accelerated DirectX 12 library optimized for machine learning tasks on GPUs from AMD, Intel, NVIDIA, and Qualcomm. It integrates with Direct3D 12, minimizing latency and maximizing performance across platforms. Available on Windows 10 and Windows Subsystem for Linux, and as a standalone package, DirectML supports frameworks such as Windows ML and ONNX Runtime, facilitating model training and inference for PyTorch and TensorFlow applications.
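From Python, DirectML is most easily reached through ONNX Runtime's DirectML execution provider, shipped in the `onnxruntime-directml` package on Windows; `model.onnx` below is a placeholder:

```python
import onnxruntime as ort

# Prefer DirectML, falling back to CPU if no compatible GPU is available.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # confirm which provider was actually selected
```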
## fastembed
FastEmbed is a Python library for generating text and image embeddings. It supports a range of popular models and runs on ONNX Runtime rather than PyTorch, which keeps it lightweight enough for serverless environments; the project reports better speed and accuracy than competitors such as OpenAI's Ada-002. It installs via pip, with optional GPU support, and handles large datasets through data parallelism. FastEmbed supports dense, sparse, and late-interaction embedding models, and integrates with Qdrant.
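A minimal usage sketch with the default text model (install with `pip install fastembed`):

```python
from fastembed import TextEmbedding

model = TextEmbedding()  # loads a small default English text model
docs = [
    "FastEmbed runs on ONNX Runtime instead of PyTorch.",
    "That keeps it light enough for serverless deployments.",
]

embeddings = list(model.embed(docs))  # embed() yields one numpy vector per document
print(len(embeddings), embeddings[0].shape)
```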
## neural-compressor
Intel Neural Compressor offers model compression techniques, including quantization, pruning, and distillation, for frameworks like TensorFlow and PyTorch. It targets a range of Intel hardware and supports other platforms via ONNX Runtime. The library provides validated quantization recipes for popular LLMs and integrations with major clouds, and recent updates improve performance and add more user-friendly APIs.
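A hedged post-training quantization sketch using the library's 2.x `fit` API; the untrained ResNet and random calibration data below are placeholders standing in for a real model and dataloader:

```python
import torch
import torchvision
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Placeholder model and calibration data; substitute your own in practice.
model = torchvision.models.resnet18(weights=None)
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=4
)

config = PostTrainingQuantConfig(approach="static")  # static PTQ with calibration
q_model = fit(model=model, conf=config, calib_dataloader=calib_loader)
q_model.save("./quantized_model")
```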
## ort
ort is a Rust wrapper for ONNX Runtime v1.19 that brings efficient machine learning inference and training to both CPU and GPU. Building on the now-inactive onnxruntime-rs project, it offers a smooth migration path and is used in production by Twitter (improving recommendations) and Supabase (reducing serverless function cold starts). Comprehensive guides and community support via Discord and GitHub make project integration straightforward.
## optimum
Optimum provides optimization tools that improve model training and inference efficiency across multiple hardware platforms. It supports backends such as ONNX Runtime, OpenVINO, and TensorFlow Lite, and applies techniques including graph optimization, post-training quantization, and quantization-aware training (QAT) for faster model execution. With dedicated configurations for Intel, NVIDIA, AWS, and more, Optimum eases installation and deployment, covering model export, quantization, and optimized execution on accelerated hardware.
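For instance, exporting a Hugging Face checkpoint to ONNX and running it through ONNX Runtime can look like this; the checkpoint name is just a public example:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("ONNX Runtime makes this fast.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(-1))  # predicted class index
```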
## transformers.js
Transformers.js is a JavaScript library, functionally equivalent to Hugging Face's Python transformers, that brings advanced machine learning directly to the browser. It runs pretrained models for NLP, computer vision, and audio tasks without a server, executes them through ONNX Runtime, and offers straightforward conversion from PyTorch, TensorFlow, or JAX, making it well suited to browser-based natural language, image, and audio processing.
## yolort
yolort unifies training and inference for object detection with a dynamic-shape approach built on the YOLOv5 model framework. It embeds pre-processing and post-processing directly into the model graph, simplifying deployment on platforms such as LibTorch, ONNX Runtime, TVM, and TensorRT. The design takes its cues from Ultralytics' YOLOv5, so it feels familiar to users of torchvision's models. Recent enhancements include a TensorRT C++ interface and expanded ONNX Runtime support, and the project installs via PyPI or from source with minimal dependencies, easing both Python and C++ deployment.
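A short usage sketch along the lines of the project's README; `bus.jpg` is a placeholder image path:

```python
from yolort.models import yolov5s

# Load a pretrained YOLOv5-small; detections under the score threshold are dropped.
model = yolov5s(pretrained=True, score_thresh=0.45)
model.eval()

# Pre- and post-processing run inside the model graph, so a file path suffices.
predictions = model.predict("bus.jpg")
print(predictions)
```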
## head-pose-estimation
This project implements real-time human head pose estimation with ONNX Runtime and OpenCV in three steps: detecting the face and its bounding box, locating facial landmarks with a deep learning model, and computing the pose with a PnP algorithm. It targets Ubuntu 22.04 and requires ONNX Runtime 1.17.1 and OpenCV 4.5.4. The code works with both video files and webcams and includes detailed setup and usage instructions. It is MIT-licensed and built on publicly available datasets.
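To illustrate the PnP step, here is a standalone OpenCV sketch; the 3D face-model points, 2D landmark coordinates, and camera intrinsics are generic placeholders, not the project's actual data:

```python
import cv2
import numpy as np

# 3D reference points of a generic face model (nose tip, chin, eye corners,
# mouth corners), in arbitrary model units.
model_points = np.array([
    (0.0, 0.0, 0.0),          # nose tip
    (0.0, -330.0, -65.0),     # chin
    (-225.0, 170.0, -135.0),  # left eye outer corner
    (225.0, 170.0, -135.0),   # right eye outer corner
    (-150.0, -150.0, -125.0), # left mouth corner
    (150.0, -150.0, -125.0),  # right mouth corner
])

# Matching 2D landmark positions detected in the image (placeholder pixels).
image_points = np.array([
    (359, 391), (399, 561), (337, 297),
    (513, 301), (345, 465), (453, 469),
], dtype="double")

# Approximate intrinsics from the frame size; no lens distortion assumed.
width, height = 640, 480
camera_matrix = np.array([
    [width, 0, width / 2],
    [0, width, height / 2],
    [0, 0, 1],
], dtype="double")

# solvePnP recovers the head's rotation and translation relative to the camera.
ok, rotation_vec, translation_vec = cv2.solvePnP(
    model_points, image_points, camera_matrix, None
)
print(rotation_vec.ravel(), translation_vec.ravel())
```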
## ortex
Ortex is a wrapper for ONNX Runtime that eases deployment of ONNX models by supporting concurrent and distributed execution through Nx.Serving. It targets several backends, including CUDA and Core ML, for efficient inference and simple model handling. Designed for models exported from PyTorch and TensorFlow, it provides a storage-only tensor implementation suited to integration in Elixir applications. Installation means adding Ortex to the dependencies in mix.exs, with Rust required for compilation.
## Windows-Machine-Learning
Windows Machine Learning is a machine learning inference API optimized for real-time use, built on ONNX Runtime and DirectML. The repository offers resources and samples for adding ML features to Windows applications, including real-time gaming scenarios. Inference is available through the Windows SDK or a NuGet package, and tools such as WinMLRunner and WinML Dashboard help validate and inspect models. Extensive model samples, tutorials, and advanced use cases cover ML integration in both UWP and desktop apps.