# TensorRT
TensorRT
Discover NVIDIA's TensorRT open-source components, including plugins and the ONNX parser. The repository provides sample applications showcasing platform capabilities, enhancements, and fixes, and developers will find its coding guidelines and contribution instructions helpful. The Python package simplifies installation and works alongside CUDA, cuDNN, and the other tools needed for deployment. Engage with the TensorRT community for updates, get enterprise support through NVIDIA AI Enterprise, and consult the detailed developer guides and forums for further assistance.
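A minimal sketch of the engine-build flow using the TensorRT Python API, assuming an already-exported ONNX file; the paths and the optional FP16 flag are illustrative, not taken from the repository's samples:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file (placeholder path) into a TensorRT network.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("failed to parse model.onnx")

# Build and serialize the engine; FP16 is optional and hardware-dependent.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:
    f.write(engine)
```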
TensorRT-YOLO
The TensorRT-YOLO project supports enhanced inference for YOLOv3 to YOLO11 and PP-YOLOE models through NVIDIA TensorRT optimization. It integrates TensorRT plugins, CUDA kernels, and CUDA Graphs to deliver a fast object detection solution compatible with C++ and Python. Key features include ONNX export, command-line model export, and Docker deployment.
efficientvit
EfficientViT is a family of efficient vision models for high-resolution generation and perception built on multi-scale linear attention, covering image classification, segmentation, and text-to-image generation. The Deep Compression Autoencoder (DC-AE) offers high spatial compression, accelerating diffusion models without quality loss, while EfficientViT-SAM delivers significant speedups over the original SAM without sacrificing accuracy, making it suitable for real-time use.
deepdetect
DeepDetect is an open-source machine learning server and API designed for easy integration into applications. Built in C++11, it supports both training and inference, with automatic model conversion for backends such as TensorRT and NCNN. It handles supervised and unsupervised tasks, including classification, object detection, and segmentation, and integrates with libraries like Caffe, TensorFlow, and XGBoost for high-performance, scalable machine learning.
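As a sketch of the REST workflow: the service name, model repository path, and image URL below are hypothetical, while the `/services` and `/predict` endpoints follow DeepDetect's documented API:

```python
import requests

DD = "http://localhost:8080"

# Create a supervised image-classification service from an existing
# model repository (hypothetical path and service name).
requests.put(f"{DD}/services/imgclass", json={
    "mllib": "caffe",
    "description": "image classification",
    "type": "supervised",
    "parameters": {
        "input": {"connector": "image"},
        "mllib": {"nclasses": 1000},
    },
    "model": {"repository": "/opt/models/imgnet"},
}).raise_for_status()

# Ask for the top-3 classes of an image (placeholder URL).
resp = requests.post(f"{DD}/predict", json={
    "service": "imgclass",
    "parameters": {"output": {"best": 3}},
    "data": ["https://example.com/cat.jpg"],
})
print(resp.json())
```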
REAL-Video-Enhancer
REAL Video Enhancer provides practical video frame interpolation and upscaling across Windows, Linux, and macOS. Unlike older tools, it offers TensorRT and NCNN backends for optimized GPU performance and runs on modest hardware across platforms. Its user-friendly interface makes video processing straightforward, and benchmark tests show high frame rates and reliable performance, making it an advanced yet approachable video enhancement tool.
edgeyolo
EdgeYOLO advances object detection on edge devices such as the NVIDIA Jetson AGX Xavier, reaching 34 FPS with 50.6% AP on the COCO2017 dataset. It improves small-object detection with novel loss functions and data augmentation. Updates include ONNX-to-OM conversion for Huawei Ascend, Docker-based training environments, and deployment across various edge platforms, along with TensorRT integration and cross-platform demos; segmentation tasks and additional model variants are planned. Refer to the arXiv publication for comprehensive details.
WhisperLive
WhisperLive utilizes OpenAI's Whisper model for real-time speech-to-text conversion from various audio sources including microphone input, pre-recorded files, RTSP, and HLS streams. With support for Faster Whisper and TensorRT backends, it provides flexible performance across different environments. The project supports multilingual transcription and can be deployed in both GPU and CPU setups. Additionally, browser extensions enhance its usability by enabling direct audio transcription. WhisperLive offers an efficient setup and environment configuration for diverse transcription needs.
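A short sketch of the Python client, assuming a WhisperLive server already listening on port 9090; the argument names follow the project's README, and the audio path is a placeholder:

```python
from whisper_live.client import TranscriptionClient

# Connect to a running WhisperLive server and pick a Whisper model size.
client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",
    translate=False,
    model="small",
    use_vad=False,
)

# Stream from the microphone...
client()
# ...or transcribe a pre-recorded file (placeholder path):
# client("audio.wav")
```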
DeepStream-Yolo
The guide provides a comprehensive overview of configuring YOLO models with the NVIDIA DeepStream SDK, highlighting INT8 calibration, support for non-square models, and GPU post-processing. It offers detailed insights into dynamic batch-size capabilities, custom ONNX model parsing, and a wide range of model support including Darknet, YOLOv5 through YOLOv8, and PPYOLOE+. Suitable for DeepStream versions 5.1 to 7.0 on x86 and Jetson platforms, this resource is ideal for developers aiming to enhance YOLO functionalities in their projects.
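A sketch of what the corresponding nvinfer configuration can look like; the file names are placeholders, and the parser function and custom library path follow the repository's sample configs:

```ini
[property]
gpu-id=0
onnx-file=yolov8s.onnx
model-engine-file=yolov8s.onnx_b1_gpu0_fp16.engine
network-mode=2                 # 0=FP32, 1=INT8, 2=FP16
num-detected-classes=80
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
```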
onnx-tensorrt
Optimize deep learning workflows with the TensorRT backend for ONNX model execution. The project tracks TensorRT 10.5 and supports full-dimension and dynamic shape processing. It works with C++ and Python tools such as trtexec and polygraphy for efficient model parsing, and comprehensive documentation, including FAQs and changelogs, helps with CUDA environment setup, making it a robust choice for ONNX deployment at any experience level.
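The project also exposes an ONNX-backend-style Python interface; this sketch follows the README's usage, with the model path and input shape as placeholders:

```python
import numpy as np
import onnx
import onnx_tensorrt.backend as backend

# Parse the ONNX model and build a TensorRT engine on GPU 0.
model = onnx.load("model.onnx")
engine = backend.prepare(model, device="CUDA:0")

# Run inference on random data shaped like the model's input (placeholder).
input_data = np.random.random(size=(1, 3, 224, 224)).astype(np.float32)
output = engine.run(input_data)[0]
print(output.shape)
```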
TensorRT
Torch-TensorRT integrates TensorRT into PyTorch to boost inference speed by up to 5x on NVIDIA platforms with minimal code changes. It installs from PyPI (with nightly builds from the PyTorch index), supports torch.compile for quick optimization, and provides a robust export workflow for C++ environments. It runs on GPUs under Linux and Windows and compiles natively for aarch64 via JetPack. Resources include tools for Stable Diffusion acceleration and FP8 model execution with improved graph performance.
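A minimal sketch of the torch.compile route; the toy model and input shape are placeholders:

```python
import torch
import torch_tensorrt  # noqa: F401  (registers the "tensorrt" backend)

# Placeholder model; any traceable nn.Module works.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).cuda().eval()

x = torch.randn(1, 3, 224, 224, device="cuda")

# Route supported subgraphs through TensorRT; unsupported ops fall back to eager.
optimized = torch.compile(model, backend="tensorrt")
with torch.no_grad():
    out = optimized(x)
```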
FasterLivePortrait
FasterLivePortrait delivers real-time portrait animation on RTX 3090 using TensorRT, achieving over 30 FPS. It supports cross-platform deployment with ONNX models, providing approximately 12 FPS. The project enhances functionality by supporting native Gradio apps, multi-face handling, and animal models. Recent updates focus on speed optimization and bug fixes. Deployment is simplified with Docker, a Windows package, and macOS support for M1/M2 chips. Ideal for diverse AI applications.
Live2Diff
Live2Diff enhances live video translation using uni-directional temporal attention and a warmup mechanism. Notable features include a multi-timestep KV-Cache, depth prior for structure consistency, and DreamBooth and LoRA style support. Optimized with TensorRT, it delivers high FPS with RTX 4090 and Intel Xeon. Installation and setup are straightforward, enabling robust video2video applications using advanced diffusion models.
SwiftInfer
SwiftInfer integrates TensorRT to accelerate Streaming-LLM, enabling LLM inference over extended input lengths while mitigating model collapse through the attention-sink technique. Built on TensorRT-LLM, it offers a flexible framework for deploying efficient, multi-turn conversational AI systems. The project provides detailed installation guidance, compatibility checks, and benchmarks against the original PyTorch implementation, making it a solid option for efficient LLM inference.
Radiata
Radiata offers an optimized Stable Diffusion WebUI that uses TensorRT for improved performance. It includes Stable Diffusion XL and ControlNet plugin compatibility and supports LoRA & LyCORIS. Installation is straightforward on Windows and Linux; see the official documentation for features and setup.
bevfusion
BEVFusion integrates camera and LiDAR data in a bird's-eye view format, enhancing autonomous driving sensors by preserving vital semantic information and optimizing performance. It achieves superior detection results and lower latency, with proven effectiveness in top-tier benchmarks.
tiny-tensorrt
Discover a user-friendly NVIDIA TensorRT wrapper for deploying ONNX models in C++ and Python. Although no longer actively maintained, tiny-tensorrt emphasizes efficient deployment with minimal code. Dependencies include CUDA, cuDNN, and TensorRT, easily set up through NVIDIA's Docker images. Supporting multiple CUDA and TensorRT versions, it integrates smoothly into projects; documentation and installation guidance are available on its GitHub wiki.
yolort
This project combines training and inference for object detection using a dynamic shape strategy, based on the YOLOv5 model framework. It incorporates pre-processing and post-processing directly into the model graph, thereby facilitating deployment on platforms such as LibTorch, ONNX Runtime, TVM, and TensorRT. The design takes cues from Ultralytics' YOLOv5, ensuring familiarity for those used to torchvision's models. Recent enhancements include TensorRT C++ interface integration and expanded ONNX Runtime support. The project installs via PyPI or from source with minimal dependencies, streamlining both Python and C++ deployment.
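A sketch of the end-to-end API, following the README's usage; the image path is a placeholder:

```python
from yolort.models import yolov5s

# score_thresh filters detections inside the graph itself.
model = yolov5s(pretrained=True, score_thresh=0.45)
model.eval()

# Pre- and post-processing are embedded, so predict() maps an image
# path straight to boxes, labels, and scores.
predictions = model.predict("bus.jpg")
```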
YOLOv8-TensorRT-CPP
This C++ implementation of YOLOv8 via TensorRT excels in object detection, semantic segmentation, and body pose estimation. Optimized for GPU inference, the project uses the TensorRT C++ API and integrates with ONNX models converted from PyTorch. It runs on Ubuntu and requires CUDA, cuDNN, and CUDA-enabled OpenCV. Users will find comprehensive setup instructions, model conversion guidance, and INT8 inference optimization tips, making the project ideal for high-performance vision applications on NVIDIA GPUs.
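The C++ pipeline consumes an ONNX file; one common way to produce it from PyTorch weights is the Ultralytics exporter (the weight file below is a placeholder):

```python
from ultralytics import YOLO

# Export YOLOv8 weights to ONNX for the TensorRT C++ pipeline.
model = YOLO("yolov8n.pt")
model.export(format="onnx")  # writes yolov8n.onnx alongside the weights
```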
jetson-inference
Delve into deep learning for NVIDIA Jetson devices using TensorRT for optimized GPU use in C++ and Python, with PyTorch enabling model training. This project features image classification, object detection, semantic segmentation, and more, through DNN tools such as imageNet and detectNet. Includes integration with WebRTC and ROS/ROS2 for live camera inferencing. Follow Hello AI World tutorials to build, train, and deploy custom models. For more insights, explore LLMs and Vision Transformers tutorials at Jetson AI Lab.
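A minimal detection loop in the style of the Hello AI World examples; the camera URI and model name are placeholders for whatever sources and networks the project supports:

```python
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput

net = detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = videoSource("csi://0")       # or a V4L2 device, file, or RTSP URI
display = videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    if img is None:                   # capture timeout
        continue
    detections = net.Detect(img)      # TensorRT-accelerated inference
    display.Render(img)
    display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))
```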
iAI
This guide offers a comprehensive walkthrough for setting up AI experimental environments and implementing deep learning algorithms on Ubuntu. It includes detailed instructions for installing NVIDIA drivers, CUDA, cuDNN, and Anaconda, along with AI frameworks such as TensorFlow and PyTorch. It also covers YOLOv3, Faster R-CNN, and advanced model optimization techniques, and addresses dual-boot setups and common installation issues.
yolov5-face
This project uses the YOLOv5-Face architecture for real-time, high-accuracy face detection. The integration of models like BlazeFace and SCRFD enhances performance across different levels of difficulty. With TensorRT optimization, it significantly reduces inference time, making it ideal for multi-scale face detection. Compatibility with Android and OpenCV DNN extends its application range, supporting participation in global events like the ICCV2021 Masked Face Recognition Challenge.
TensorRT_Tutorial
Discover how to optimize model efficiency and speed with NVIDIA TensorRT's high-performance inference capabilities. This tutorial focuses on INT8 optimization and includes user-guide translations, sample-code analysis, and practical usage experience, alongside educational resources such as translated content, videos, and relevant blogs. It addresses gaps in the official documentation and showcases best practices for deploying deep learning models with TensorRT.
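As a sketch of where INT8 fits in the TensorRT Python API: the flag and calibrator hook below are real API surface, while the calibrator itself (which must subclass trt.IInt8EntropyCalibrator2 and feed batches of representative data) is left as an assumption:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

if builder.platform_has_fast_int8:
    config.set_flag(trt.BuilderFlag.INT8)
    # Supply calibration data via a trt.IInt8EntropyCalibrator2 subclass
    # (hypothetical helper, defined elsewhere):
    # config.int8_calibrator = MyEntropyCalibrator(calibration_batches)
```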
YOLOv8-TensorRT
YOLOv8-TensorRT boosts YOLOv8 inference speed by employing TensorRT. It leverages CUDA and C++ for engine construction and supports ONNX model export with NMS integration. The project provides flexible deployment options using Python and trtexec on various platforms, including Jetson, and its comprehensive setup guide helps adapt it to different deployment needs, offering an efficient alternative to native PyTorch inference.
Feedback Email: [email protected]