#NVIDIA
FasterTransformer
FasterTransformer provides highly optimized transformer encoder and decoder implementations for GPU inference. Written in CUDA and C++, it integrates with TensorFlow, PyTorch, and Triton, and ships practical examples for each. FP16 precision and INT8 quantization deliver substantial speedups for BERT, decoder, and GPT workloads across NVIDIA GPU architectures.
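To make the INT8 quantization mentioned above concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is only an illustration of the general idea, not FasterTransformer's implementation (which runs in CUDA kernels); the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Illustrative only; real inference libraries do this on the GPU.
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per element is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored in 8 bits instead of 16 or 32, which is where the memory and bandwidth savings come from; the cost is a bounded rounding error per element.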
iAI
This guide walks through setting up AI experiment environments and implementing deep learning algorithms on Ubuntu with a range of hardware and software tools. It gives detailed instructions for installing NVIDIA drivers, CUDA, cuDNN, and Anaconda, and for using AI frameworks such as TensorFlow and PyTorch. It also covers YOLO V3, Faster R-CNN, and model optimization techniques, and is useful for readers running dual-boot systems or troubleshooting setup issues.
HierarchicalKV
HierarchicalKV, part of NVIDIA Merlin, provides hierarchical key-value storage designed for recommender systems. It manages feature embeddings across GPU and host memory, addressing problems such as the memory limit of a single GPU and complex inter-CPU communication. By bypassing the CPU and applying configurable eviction strategies, it improves the performance of large recommendation models while keeping the table highly loaded, and its management strategies can be customized for building, evaluating, and deploying those models.
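As a rough illustration of what an eviction strategy means for a size-limited embedding table, the sketch below implements least-recently-used (LRU) eviction on the CPU in plain Python. It is a toy model under stated assumptions, not the HierarchicalKV API: the class name `BoundedEmbeddingTable` and its methods are hypothetical, and the real library performs this work on the GPU.

```python
from collections import OrderedDict


class BoundedEmbeddingTable:
    """Toy fixed-capacity key -> embedding store with LRU eviction."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rows = OrderedDict()  # oldest entry first, newest last

    def insert_or_assign(self, key, embedding):
        """Store an embedding; return the evicted (key, embedding) or None."""
        if key in self._rows:
            self._rows.move_to_end(key)  # refresh recency on update
        self._rows[key] = embedding
        if len(self._rows) > self.capacity:
            # Table is over capacity: drop the least recently used row.
            return self._rows.popitem(last=False)
        return None

    def find(self, key):
        """Look up an embedding and mark it as recently used."""
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)
        return self._rows[key]


table = BoundedEmbeddingTable(capacity=2)
table.insert_or_assign(101, [0.1, 0.2])
table.insert_or_assign(102, [0.3, 0.4])
table.find(101)  # key 101 is now the most recently used
evicted = table.insert_or_assign(103, [0.5, 0.6])
print(evicted)
```

Because key 101 was read just before the third insert, key 102 is the one evicted. Swapping the recency rule for a custom score is what a customizable strategy would change.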
TensorRT_Tutorial
Learn how to improve model efficiency and speed with NVIDIA TensorRT's high-performance inference capabilities. This tutorial focuses on INT8 optimization and includes user guide translations, sample code analysis, and notes from practical usage. It also links to educational resources such as translated content, videos, and relevant blogs. Aimed at developers who want to get more out of TensorRT, it addresses gaps in the official documentation and shows best practices for deploying deep learning models.
gpustat
gpustat provides a command-line interface for monitoring NVIDIA GPU statistics, with features such as color-coded output, user and process information, power usage, and real-time updates. It is built on nvidia-ml-py and requires NVIDIA Driver 450.00 or higher. It does not currently support AMD devices. It is installable through PyPI and supports Python 3.4 and above, with options for a web interface and advanced CUDA environment queries.
TensorRT
This repository contains NVIDIA TensorRT's open-source components, including plugins and the ONNX parser. It provides sample applications that showcase the platform's capabilities, along with enhancements and fixes. Developers will find coding guidelines and contribution instructions. A Python package simplifies installation, and the build works with CUDA, cuDNN, and the other tools needed for deployment. Updates are available through the TensorRT community, and enterprise support is offered through NVIDIA AI Enterprise. Detailed developer guides and forums provide further assistance.
AI-News-Daily
Stay informed on the rapidly changing AI landscape with daily updates on new technologies and innovations. Coverage includes tools and developments such as AI-powered typing assistants, voice-enabled robots, and advanced Chinese AI hardware, with expert insights, notable trends, and launches like the humanoid GPT and image-enhancing technology from companies like Huawei and Tencent. It is aimed at AI followers and professionals who want plainly reported news on new AI applications and research.
nvdiffrec
The nvdiffrec project focuses on the joint optimization of 3D topology, materials, and lighting using multi-view images. Leveraging NVIDIA's Kaolin for enhanced 3D deep learning, this project has introduced a slangpy-based renderutils library for cleaner code and now supports FlexiCubes isosurfacing. Compatible with Python 3.6+, VS2019+, CUDA 11.3+, and PyTorch 1.10+, it is optimized for high-end NVIDIA GPUs and offers configurable 3D model extraction procedures. Explore installation, dataset setup, and example executions in the guide.
rsl_rl
This project provides a fast and simple implementation of reinforcement learning algorithms optimized for GPU. Initially based on the NVIDIA Isaac Gym `rl-pytorch`, it currently supports PPO, with plans to add more algorithms such as SAC and DDPG. Maintained by researchers from the Robotic Systems Lab at ETH Zurich and from NVIDIA, the framework supports logging via Tensorboard, Weights & Biases, and Neptune. It is intended for researchers extending reinforcement learning capabilities, welcomes community contributions, and follows the Google Style Guide for documentation. To set up, clone the repository and follow the installation instructions.
apex
This repository provides NVIDIA tools for mixed precision and distributed training in PyTorch. Some modules are deprecated, and equivalent PyTorch solutions are available for them. Apex streamlines training, includes ImageNet examples, supports synchronized batch normalization, and uses NVIDIA's NCCL library. It is available for Linux, with experimental Windows support, and includes custom C++/CUDA extensions for better performance.
jetson-containers
This project provides a modular container build system tailored for NVIDIA Jetson and JetPack. It supports various AI/ML packages, including PyTorch, TensorFlow, and DeepStream, as well as applications in speech recognition, robotics, and graphics. Users can modify CUDA versions and integrate diverse software packages to optimize AI tasks. Comprehensive documentation and tutorials assist in setup and management for an effective user experience.
AI-Chip
Discover the newest updates in AI chip technology involving key players such as NVIDIA, Qualcomm, and Intel, along with various AI startups. Gain insights into NVIDIA's recent projects, Intel's new AI processor generation, and Qualcomm's mobile and cloud AI solutions, while staying updated on industrial trends from IC vendors and leading-edge companies in the AI field. Evaluate AI benchmark results to understand the current technological progress.
pflowtts_pytorch
P-Flow utilizes a speech-prompted text encoder and flow matching generative decoder for efficient zero-shot TTS, achieving notable speaker adaptation and synthesis speed improvements compared to large-scale models. Trained on the LibriTTS dataset, P-Flow maintains high speaker similarity and pronunciation quality.
VideoProcessingFramework
The Video Processing Framework (VPF) is transitioning to PyNvVideoCodec. It offers fully hardware-accelerated video processing, including decoding, encoding, transcoding, and GPU-accelerated conversions, and it exports decoded video frames to PyTorch tensors with minimal overhead. It runs on Linux and Windows and requires NVIDIA display driver 525.xx.xx or above, CUDA Toolkit 11.2 or higher, and FFMPEG. The framework can be installed with a single command or through Docker, and it has active community support. Advanced users can compile the components themselves and integrate their own dependencies.
ai-assisted-annotation-client
The NVIDIA AI-Assisted Annotation SDK brings AI into medical imaging through a client-server architecture, with client APIs in both C++ and Python. It offers functionality such as segmentation and 3D annotation on Linux, macOS, and Windows, and plugins for MITK and 3D Slicer extend its reach. The Clara Train SDK, downloadable from the NVIDIA website, comes with resources for getting started quickly. Community contributions drive ongoing development.
NeMo-Curator
NeMo Curator is a GPU-optimized open-source library designed to speed up dataset preparation for generative AI. Built on Dask and RAPIDS, it provides efficient modules for curating multilingual text and images for training and tuning. Features such as language identification, filtering, and deduplication support tasks including pretraining and fine-tuning. Its modular design allows data workflows to be customized.
edm2
This is the official PyTorch code for a CVPR 2024 paper on improving the training dynamics of diffusion models for image synthesis. To address inefficiencies in the ADM diffusion model, the paper redesigns the network layers to keep activations and weights balanced without changing the overall structure. These changes improve FID from 2.41 to 1.81 on ImageNet-512 using deterministic sampling. The paper also introduces a method for tuning EMA parameters after training, enabling precise adjustments without extra training runs.
TransformerEngine
Transformer Engine uses FP8 precision to accelerate Transformer models on NVIDIA Hopper GPUs, facilitating enhanced memory efficiency during training and inference. It includes optimized modules and a mixed-precision API for integration with deep learning frameworks, supporting architectures like BERT, GPT, and T5. With accessible Python and C++ APIs, Transformer Engine enables mixed-precision training, offering speed improvements with minimal accuracy changes. Compatible with major LLM libraries and supporting various GPU architectures, it is a versatile tool for NLP projects.
Megatron-LM
Discover NVIDIA's open-source library designed for efficient training of large language models with GPU optimization. Megatron-Core provides modular APIs for enhanced system-level optimization and scalability, supporting multimodal training on NVIDIA infrastructure. Features include advanced parallelism strategies and comprehensive components for transformers such as BERT and GPT, ideal for AI researchers and developers. It integrates smoothly with frameworks like NVIDIA NeMo and PyTorch.