# Optimization

TensorRT-YOLO
The TensorRT-YOLO project accelerates inference for YOLOv3 through YOLO11 and PP-YOLOE models using NVIDIA TensorRT. It combines TensorRT plugins, custom CUDA kernels, and CUDA Graphs into a fast object detection solution that can be used from both C++ and Python. Key features include ONNX export, command-line model export, and Docker deployment.
stable-diffusion-webui-ux
This Gradio-based interface for Stable Diffusion focuses on customization and speed. Features include a mobile-responsive layout, a micro-template engine, and console logs for debugging. It is compatible with Gradio 3 and 4, adds usability improvements such as toggleable inputs and slider options, and integrates with extensions such as Deforum and Aspect-Ratio-Helper. Planned updates include a theme editor and workspace management for tailored workflows, along with optimized styles and reduced redundancy.
awesome-tensor-compilers
A curated list of projects and papers on tensor computation and deep learning compilers. It covers open-source tools such as TVM, MLIR, and Triton, research on optimization techniques for CPUs, GPUs, and NPUs, and graph-level optimizations. It also points to tutorials and accepts contributions. The list is aimed at researchers, developers, and enthusiasts working on machine learning compiler optimization.
FourierKAN
FourierKAN is a PyTorch layer that replaces the usual Linear + non-linear activation pair with learnable 1D Fourier coefficients, inspired by Kolmogorov-Arnold Networks. It is designed for computational efficiency and benefits from the properties of periodic basis functions. The layer runs on both CPU and GPU. The current naive implementation uses memory proportional to the grid size, and a fused implementation is planned. To aid training, it uses Brownian noise initialization and frequency regularization to keep the learned functions smooth. The current code is MIT-licensed; future versions may include proprietary fused kernels.
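The idea of a Fourier-coefficient layer can be illustrated with a minimal pure-Python sketch. It assumes the output is computed as y_j = bias_j + Σ_i Σ_k (a_jik·cos(k·x_i) + b_jik·sin(k·x_i)) with frequencies k = 1..gridsize. This formulation is an assumption, not taken from the project's code. The functions `init_coeffs` and `fourier_kan_forward` are hypothetical, and the Gaussian initialization stands in for the project's Brownian noise scheme.

```python
import math
import random


def init_coeffs(input_dim, output_dim, gridsize, seed=0):
    """Hypothetical Gaussian init; layout is coeffs[c][j][i][k], c=0 cosine, c=1 sine."""
    rng = random.Random(seed)
    # Assumed scaling, not the project's Brownian noise initialization.
    scale = 1.0 / math.sqrt(input_dim * gridsize)
    return [[[[rng.gauss(0.0, scale) for _ in range(gridsize)]
              for _ in range(input_dim)]
             for _ in range(output_dim)]
            for _ in range(2)]


def fourier_kan_forward(x, coeffs, bias):
    """y_j = bias_j + sum_i sum_k a[j][i][k]*cos(k*x_i) + b[j][i][k]*sin(k*x_i), k = 1..gridsize."""
    cos_c, sin_c = coeffs
    gridsize = len(cos_c[0][0])
    # Precompute the cos/sin basis for every input; this naive table holds
    # input_dim * gridsize pairs per sample, so memory grows with the grid size.
    basis = [[(math.cos(k * xi), math.sin(k * xi)) for k in range(1, gridsize + 1)]
             for xi in x]
    y = []
    for j, b_j in enumerate(bias):
        total = b_j
        for i, row in enumerate(basis):
            for k, (c, s) in enumerate(row):
                total += cos_c[j][i][k] * c + sin_c[j][i][k] * s
        y.append(total)
    return y
```

Because each input passes through a learned sum of sines and cosines, every learned function is periodic in its input with period 2π.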
1brc
This project presents a Go solution for efficiently aggregating one billion rows of data. It documents a series of iterative optimizations, including concurrent processing with goroutines, buffered channels, and producer-consumer patterns. Tracing the path from an initial implementation to a highly refined one, it shows step by step which techniques reduce memory usage and improve performance.
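The producer-consumer structure can be sketched in Python, with a bounded `queue.Queue` standing in for a buffered Go channel and threads standing in for goroutines. This sketch assumes the One Billion Row Challenge input format of one `station;temperature` record per line, with min/mean/max per station as output. The function names are hypothetical and not taken from the project.

```python
import queue
import threading


def update_stats(stats, lines):
    """Fold 'station;temperature' lines into stats: {station: [min, max, sum, count]}."""
    for line in lines:
        station, _, value = line.partition(";")
        temp = float(value)
        s = stats.get(station)
        if s is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            s[0] = min(s[0], temp)
            s[1] = max(s[1], temp)
            s[2] += temp
            s[3] += 1


def merge_stats(into, other):
    """Combine one worker's partial table into the global table."""
    for station, (lo, hi, total, count) in other.items():
        s = into.get(station)
        if s is None:
            into[station] = [lo, hi, total, count]
        else:
            s[0] = min(s[0], lo)
            s[1] = max(s[1], hi)
            s[2] += total
            s[3] += count


def aggregate(lines, num_workers=4, chunk_size=10_000, queue_size=8):
    """Producer-consumer aggregation returning {station: (min, mean, max)}."""
    chunks = queue.Queue(maxsize=queue_size)  # bounded, like a buffered channel
    partials = [{} for _ in range(num_workers)]

    def worker(idx):
        local = partials[idx]  # per-worker table, so no lock on the hot path
        while True:
            chunk = chunks.get()
            if chunk is None:  # sentinel: the producer has finished
                break
            update_stats(local, chunk)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()

    # Producer: batch rows into chunks so each queue operation carries many rows.
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            chunks.put(chunk)
            chunk = []
    if chunk:
        chunks.put(chunk)
    for _ in range(num_workers):
        chunks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    merged = {}
    for local in partials:
        merge_stats(merged, local)
    return {st: (lo, total / count, hi) for st, (lo, hi, total, count) in merged.items()}
```

This sketch shows the structure only. Because of CPython's global interpreter lock, these threads do not parse rows in parallel the way goroutines do. The design keeps one table per worker and merges once at the end, which avoids contention on a shared map.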
mola
MOLA is a modular optimization framework for localization and mapping, integrated with ROS 2 Humble, Iron, Jazzy, and Rolling. Its features include metrics evaluation and IMU preintegration, and it supports amd64 and arm64 architectures. The documentation covers build instructions, demos, and API references for researchers and developers in robotics.