# Model Optimization

## Paddle-Lite
Paddle-Lite is a high-performance, flexible inference framework supporting a wide range of hardware environments, including mobile and edge devices. It integrates smoothly with PaddlePaddle models and serves both Baidu's internal workloads and external applications across industries. The framework ships with model optimization tools and broad platform compatibility, covering Android, iOS, and Linux. APIs in Java, Python, and C++ let users build machine learning solutions quickly, and comprehensive documentation and examples support integration and inference performance tuning.
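As a minimal sketch of the model-optimization step, Paddle-Lite's Python `Opt` API converts a saved PaddlePaddle inference model into the framework's optimized `.nb` format. The model directory, output name, and `arm` target below are illustrative assumptions, not values from this listing:

```python
# Sketch: optimize a PaddlePaddle model for on-device inference with Paddle-Lite.
# Assumes `pip install paddlelite` and a saved inference model in ./mobilenet_v1.
from paddlelite.lite import Opt

opt = Opt()
opt.set_model_dir("./mobilenet_v1")       # hypothetical path to the source model
opt.set_valid_places("arm")               # target hardware, e.g. arm / x86 / opencl
opt.set_optimize_out("mobilenet_v1_opt")  # writes mobilenet_v1_opt.nb
opt.run()
```

The resulting `.nb` file is what the mobile runtimes (Java, C++) load on-device.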
## sparseml
SparseML is an open-source toolkit that optimizes neural networks through sparsification techniques, including pruning, quantization, and distillation, producing smaller, faster models while largely maintaining accuracy. It integrates with PyTorch and Hugging Face and supports Sparse Transfer Learning from SparseZoo pre-trained models. Optimized models can be exported to ONNX for deployment with DeepSparse, which achieves GPU-class performance on CPUs. The toolkit's recipe-based approach makes optimization schedules declarative and reusable, and comprehensive tutorials cover the supported framework integrations.
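A sketch of the recipe-based workflow, assuming SparseML's `ScheduledModifierManager` API and a local `recipe.yaml` describing the pruning/quantization schedule (the placeholder model and step count are illustrative):

```python
# Sketch: apply a SparseML sparsification recipe to a PyTorch training loop.
# Assumes a recipe.yaml exists locally (e.g. downloaded from SparseZoo).
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

model = torch.nn.Linear(128, 10)  # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run the usual training loop; the manager prunes/quantizes on schedule ...

manager.finalize(model)  # remove the manager's hooks once training finishes
```

Because the recipe is a standalone YAML file, the same training code can be reused across different sparsification schedules.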
## onnx-simplifier
ONNX Simplifier improves ONNX model performance by simplifying the computational graph: it infers the outputs of constant subgraphs and replaces the corresponding redundant operators with their constant results (constant folding). It is available both as an online tool, which requires no installation, and as a Python package that provides the `onnxsim` command. Many projects, including MXNet, MMDetection, and YOLOv5, use ONNX Simplifier to improve model efficiency.
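The Python entry point is a single `simplify` call; a short sketch assuming an existing `model.onnx` (the file names are placeholders):

```python
# Sketch: simplify an ONNX model in Python
# (equivalent to running `onnxsim model.onnx model_simp.onnx` on the CLI).
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")       # hypothetical input path
model_simp, check = simplify(model)   # returns (simplified model, validity flag)
assert check, "Simplified model failed the validation check"
onnx.save(model_simp, "model_simp.onnx")
```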
## onnx-mlir
onnx-mlir uses the LLVM/MLIR compiler infrastructure to compile ONNX graphs into executable code with a minimal runtime. It defines an ONNX dialect and interfaces for lowering ONNX graphs into various intermediate forms, alongside a multi-language runtime environment. Code generation targets generic CPUs and IBM AI accelerators, and Docker is the recommended setup path. Community interaction takes place via Slack and weekly meetings, and comprehensive documentation is provided for developers interested in efficient neural network model deployment.
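A sketch of the compile-then-run flow, assuming onnx-mlir is built (or run via the project's Docker image) and its `PyRuntime` bindings are on `PYTHONPATH`; the model file and input shape are illustrative, and the session class name may differ between onnx-mlir versions:

```python
# Sketch: run a model compiled ahead of time with
#   onnx-mlir -O3 --EmitLib model.onnx   ->   model.so
# using the PyRuntime bindings shipped with onnx-mlir.
import numpy as np
from PyRuntime import OMExecutionSession

session = OMExecutionSession("model.so")  # load the compiled shared library
# Assumed input: one float32 tensor of shape (1, 3, 224, 224).
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
outputs = session.run(inputs)             # returns a list of numpy arrays
print(outputs[0].shape)
```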