# Model Compression
AliceMind
Discover Alibaba MinD Lab's comprehensive suite of advanced pre-trained models and techniques, including mPLUG-Owl2 for enhanced multimodal capabilities. Explore resources spanning vision-language understanding and cross-lingual tasks, with releases such as mPLUG-DocOwl and Youku-mPLUG designed for high-performance AI applications.
Efficient-LLMs-Survey
This survey systematically reviews efficiency challenges and solutions for LLMs, organizing the literature into a clear taxonomy spanning model-centric, data-centric, and system-level techniques. Recognizing the computational demands of LLMs, it underscores the importance of methods such as model compression, quantization, parameter pruning, and efficient tuning, and serves as a reference for researchers and practitioners working to advance LLM efficiency.
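For a concrete feel of the model-centric techniques the survey covers, here is a minimal sketch of symmetric 8-bit per-tensor weight quantization in NumPy; the function names are illustrative and not drawn from the survey's materials.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 codes with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())  # at most scale / 2
```

Storing int8 codes plus one float scale cuts weight memory roughly 4x versus float32, at the cost of the rounding error printed above.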
Awesome-Deep-Neural-Network-Compression
Discover an extensive collection of papers, summaries, and code covering deep neural network compression methods such as quantization, pruning, and distillation. The resource also spans network architecture search, adversarial robustness, NLP model compression, and efficient model design, and links to tools like DeepSpeed, ColossalAI, and PocketFlow, with summaries that connect theory to practical model optimization.
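As an example of one compression family the collection indexes, below is a minimal sketch of the classic knowledge-distillation loss (a temperature-smoothed KL term against a teacher plus standard cross-entropy); the temperature `T` and mixing weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```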
Efficient-Computing
Explore methods developed by Huawei Noah's Ark Lab for efficient computing, with an emphasis on data-efficient model compression and binary networks. The repository includes advances in pruning (e.g., GAN-pruning), model quantization (e.g., DynamicQuant), and self-supervised learning (e.g., FastMIM), along with training-acceleration techniques, efficient object detectors such as Gold-YOLO, and efficient low-level-vision models such as IPG. These resources aim to optimize neural network performance while minimizing the amount of training data required.
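For context on the pruning work listed above, the sketch below shows plain magnitude pruning, the baseline idea that methods like GAN-pruning build on; it is a generic illustration rather than code from the repository.

```python
import torch

def magnitude_prune_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    return (weight.abs() > threshold).float()              # 1 = keep, 0 = pruned

w = torch.randn(64, 64)
mask = magnitude_prune_mask(w, sparsity=0.9)
print(f"kept {int(mask.sum())} of {w.numel()} weights")
w_sparse = w * mask  # in practice the mask is reapplied during fine-tuning
```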
PaddleSlim
PaddleSlim is a comprehensive library for compressing deep learning models using techniques such as low-bit quantization, knowledge distillation, pruning, and neural architecture search. These methods reduce model size and improve inference performance on hardware ranging from Nvidia GPUs to ARM chips. Key features include automated compression for ONNX models and analysis tools for refining compression strategies, along with detailed tutorials and documentation for applying these methods in natural language processing and computer vision.
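To illustrate the kind of analysis PaddleSlim's analysis tools automate, here is a framework-agnostic sensitivity-analysis sketch: prune each layer in isolation at several ratios and record the metric drop, then assign gentler ratios to the sensitive layers. This is not PaddleSlim's API; the model and `evaluate` function are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def layer_sensitivity(model: nn.Module, evaluate, ratios=(0.25, 0.5, 0.75)):
    """Prune each Conv/Linear layer in isolation and record the metric drop."""
    baseline = evaluate(model)
    report = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        original = module.weight.data.clone()
        report[name] = {}
        for r in ratios:
            k = max(1, int(r * original.numel()))
            thresh = original.abs().flatten().kthvalue(k).values
            module.weight.data = original * (original.abs() > thresh)
            report[name][r] = baseline - evaluate(model)  # drop vs. unpruned model
        module.weight.data = original  # restore before probing the next layer
    return report

# Toy demo: the "metric" is negative MSE on fixed random data.
torch.manual_seed(0)
x, y = torch.randn(32, 16), torch.randn(32, 4)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
evaluate = lambda m: -nn.functional.mse_loss(m(x), y).item()
print(layer_sensitivity(model, evaluate))
```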
amc
This repository offers a PyTorch implementation of the techniques described in the paper 'AMC: AutoML for Model Compression and Acceleration on Mobile Devices'. It provides a complete workflow for compressing MobileNet models on ImageNet, covering strategy search, weight export, and fine-tuning, and enables replication of the compression process with significant FLOPs reduction without compromising accuracy. Pre-compressed MobileNet models are available in both PyTorch and TensorFlow formats, along with detailed performance statistics.
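AMC's reinforcement-learning agent searches per-layer compression ratios; the primitive it applies at each layer is structured channel pruning. Below is a minimal, repository-independent sketch of L1-norm channel pruning for a single convolution (the function name is illustrative).

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the fraction `keep_ratio` of output channels with the largest L1 norm."""
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    scores = conv.weight.data.abs().sum(dim=(1, 2, 3))  # L1 norm of each filter
    keep = scores.topk(n_keep).indices.sort().values     # preserve channel order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(16, 32, 3, padding=1)
smaller = prune_conv_channels(conv, keep_ratio=0.5)  # 32 -> 16 filters
print(smaller)
```

After pruning, downstream layers must be resliced to accept the reduced channel count, and the network is fine-tuned to recover accuracy, mirroring the search/export/fine-tune workflow above.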