# Model Compression
AliceMind
Discover Alibaba MinD Lab's comprehensive suite of advanced pre-trained models and techniques, including mPLUG-Owl2 for enhanced multimodal capabilities. Explore resources spanning vision-language understanding and cross-lingual tasks, with releases such as mPLUG-DocOwl and Youku-mPLUG designed for high-performance AI applications.
Efficient-LLMs-Survey
This survey systematically reviews efficiency challenges and solutions for LLMs, organizing the literature into a clear taxonomy spanning model-centric, data-centric, and system-level techniques. Recognizing the computational demands of LLMs, it underscores the importance of methods such as model compression, quantization, parameter pruning, and efficient tuning, and serves as a reference for researchers and practitioners working to advance LLM efficiency.
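For a concrete feel of the model-centric techniques the survey covers, here is a minimal sketch of symmetric 8-bit per-tensor weight quantization in NumPy; the function names are illustrative and not drawn from the survey's materials.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 codes with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())  # at most scale / 2
```

Storing int8 codes plus one float scale cuts weight memory roughly 4x versus float32, at the cost of the rounding error printed above.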
Awesome-Deep-Neural-Network-Compression
Discover an extensive collection of papers, summaries, and code covering deep neural network compression methods such as quantization, pruning, and distillation. The resource also spans network architecture search, adversarial robustness, NLP model compression, and efficient model design, and links to tools like DeepSpeed, ColossalAI, and PocketFlow, with summaries that connect theory to practical model optimization.
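As an example of one compression family the collection indexes, below is a minimal sketch of the classic knowledge-distillation loss (a temperature-smoothed KL term against a teacher plus standard cross-entropy); the temperature `T` and mixing weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: match the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```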
Efficient-Computing
Explore methods developed by Huawei Noah's Ark Lab for efficient computing, with an emphasis on data-efficient model compression and binary networks. The repository includes advances in pruning (e.g., GAN-pruning), model quantization (e.g., DynamicQuant), and self-supervised learning (e.g., FastMIM), along with training-acceleration techniques, efficient object detectors such as Gold-YOLO, and efficient low-level-vision models such as IPG. These resources aim to optimize neural network performance while minimizing the amount of training data required.
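For context on the pruning work listed above, the sketch below shows plain magnitude pruning, the baseline idea that methods like GAN-pruning build on; it is a generic illustration rather than code from the repository.

```python
import torch

def magnitude_prune_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the smallest-magnitude fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    return (weight.abs() > threshold).float()              # 1 = keep, 0 = pruned

w = torch.randn(64, 64)
mask = magnitude_prune_mask(w, sparsity=0.9)
print(f"kept {int(mask.sum())} of {w.numel()} weights")
w_sparse = w * mask  # in practice the mask is reapplied during fine-tuning
```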
PaddleSlim
PaddleSlim is a comprehensive library for compressing deep learning models using techniques such as low-bit quantization, knowledge distillation, pruning, and neural architecture search. These methods reduce model size and improve inference performance on hardware ranging from Nvidia GPUs to ARM chips. Key features include automated compression for ONNX models and analysis tools for refining compression strategies, along with detailed tutorials and documentation for applying these methods in natural language processing and computer vision.
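To illustrate the kind of analysis PaddleSlim's analysis tools automate, here is a framework-agnostic sensitivity-analysis sketch: prune each layer in isolation at several ratios and record the metric drop, then assign gentler ratios to the sensitive layers. This is not PaddleSlim's API; the model and `evaluate` function are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def layer_sensitivity(model: nn.Module, evaluate, ratios=(0.25, 0.5, 0.75)):
    """Prune each Conv/Linear layer in isolation and record the metric drop."""
    baseline = evaluate(model)
    report = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        original = module.weight.data.clone()
        report[name] = {}
        for r in ratios:
            k = max(1, int(r * original.numel()))
            thresh = original.abs().flatten().kthvalue(k).values
            module.weight.data = original * (original.abs() > thresh)
            report[name][r] = baseline - evaluate(model)  # drop vs. unpruned model
        module.weight.data = original  # restore before probing the next layer
    return report

# Toy demo: the "metric" is negative MSE on fixed random data.
torch.manual_seed(0)
x, y = torch.randn(32, 16), torch.randn(32, 4)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
evaluate = lambda m: -nn.functional.mse_loss(m(x), y).item()
print(layer_sensitivity(model, evaluate))
```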
amc
This repository offers a PyTorch implementation of the techniques described in the paper 'AMC: AutoML for Model Compression and Acceleration on Mobile Devices'. It provides a complete workflow for compressing MobileNet models on ImageNet, covering strategy search, weight export, and fine-tuning, and enables replication of the compression process with significant FLOPs reduction without compromising accuracy. Pre-compressed MobileNet models are available in both PyTorch and TensorFlow formats, along with detailed performance statistics.
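AMC's reinforcement-learning agent searches per-layer compression ratios; the primitive it applies at each layer is structured channel pruning. Below is a minimal, repository-independent sketch of L1-norm channel pruning for a single convolution (the function name is illustrative).

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the fraction `keep_ratio` of output channels with the largest L1 norm."""
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    scores = conv.weight.data.abs().sum(dim=(1, 2, 3))  # L1 norm of each filter
    keep = scores.topk(n_keep).indices.sort().values     # preserve channel order
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(16, 32, 3, padding=1)
smaller = prune_conv_channels(conv, keep_ratio=0.5)  # 32 -> 16 filters
print(smaller)
```

After pruning, downstream layers must be resliced to accept the reduced channel count, and the network is fine-tuned to recover accuracy, mirroring the search/export/fine-tune workflow above.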