
llm-compressor

Optimize Machine Learning Models with Comprehensive Quantization for Efficient Deployment

Product Description

This library integrates with Hugging Face models and optimizes them for deployment using quantization algorithms. Notable features include support for safetensors-based formats and compatibility with large models through accelerate. It offers a range of compression options, including W8A8 quantization, mixed precision, and sparsification with SparseGPT. Algorithms such as SmoothQuant and GPTQ can be applied to quantize both activations and weights. Comprehensive examples and user guides cover compressing a model with llmcompressor and serving it with vllm for fast, efficient inference.
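A minimal sketch of the workflow described above, combining SmoothQuant and GPTQ in a one-shot recipe. The model name, dataset, and parameter values here are illustrative assumptions, not requirements of the library, and the exact API surface may differ across llmcompressor versions:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Illustrative recipe: first smooth activation outliers into the weights,
# then apply GPTQ to quantize weights and activations to 8 bits (W8A8).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# One-shot calibration and compression. The model and dataset names are
# examples only; running this downloads the model and requires a GPU.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-W8A8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The compressed model is saved in a safetensors-based format that vllm can load directly for accelerated inference.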
Project Details