llm-compressor
This library integrates with Hugging Face models and applies quantization algorithms to optimize them for deployment. Notable features include a safetensors-based checkpoint format compatible with vllm and support for large models via accelerate. It offers a range of compression options, including weight-and-activation quantization (W8A8), mixed-precision weight-only quantization, and sparsification with SparseGPT. Algorithms such as SmoothQuant and GPTQ handle activation and weight quantization, respectively. Comprehensive examples and user guides cover compressing a model with llmcompressor and running it with vllm for fast, efficient inference.
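As a quick illustration, the sketch below applies SmoothQuant followed by GPTQ in a single one-shot pass to produce a W8A8 checkpoint. This is a minimal sketch based on the library's documented oneshot workflow; the model name, dataset, and calibration settings are illustrative, and exact import paths and argument names may differ between versions.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Recipe: smooth activation outliers first, then quantize weights with GPTQ.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

# One-shot calibration over a small dataset; writes a safetensors
# checkpoint that vllm can load directly. The model, dataset, and
# output paths here are illustrative examples.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The saved directory can then be loaded with vllm like any Hugging Face checkpoint, for example:

```python
from vllm import LLM

# Load the compressed checkpoint produced above (path is illustrative).
model = LLM("TinyLlama-1.1B-Chat-v1.0-INT8")
print(model.generate("My name is"))
```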