SqueezeLLM
Discover how SqueezeLLM's Dense-and-Sparse Quantization framework streamlines large language model deployment by reducing memory requirements while maintaining performance. The technique splits each weight matrix into a dense portion, which is quantized to low bit-width, and a small sparse portion, which keeps outlier values in full precision. This split preserves accuracy while cutting memory use and improving processing speed, as demonstrated by quantized Vicuna models that deliver better MMLU scores at a lower memory cost. Learn about recent updates adding support for models such as Mistral and Vicuna in the vLLM framework.
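
The idea behind the dense-and-sparse split can be illustrated with a minimal sketch. This is not vLLM's or SqueezeLLM's actual implementation; the function name, the outlier fraction, and the thresholding rule are all illustrative assumptions. It only shows the decomposition step: a handful of large-magnitude outliers are pulled into a sparse matrix kept in full precision, and the remaining dense matrix is what would then be quantized to low bit-width.

```python
# Hypothetical sketch of a dense-and-sparse weight decomposition.
# Not the SqueezeLLM/vLLM implementation; threshold choice is illustrative.
import numpy as np
from scipy.sparse import csr_matrix

def dense_sparse_split(W: np.ndarray, outlier_fraction: float = 0.005):
    """Split W into a dense part (outliers zeroed) and a sparse part of outliers."""
    # Keep the largest-magnitude fraction of weights as "outliers".
    threshold = np.quantile(np.abs(W), 1.0 - outlier_fraction)
    outlier_mask = np.abs(W) > threshold
    sparse_part = csr_matrix(np.where(outlier_mask, W, 0.0))  # few values, full precision
    dense_part = np.where(outlier_mask, 0.0, W)                # remainder, to be quantized
    return dense_part, sparse_part

# Reconstruction: W ≈ quantize(dense_part) + sparse_part
W = np.random.randn(256, 256).astype(np.float32)
dense_part, sparse_part = dense_sparse_split(W)
assert np.allclose(dense_part + sparse_part.toarray(), W)
```

Because the sparse part holds only a tiny fraction of the weights, it adds little memory overhead, while removing the outliers lets the dense part be quantized aggressively with much smaller accuracy loss.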