
airllm

Efficient Inference for Large Language Models on Low-End Hardware

Product Description

AirLLM minimizes the hardware needed for large language model inference. It can run 70B models on a single 4GB GPU and up to 405B models on an 8GB GPU by optimizing how model weights are held in memory, without requiring quantization, distillation, or pruning. Recent updates add support for Llama 3, CPU inference, and compatibility with ChatGLM and Qwen models.
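The key idea behind running a model far larger than device memory is layered inference: only one layer's weights are resident at a time, loaded from disk, applied, then freed before the next layer is loaded. The sketch below is a minimal, dependency-free illustration of that idea, not AirLLM's actual API; the file layout and the toy affine "layers" are invented for demonstration.

```python
import json
import os
import tempfile

def save_layers(layers, directory):
    """Persist each layer's weights to its own file (one file per layer)."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        paths.append(path)
    return paths

def run_layered(x, layer_paths):
    """Apply layers sequentially, keeping only one layer in memory at a time."""
    for path in layer_paths:
        with open(path) as f:
            w = json.load(f)          # load just this layer's weights
        x = [xi * w["scale"] + w["bias"] for xi in x]  # toy affine layer
        del w                         # free the layer before loading the next
    return x

with tempfile.TemporaryDirectory() as d:
    # Two toy layers standing in for transformer blocks.
    layers = [{"scale": 2.0, "bias": 1.0}, {"scale": 0.5, "bias": 0.0}]
    paths = save_layers(layers, d)
    print(run_layered([1.0, 2.0], paths))  # [1.5, 2.5]
```

Peak memory here is bounded by the largest single layer rather than the whole model, which is why a 70B-parameter model can be stepped through on a 4GB card at the cost of repeated disk reads.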