
fastllm

Multi-Platform C++ Inference Library for Accelerated Large Model Processing

Product Description

Fastllm is a pure C++ library with no third-party dependencies, delivering high-performance inference on ARM, X86, and NVIDIA platforms. It supports quantizing Hugging Face models, serving them through an OpenAI-compatible API server, and deploying across multiple GPUs and CPUs with dynamic batching. Its front-end/back-end separation improves compatibility with new devices, and it integrates with models such as ChatGLM and LLAMA. Python bindings also allow custom model structures, with extensive documentation for straightforward setup and use.
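As an illustration of the Hugging Face quantization workflow, the sketch below assumes the fastllm_pytools-style Python bindings (llm.from_hf, model.response, model.save) and a ChatGLM2 checkpoint; treat these names and the dtype string as assumptions and verify them against the version you have installed.

```python
# Minimal sketch: quantize a Hugging Face model with fastllm's Python tools.
# Assumes the fastllm_pytools bindings; names may differ between versions.
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
hf_model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# Convert the Hugging Face model into a fastllm model, quantized to int4.
model = llm.from_hf(hf_model, tokenizer, dtype="int4")

# Run a single-turn chat query.
print(model.response("What does fastllm do?"))

# Persist the quantized model so it can be reloaded without reconversion.
model.save("chatglm2-6b-int4.flm")
```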
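Because the server exposes an OpenAI-compatible API, any standard OpenAI client can talk to it once it is running. The snippet below is a sketch only: the port, base URL, and model name are illustrative placeholders, not values defined by fastllm.

```python
# Minimal sketch: query a locally running fastllm OpenAI-compatible server.
# The base_url and model name below are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="chatglm2-6b-int4",  # hypothetical name registered with the server
    messages=[{"role": "user", "content": "Summarize what fastllm does."}],
)
print(resp.choices[0].message.content)
```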
Project Details