
dash-infer

Enables Efficient LLM Inference with Optimized C++ Runtime on x86 and ARM Platforms

Product Description

DashInfer is an optimized C++ runtime for scalable, efficient inference of large language models (LLMs) across multiple hardware platforms, including x86 and ARMv9. It supports continuous batching and NUMA-aware execution for better CPU performance, and keeps third-party dependencies minimal for easy integration. It delivers GPU-level accuracy and supports open-source LLMs such as Qwen and LLaMA. Techniques such as post-training quantization (PTQ) and Flash Attention further reduce latency and raise throughput in multi-node server deployments.
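To illustrate the continuous-batching idea mentioned above, here is a minimal, hypothetical Python sketch (not DashInfer's actual API): finished sequences free their batch slot immediately, so waiting requests can join the running batch between decode steps instead of waiting for the whole batch to drain.

```python
from collections import deque

class ContinuousBatcher:
    """Toy scheduler: a finished sequence frees its slot at once,
    letting a queued request join the batch mid-flight."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.waiting = deque()   # requests not yet scheduled
        self.running = {}        # request id -> tokens left to generate

    def submit(self, req_id, tokens_to_generate):
        self.waiting.append((req_id, tokens_to_generate))

    def step(self):
        # Refill any free slots from the waiting queue (continuous batching).
        while self.waiting and len(self.running) < self.max_batch_size:
            req_id, n = self.waiting.popleft()
            self.running[req_id] = n
        # One decode step: every running request emits one token.
        finished = []
        for req_id in list(self.running):
            self.running[req_id] -= 1
            if self.running[req_id] == 0:
                finished.append(req_id)
                del self.running[req_id]
        return finished

# With a batch size of 2, "c" starts as soon as "a" finishes,
# well before "b" has produced all of its tokens.
batcher = ContinuousBatcher(max_batch_size=2)
batcher.submit("a", 1)
batcher.submit("b", 3)
batcher.submit("c", 2)
done = []
while batcher.waiting or batcher.running:
    done += batcher.step()
print(done)  # → ['a', 'b', 'c']
```

A static batcher would have held "c" back until both "a" and "b" completed; the per-step refill is what keeps CPU utilization high under mixed-length workloads.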