mistral.rs
mistral.rs is a fast LLM inference engine with advanced quantization support and optimized backends for a range of devices, including Apple silicon, CPUs, and CUDA GPUs. It can be embedded through its Rust and Python APIs or deployed as an OpenAI-compatible HTTP server, making it portable across platforms. Features include efficient mixture-of-experts (MoE) models, direct (in-situ) quantization of models at load time, and a range of sampling techniques. Prequantized models are available out of the box, and prompt chunking together with dynamically swappable LoRA adapters further improves throughput and flexibility.
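Because the server speaks the standard OpenAI chat-completions protocol, any HTTP client can drive it. Below is a minimal Rust sketch using reqwest (with the "blocking" and "json" features) and serde_json; the port 1234 and the model id "mistral" are assumptions here, so substitute whatever your running mistralrs-server instance actually uses.

```rust
// Minimal sketch: query a locally running mistral.rs OpenAI-compatible server.
// Assumes the server listens on localhost:1234 (adjust to your --port) and
// that "mistral" matches a model id the server has loaded.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // Standard OpenAI-style chat-completions request body.
    let body = json!({
        "model": "mistral",  // placeholder model id
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."}
        ],
        "max_tokens": 64
    });

    // POST to the usual /v1/chat/completions endpoint and parse the JSON reply.
    let resp: serde_json::Value = client
        .post("http://localhost:1234/v1/chat/completions")
        .json(&body)
        .send()?
        .json()?;

    // Print the assistant's reply from the first choice.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```

The same request works unchanged from Python's openai client or plain curl, which is the point of exposing an OpenAI-compatible endpoint: existing tooling needs only a different base URL.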