mlx-llm
Run Large Language Models (LLMs) in real time on Apple Silicon with MLX. The project supports a broad range of models, such as LLaMA and Phi-3, and provides model quantization to reduce memory use, as well as embedding extraction. It suits developers who want to optimize LLMs on Apple devices, or to experiment with LoRA fine-tuning and retrieval-augmented generation (RAG).