slowllama
slowllama fine-tunes Llama2 and CodeLlama models on Apple M1/M2 machines and consumer NVIDIA GPUs without using any quantization. Instead, it offloads parts of the model to SSD or main RAM on both the forward and backward passes, keeping peak memory within the limits of consumer-grade hardware. The project is focused exclusively on fine-tuning with LoRA, so only a small set of low-rank adapter parameters is ever updated. Experimental results illustrate the GPU and memory trade-offs involved in fine-tuning large models this way.
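Conceptually, LoRA leaves a pretrained weight W frozen and learns a small low-rank update, W + (alpha/r)·BA, so only the factors A and B receive gradients. The PyTorch snippet below is a minimal sketch of that idea; LoRALinear, the rank and alpha defaults, and the initialization are illustrative assumptions, not slowllama's actual implementation.

```python
# Minimal LoRA sketch (illustrative, not slowllama's exact code):
# the pretrained weight stays frozen; only the low-rank factors train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad_(False)
        # Low-rank factors: B starts at zero, so training begins exactly
        # at the base model's behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank             # standard LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the small trainable low-rank update.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```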
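The offloading idea can be sketched in the same spirit: each transformer block lives on SSD (or in RAM) and is loaded, applied, and discarded one block at a time, so only a single block's weights occupy the GPU at once. The function below is a hedged sketch under the assumption that blocks were saved as separate state-dict files; offloaded_forward, make_block, and the file layout are hypothetical names, not slowllama's API, and the block-by-block backward pass that re-loads weights and recomputes activations is omitted.

```python
# Hedged sketch of an SSD-offloaded forward pass: load one block from
# disk, run it, and free its weights before loading the next block.
# offloaded_forward, make_block, and the per-block files are assumptions.
import torch

def offloaded_forward(x, block_paths, make_block, device="mps"):
    for path in block_paths:
        block = make_block()                                   # empty block skeleton
        block.load_state_dict(torch.load(path, map_location="cpu"))
        block = block.to(device)                               # move weights on-device
        with torch.no_grad():                                  # gradients handled by a
            x = block(x)                                       # separate backward pass
        del block                                              # release weights first
    return x
```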