ChatGLM-LoRA-RLHF-PyTorch
This project walks through a complete pipeline for fine-tuning the ChatGLM large language model with LoRA and Reinforcement Learning from Human Feedback (RLHF) on accessible hardware. It covers data processing, supervised fine-tuning, and reward modeling. It also covers choosing a PEFT version that works for model integration and avoiding compatibility problems with Hugging Face Transformers. The goal is efficient model development and tuning for people working with constrained resources.
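To give a rough sense of why LoRA suits constrained hardware, the sketch below compares trainable parameter counts for a single weight matrix. LoRA freezes the original d_out x d_in matrix and trains two low-rank factors of shapes d_out x r and r x d_in instead. The helper name `lora_param_counts`, the 4096 x 4096 layer size, and the rank of 8 are illustrative assumptions, not values taken from this project.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple:
    """Return (full, lora) trainable parameter counts for one weight matrix.

    full: every entry of the d_out x d_in matrix is trained.
    lora: only the factors B (d_out x rank) and A (rank x d_in) are trained.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank 8):    {lora:,} trainable parameters")
print(f"reduction factor: {full // lora}x")
```

Under these assumed sizes, the low-rank factors hold 65,536 trainable parameters against 16,777,216 for the full matrix, a 256x reduction for that layer. The real savings depend on which layers are adapted and the rank chosen.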