Vicuna-LoRA-RLHF-PyTorch
Vicuna-LoRA-RLHF-PyTorch is a comprehensive solution for fine-tuning large language models (LLMs) on easily accessible hardware. This project leverages techniques such as Low-Rank Adaptation (LoRA) and Reinforcement Learning from Human Feedback (RLHF) to optimize and enhance the Vicuna LLM, making sophisticated machine learning models more accessible and practical for general use.
Environment Setup
To run the Vicuna-LoRA-RLHF-PyTorch project, you need a compatible environment. The reference setup is a 2080Ti GPU with 12G memory, PyTorch 2.0.0, and CUDA 11.8; matching these versions avoids compatibility issues and keeps training within the card's memory budget.
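A quick way to confirm that the installed versions match is to query PyTorch directly (the exact device name and memory reported will vary with your hardware):

```python
# Sanity check of the training environment against the project's reference setup.
import torch

print(torch.__version__)              # expected: 2.0.0
print(torch.version.cuda)             # expected: 11.8
print(torch.cuda.is_available())      # should be True for GPU training
print(torch.cuda.get_device_name(0))  # e.g. an RTX 2080 Ti
print(torch.cuda.get_device_properties(0).total_memory / 1024**3)  # VRAM in GiB
```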
Todo List
Key tasks covered in this project are as follows:
- Downloading Vicuna weights.
- Supervised Fine-Tuning (SFT) of the model.
- Merging the adapter into the model.
- Implementing Reinforcement Learning from Human Feedback (RLHF), including reward model training and tuning with RL.
Run
The project involves several key steps to enhance the Vicuna LLM:
Download Vicuna Weights
Use the command line to apply a delta to the original LLaMA-7B weights, producing the Vicuna weights. This step is necessary because Vicuna is distributed only as delta weights (a consequence of the LLaMA license), so ready-to-use checkpoints are not published on community platforms like Hugging Face.
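Conceptually, applying the delta just adds each delta tensor to the corresponding LLaMA tensor. The sketch below illustrates the idea with Hugging Face Transformers; the paths are placeholders, shapes are assumed to match, and FastChat's own apply_delta utility is the supported tool (it also handles tokenizer and vocabulary details this sketch glosses over):

```python
# Minimal sketch of applying a Vicuna delta to LLaMA-7B weights.
# Paths are placeholders; FastChat's apply_delta utility is the supported way to do this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/vicuna-7b-delta", torch_dtype=torch.float16)

delta_state = delta.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        param += delta_state[name]  # Vicuna weight = LLaMA weight + delta, tensor by tensor

base.save_pretrained("path/to/vicuna-7b")
AutoTokenizer.from_pretrained("path/to/vicuna-7b-delta").save_pretrained("path/to/vicuna-7b")
```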
Supervised Fine-Tune
Before running the fine-tuning, make sure the project's scripts are configured correctly. This involves a specific edit to src/peft/utils/save_and_load.py in the bundled PEFT source, after which a dedicated Python script fine-tunes the model on the prepared data.
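At the core of the SFT step, the base model is wrapped with a LoRA adapter via the PEFT library so that only the low-rank matrices are trained. A minimal sketch of that setup, with illustrative hyperparameters and placeholder paths rather than the project's exact settings:

```python
# Minimal sketch of wrapping Vicuna with a LoRA adapter for supervised fine-tuning.
# Paths and hyperparameters are illustrative, not the project's exact settings.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "path/to/vicuna-7b", torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style blocks
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank A/B matrices are trainable

# The wrapped model is then trained on the prepared instruction data with a standard
# transformers Trainer; model.save_pretrained() stores only the adapter weights.
```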
Merge PEFT Adapter into Model
This step requires the correct version of the PEFT (Parameter-Efficient Fine-Tuning) library, as version mismatches can cause errors. The LoRA adapter fine-tuned in the previous step is then merged into the base model, producing a standalone checkpoint for the following stages.
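Assuming a compatible peft version, the merge can be done along these lines (paths are placeholders, not the project's exact scripts):

```python
# Sketch: fold the SFT LoRA adapter back into the base model weights.
# Requires a peft version that provides merge_and_unload(); paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/vicuna-7b", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "./sft-lora")   # attach the trained adapter
merged = model.merge_and_unload()                       # bake the LoRA deltas into the weights
merged.save_pretrained("path/to/vicuna-7b-sft-merged")  # a plain checkpoint, no peft needed
```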
Train Reward Model
Training a reward model is the first part of the RLHF process. The reward model learns to assign higher scores to preferred responses than to rejected ones, providing the reward signal used later during reinforcement learning. Batch sizes and other parameters may need to be reduced to fit your machine's computational capabilities.
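At its core, reward modeling is pairwise ranking: for each prompt with a preferred ("chosen") and a dispreferred ("rejected") response, the model is trained to score the chosen one higher. A minimal sketch of that objective, using a single-logit sequence classifier, a log-sigmoid ranking loss, and placeholder paths:

```python
# Sketch of the pairwise ranking objective used for reward-model training.
# Paths are placeholders; the single-logit classification head produces the scalar score.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/vicuna-7b-sft-merged", num_labels=1
)
tokenizer = AutoTokenizer.from_pretrained("path/to/vicuna-7b-sft-merged")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
reward_model.config.pad_token_id = tokenizer.pad_token_id

def pairwise_loss(chosen_texts, rejected_texts):
    """Ranking loss: push scores of preferred responses above rejected ones."""
    chosen = tokenizer(chosen_texts, return_tensors="pt", padding=True, truncation=True)
    rejected = tokenizer(rejected_texts, return_tensors="pt", padding=True, truncation=True)
    chosen_scores = reward_model(**chosen).logits.squeeze(-1)
    rejected_scores = reward_model(**rejected).logits.squeeze(-1)
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```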
Merge Reward Adapter into Model
Once the reward model is trained, its adapter is merged into the backbone model. This produces a standalone reward model that can score responses during the reinforcement learning stage.
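The mechanics mirror the earlier adapter merge, except the backbone carries the single-logit scoring head; a brief sketch with placeholder paths:

```python
# Sketch: fold the reward-model LoRA adapter into its sequence-classification backbone.
# Paths are placeholders; the classification head produces the scalar reward.
import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

backbone = AutoModelForSequenceClassification.from_pretrained(
    "path/to/vicuna-7b-sft-merged", num_labels=1, torch_dtype=torch.float16
)
reward = PeftModel.from_pretrained(backbone, "./reward-lora")
reward = reward.merge_and_unload()
reward.save_pretrained("path/to/vicuna-7b-reward-merged")
```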
Tuning LM with PPO
The final tuning step applies Proximal Policy Optimization (PPO) to further adjust the language model's parameters based on the scores produced by the reward model. PPO updates the policy while keeping it close to the supervised model, steadily improving response quality according to the learned reward.
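The loop follows the TRL-style PPO recipe: generate a response, score it, and take a PPO step on the policy. A minimal sketch using trl, with placeholder paths, illustrative hyperparameters, and a stubbed constant reward to keep it self-contained (in the real pipeline the score comes from the merged reward model):

```python
# Sketch of one TRL-style PPO step: generate, score, update the policy.
# Paths, hyperparameters, and the constant reward are illustrative placeholders.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

policy = AutoModelForCausalLMWithValueHead.from_pretrained("path/to/vicuna-7b-sft-merged")
tokenizer = AutoTokenizer.from_pretrained("path/to/vicuna-7b-sft-merged")
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=1, mini_batch_size=1, learning_rate=1.4e-5)
ppo_trainer = PPOTrainer(config=config, model=policy, tokenizer=tokenizer)

query = tokenizer("Explain LoRA in one sentence.", return_tensors="pt").input_ids[0]
output = ppo_trainer.generate(query, max_new_tokens=48)
response = output[0][query.shape[0]:]  # keep only the newly generated tokens

reward = torch.tensor(1.0)             # stub; normally the merged reward model's score
stats = ppo_trainer.step([query], [response], [reward])
```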
Topics
Some specific topics covered in the project include details about downloading Vicuna weights, potential challenges in PEFT version compatibility, and solutions for issues like CUDA out-of-memory errors, which can occur on less powerful hardware setups.
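For the out-of-memory errors in particular, the usual mitigations are smaller per-device batches combined with gradient accumulation, gradient checkpointing, and half precision. The settings below are illustrative examples for a 12 GB card, not the project's exact values:

```python
# Illustrative memory-saving knobs; values are examples, not the project's settings.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=1,   # shrink the per-step memory footprint
    gradient_accumulation_steps=16,  # keep the effective batch size reasonable
    gradient_checkpointing=True,     # trade extra compute for activation memory
    fp16=True,                       # half-precision activations and gradients
)
```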
Reference
The project builds upon libraries and inspirations from various sources:
- FastChat
- Alpaca-LoRA
- TRL by lvwerra
- Llama-TRL by jasonvanf
These references provide community support and additional resources for users looking to extend their implementation or troubleshoot challenges.
Donation
If Vicuna-LoRA-RLHF-PyTorch has saved you time or added value to your work, the creators welcome donations, and QR codes for AliPay and WeChatPay are provided.
License
The project is available under the MIT License, allowing for extensive flexibility in use and adaptation by developers. See the LICENSE file for more details.