ChatGLM-LoRA-RLHF-PyTorch
ChatGLM-LoRA-RLHF-PyTorch is a framework for fine-tuning the ChatGLM large language model (LLM) with Low-Rank Adaptation (LoRA) and reinforcement learning from human feedback (RLHF) on consumer-grade hardware. The project walks through the pipeline step by step: data preparation, supervised fine-tuning, adapter merging, and reward modeling, with the reinforcement-learning tuning stage still planned.
Environment Setup
To get started with the project, the recommended environment configuration is as follows (a quick sanity check follows the list):
- An NVIDIA RTX 2080 Ti-class GPU (roughly 11 GB of VRAM).
- PyTorch 2.0.0.
- CUDA 11.8.
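A quick way to confirm the PyTorch/CUDA pairing before training is a short check such as the following minimal sketch (the expected values in the comments are the versions listed above):

# Quick sanity check of the training environment (sketch).
import torch

print("torch version:", torch.__version__)            # expect 2.0.0
print("CUDA build:", torch.version.cuda)              # expect 11.8
print("CUDA available:", torch.cuda.is_available())   # expect True
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name, f"{props.total_memory / 1024**3:.1f} GB")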
Todo List
The project has outlined several key tasks:
- Supervised Fine-tuning (SFT): This step has been completed, integrating supervised data to refine the model.
- Merge Adapter into Model: Fold the trained LoRA parameters back into the base model itself.
- RLHF: Reward-model training is complete; tuning with reinforcement learning has not yet been done.
Run
Data Process
The data processing phase starts by converting the Alpaca dataset into a JSON Lines format. This is achieved with the following command:
python cover_alpaca2jsonl.py --data_path data/alpaca_data.json --save_path data/alpaca_data.jsonl
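Conceptually, the conversion flattens each Alpaca record (instruction, optional input, output) into a single prompt/target pair per JSONL line. The following is only a rough sketch of that idea; the field names ("context", "target") are assumptions, and the real logic lives in cover_alpaca2jsonl.py:

# Sketch of converting Alpaca-style records into JSON Lines (assumed field names).
import json

def format_example(example):
    context = f"Instruction: {example['instruction']}\n"
    if example.get("input"):
        context += f"Input: {example['input']}\n"
    context += "Answer: "
    return {"context": context, "target": example["output"]}

with open("data/alpaca_data.json", encoding="utf-8") as fin:
    records = json.load(fin)

with open("data/alpaca_data.jsonl", "w", encoding="utf-8") as fout:
    for record in records:
        fout.write(json.dumps(format_example(record), ensure_ascii=False) + "\n")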
After conversion, tokenization of the dataset is necessary to prepare it for training:
python tokenize_dataset_rows.py --jsonl_path data/alpaca_data.jsonl --save_path data/alpaca --max_seq_length 200 --skip_overlength True
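Tokenization encodes each prompt/target pair with the ChatGLM tokenizer and, with --skip_overlength True, drops examples longer than --max_seq_length. A rough sketch of that logic, assuming the field names from the previous step (the real script additionally writes the tokenized rows to --save_path):

# Sketch of row-wise tokenization with overlength skipping (assumed logic).
import json
from transformers import AutoTokenizer

max_seq_length = 200
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

features = []
with open("data/alpaca_data.jsonl", encoding="utf-8") as f:
    for line in f:
        example = json.loads(line)
        prompt_ids = tokenizer.encode(example["context"], max_length=max_seq_length, truncation=True)
        target_ids = tokenizer.encode(example["target"], max_length=max_seq_length,
                                      truncation=True, add_special_tokens=False)
        input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id]  # eos id may come from the model config instead
        if len(input_ids) > max_seq_length:  # --skip_overlength True drops these rows
            continue
        features.append({"input_ids": input_ids, "seq_len": len(prompt_ids)})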
Supervised Finetune
The project stresses using the latest version of the Parameter-Efficient Fine-Tuning (PEFT) library for this step. First reinstall PEFT from source:
pip uninstall peft -y
pip install git+https://github.com/huggingface/peft.git # Latest version >=0.3.0.dev0
Then run the finetuning script:
python supervised_finetune.py --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 32 --save_steps 200 --save_total_limit 3 --learning_rate 1e-4 --fp16 --remove_unused_columns false --logging_steps 10 --output_dir output
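Under the hood, the script freezes the ChatGLM weights and attaches trainable low-rank adapters through PEFT, so only the LoRA matrices are updated. A minimal sketch of that setup (the target module name and the alpha/dropout values are assumptions; r=8 matches --lora_rank 8 above):

# Minimal sketch of attaching LoRA adapters to ChatGLM via PEFT (assumed settings).
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # --lora_rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],   # ChatGLM's fused attention projection (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the LoRA matrices are trainable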
Merge PEFT adapter into Model
To merge the PEFT adapter into the base model, first downgrade to PEFT 0.2.0, since newer versions can break the merge script:
pip uninstall peft -y
pip install peft==0.2.0
python merge_peft_adapter.py --model_name ./output
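Merging folds each low-rank update back into the corresponding base weight, W <- W0 + (alpha / r) * B * A, after which the adapter modules can be discarded and the checkpoint loaded without PEFT. The sketch below shows only that underlying arithmetic with toy shapes, not the project's merge_peft_adapter.py:

# Sketch of the LoRA merge arithmetic (illustrative, toy shapes).
import torch

def merge_lora_weight(base_weight, lora_A, lora_B, lora_alpha, r):
    # base_weight: (out_features, in_features)
    # lora_A: (r, in_features), lora_B: (out_features, r)
    scaling = lora_alpha / r
    return base_weight + scaling * (lora_B @ lora_A)

W0 = torch.randn(16, 32)   # frozen base weight
A = torch.randn(8, 32)     # r = 8
B = torch.randn(16, 8)
W_merged = merge_lora_weight(W0, A, B, lora_alpha=32, r=8)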
Reward Modeling
Reward modeling is a key step in RLHF. Train the reward model with the following command:
python train_reward_model.py --model_name 'THUDM/chatglm-6b' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False
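The reward model assigns a scalar score to a response and is trained with a pairwise ranking objective: for each (chosen, rejected) pair the loss is -log sigmoid(r_chosen - r_rejected). The actual implementation lives in reward_model.py; the snippet below is only an illustrative sketch of that objective:

# Illustrative sketch of the pairwise reward-modeling loss (not reward_model.py).
import torch
import torch.nn.functional as F

def reward_pair_loss(rewards_chosen, rewards_rejected):
    # Both tensors have shape (batch,); each entry is the scalar reward of one response.
    # The loss pushes the human-preferred response to score higher than the rejected one.
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()

loss = reward_pair_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))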
Merge reward model into Model
After training, merge the reward-model adapter into the base model as well:
python merge_peft_adapter.py --model_name ./reward_model_chatglm-6b
Notes
- Ensure PEFT is installed correctly, and revert to version 0.2.0 when merging adapters, as newer versions are incompatible with the merge script.
- Download the ChatGLM source code from its Hugging Face Hub page and place it in the local models directory, since the transformers library does not yet ship native ChatGLM support.
- Always load the model and tokenizer with trust_remote_code=True, because ChatGLM uses custom modeling code (see the sketch after these notes).
- The reward-model implementation is self-contained and documented in reward_model.py.
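Because ChatGLM ships its own modeling code, loading it through transformers, whether from the Hub or from a local models/ copy, looks roughly like this minimal sketch:

# Sketch: loading ChatGLM's custom code through transformers.
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm-6b"  # or a local copy, e.g. ./models/chatglm-6b
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()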
Star-History
The project's star history reflects its growth and interest over time.
Donation
If this project has been helpful and saved you development time, consider supporting with a donation via AliPay or WeChatPay.
License
The project is licensed under the MIT License © Kun, offering flexibility in its use and distribution.