ChatGLM-LoRA-RLHF-PyTorch
ChatGLM-LoRA-RLHF-PyTorch is a framework for fine-tuning the ChatGLM large language model (LLM) with Low-Rank Adaptation (LoRA) and reinforcement learning from human feedback (RLHF) on consumer-grade hardware. The project walks through the pipeline step by step: data preparation, supervised fine-tuning, adapter merging, and reward modeling, with the reinforcement-learning tuning stage still planned.
Environment Setup
To get started with the project, the recommended environment configuration is as follows (a quick sanity check follows the list):
- An NVIDIA RTX 2080 Ti-class GPU (roughly 11 GB of VRAM).
- PyTorch 2.0.0.
- CUDA 11.8.
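A quick way to confirm the PyTorch/CUDA pairing before training is a short check such as the following minimal sketch (the expected values in the comments are the versions listed above):

# Quick sanity check of the training environment (sketch).
import torch

print("torch version:", torch.__version__)            # expect 2.0.0
print("CUDA build:", torch.version.cuda)              # expect 11.8
print("CUDA available:", torch.cuda.is_available())   # expect True
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name, f"{props.total_memory / 1024**3:.1f} GB")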
Todo List
The project has outlined several key tasks:
- Supervised Fine-tuning (SFT): This step has been completed, integrating supervised data to refine the model.
- Merge Adapter into Model: Fold the trained LoRA parameters back into the base model itself.
- RLHF: Reward-model training is complete; tuning with reinforcement learning has not yet been done.
Run
Data Process
The data processing phase starts by converting the Alpaca dataset into a JSON Lines format. This is achieved with the following command:
python cover_alpaca2jsonl.py --data_path data/alpaca_data.json --save_path data/alpaca_data.jsonl
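Conceptually, the conversion flattens each Alpaca record (instruction, optional input, output) into a single prompt/target pair per JSONL line. The following is only a rough sketch of that idea; the field names ("context", "target") are assumptions, and the real logic lives in cover_alpaca2jsonl.py:

# Sketch of converting Alpaca-style records into JSON Lines (assumed field names).
import json

def format_example(example):
    context = f"Instruction: {example['instruction']}\n"
    if example.get("input"):
        context += f"Input: {example['input']}\n"
    context += "Answer: "
    return {"context": context, "target": example["output"]}

with open("data/alpaca_data.json", encoding="utf-8") as fin:
    records = json.load(fin)

with open("data/alpaca_data.jsonl", "w", encoding="utf-8") as fout:
    for record in records:
        fout.write(json.dumps(format_example(record), ensure_ascii=False) + "\n")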
After conversion, tokenization of the dataset is necessary to prepare it for training:
python tokenize_dataset_rows.py --jsonl_path data/alpaca_data.jsonl --save_path data/alpaca --max_seq_length 200 --skip_overlength True
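Tokenization encodes each prompt/target pair with the ChatGLM tokenizer and, with --skip_overlength True, drops examples longer than --max_seq_length. A rough sketch of that logic, assuming the field names from the previous step (the real script additionally writes the tokenized rows to --save_path):

# Sketch of row-wise tokenization with overlength skipping (assumed logic).
import json
from transformers import AutoTokenizer

max_seq_length = 200
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

features = []
with open("data/alpaca_data.jsonl", encoding="utf-8") as f:
    for line in f:
        example = json.loads(line)
        prompt_ids = tokenizer.encode(example["context"], max_length=max_seq_length, truncation=True)
        target_ids = tokenizer.encode(example["target"], max_length=max_seq_length,
                                      truncation=True, add_special_tokens=False)
        input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id]  # eos id may come from the model config instead
        if len(input_ids) > max_seq_length:  # --skip_overlength True drops these rows
            continue
        features.append({"input_ids": input_ids, "seq_len": len(prompt_ids)})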
Supervised Finetune
The project stresses using the latest version of the Parameter-Efficient Fine-Tuning (PEFT) library for this step. First reinstall PEFT from source:
pip uninstall peft -y
pip install git+https://github.com/huggingface/peft.git # Latest version >=0.3.0.dev0
Then run the finetuning script:
python supervised_finetune.py --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 32 --save_steps 200 --save_total_limit 3 --learning_rate 1e-4 --fp16 --remove_unused_columns false --logging_steps 10 --output_dir output
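Under the hood, the script freezes the ChatGLM weights and attaches trainable low-rank adapters through PEFT, so only the LoRA matrices are updated. A minimal sketch of that setup (the target module name and the alpha/dropout values are assumptions; r=8 matches --lora_rank 8 above):

# Minimal sketch of attaching LoRA adapters to ChatGLM via PEFT (assumed settings).
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # --lora_rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],   # ChatGLM's fused attention projection (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the LoRA matrices are trainable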
Merge PEFT adapter into Model
To merge the PEFT adapter into the base model, first downgrade to PEFT 0.2.0, since newer versions can break the merge script:
pip uninstall peft -y
pip install peft==0.2.0
python merge_peft_adapter.py --model_name ./output
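Merging folds each low-rank update back into the corresponding base weight, W <- W0 + (alpha / r) * B * A, after which the adapter modules can be discarded and the checkpoint loaded without PEFT. The sketch below shows only that underlying arithmetic with toy shapes, not the project's merge_peft_adapter.py:

# Sketch of the LoRA merge arithmetic (illustrative, toy shapes).
import torch

def merge_lora_weight(base_weight, lora_A, lora_B, lora_alpha, r):
    # base_weight: (out_features, in_features)
    # lora_A: (r, in_features), lora_B: (out_features, r)
    scaling = lora_alpha / r
    return base_weight + scaling * (lora_B @ lora_A)

W0 = torch.randn(16, 32)   # frozen base weight
A = torch.randn(8, 32)     # r = 8
B = torch.randn(16, 8)
W_merged = merge_lora_weight(W0, A, B, lora_alpha=32, r=8)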
Reward Modeling
Reward modeling is a key step in RLHF. Train the reward model with the following command:
python train_reward_model.py --model_name 'THUDM/chatglm-6b' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False
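The reward model assigns a scalar score to a response and is trained with a pairwise ranking objective: for each (chosen, rejected) pair the loss is -log sigmoid(r_chosen - r_rejected). The actual implementation lives in reward_model.py; the snippet below is only an illustrative sketch of that objective:

# Illustrative sketch of the pairwise reward-modeling loss (not reward_model.py).
import torch
import torch.nn.functional as F

def reward_pair_loss(rewards_chosen, rewards_rejected):
    # Both tensors have shape (batch,); each entry is the scalar reward of one response.
    # The loss pushes the human-preferred response to score higher than the rejected one.
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()

loss = reward_pair_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))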
Merge reward model into Model
After training, merge the reward-model adapter into the base model as well:
python merge_peft_adapter.py --model_name ./reward_model_chatglm-6b
Notes
- Ensure PEFT is installed correctly, and revert to version 0.2.0 when merging adapters, as newer versions are incompatible with the merge script.
- Download the ChatGLM source code from its Hugging Face Hub page and place it in the local models directory, since the transformers library does not yet ship native ChatGLM support.
- Always load the model and tokenizer with trust_remote_code=True, because ChatGLM uses custom modeling code (see the sketch after these notes).
- The reward-model implementation is self-contained and documented in reward_model.py.
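Because ChatGLM ships its own modeling code, loading it through transformers, whether from the Hub or from a local models/ copy, looks roughly like this minimal sketch:

# Sketch: loading ChatGLM's custom code through transformers.
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm-6b"  # or a local copy, e.g. ./models/chatglm-6b
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()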
Star-History
The project's star history reflects its growth and interest over time.
Donation
If this project has been helpful and saved you development time, consider supporting with a donation via AliPay or WeChatPay.
License
The project is licensed under the MIT License © Kun, offering flexibility in its use and distribution.