Online-RLHF
This project is a detailed guide to Online Iterative RLHF (reinforcement learning from human feedback), an approach that has been shown to outperform its offline counterparts. The open-source workflow lets you reproduce a strong LLM using only open-source data, with results on par with or better than LLaMA3-8B-instruct. It includes setup instructions for every stage: fine-tuning, reward modeling, data generation, and iterative training.
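To make the data generation and iterative training stages concrete, here is a minimal sketch of how one online iteration loop could be organized. It is an illustration under stated assumptions, not this project's actual code: the names `build_preference_pairs`, `online_iterative_rlhf`, `sample`, `reward`, and `update` are hypothetical, the policy and reward model are plain Python callables standing in for real models, and pairing the best-scoring response with the worst-scoring one is only one possible pairing strategy.

```python
import random
from typing import Callable, List, Tuple

# (prompt, chosen response, rejected response)
Pair = Tuple[str, str, str]
Sampler = Callable[[str], str]
Reward = Callable[[str, str], float]


def build_preference_pairs(
    prompts: List[str], sample: Sampler, reward: Reward, n: int = 4
) -> List[Pair]:
    """Sample n responses per prompt and pair the best with the worst.

    Assumption: best-vs-worst pairing is used here only for illustration.
    """
    pairs: List[Pair] = []
    for prompt in prompts:
        candidates = [sample(prompt) for _ in range(n)]
        ranked = sorted(candidates, key=lambda r: reward(prompt, r))
        worst, best = ranked[0], ranked[-1]
        # Skip prompts where every candidate scores the same:
        # they carry no preference signal.
        if reward(prompt, best) > reward(prompt, worst):
            pairs.append((prompt, best, worst))
    return pairs


def online_iterative_rlhf(
    prompt_batches: List[List[str]],
    sample: Sampler,
    reward: Reward,
    update: Callable[[Sampler, List[Pair]], Sampler],
    n: int = 4,
) -> Tuple[Sampler, List[List[Pair]]]:
    """Alternate between generating fresh preference data and updating the policy."""
    history: List[List[Pair]] = []
    for batch in prompt_batches:
        # Data generation: responses come from the *current* policy,
        # which is what makes the procedure online.
        pairs = build_preference_pairs(batch, sample, reward, n)
        # Iterative training: `update` stands in for a preference-optimization step.
        sample = update(sample, pairs)
        history.append(pairs)
    return sample, history


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_sample(prompt: str) -> str:
        # Stand-in for LLM sampling.
        return "ok " * rng.randint(1, 5)

    def toy_reward(prompt: str, response: str) -> float:
        # Stand-in for a trained reward model: longer is "better".
        return float(len(response))

    def toy_update(policy: Sampler, pairs: List[Pair]) -> Sampler:
        # No real training happens in this sketch.
        return policy

    _, history = online_iterative_rlhf(
        [["prompt A", "prompt B"], ["prompt C"]], toy_sample, toy_reward, toy_update
    )
    print([len(pairs) for pairs in history])
```

In a real run, `sample` would wrap the current model checkpoint, `reward` would wrap the trained reward model, and `update` would run a preference-optimization step and return the new checkpoint.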