Online-RLHF

Online RLHF: A Novel Approach to Aligning Large Language Models with Human Feedback

Product Description
This project offers a detailed guide to Online Iterative RLHF, a method shown to be more effective than offline methods. Its open-source workflow makes it possible to reproduce advanced LLMs using only open-source data, with results on par with or better than LLaMA3-8B-instruct. It includes setup instructions covering fine-tuning, reward modeling, data generation, and iterative training.
Project Details