safe-rlhf

Safe RLHF Framework for Enhanced AI Model Alignment

Product Description

An open-source framework for language model training that emphasizes safety and alignment using the Safe RLHF (Safe Reinforcement Learning from Human Feedback) method. It supports leading pre-trained models, extensive preference datasets, and customizable training pipelines. Features include multi-scale safety metrics and thorough evaluation, helping researchers optimize models while reducing safety risks. Developed by the PKU-Alignment team at Peking University.
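
As a quick orientation, the sketch below shows one way to load a publicly released preference dataset and an aligned model from the team through the Hugging Face datasets and transformers libraries. The dataset and model identifiers (PKU-Alignment/PKU-SafeRLHF, PKU-Alignment/beaver-7b-v1.0) and the prompt template are assumptions based on the PKU-Alignment releases, not details taken from this listing.

```python
# Minimal sketch, assuming the PKU-Alignment Hugging Face releases are available.
# The dataset/model IDs and prompt template below are assumptions, not part of
# this listing; adjust them to the artifacts you actually want to use.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Preference/safety dataset with per-response labels (assumed identifier).
dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")
print(dataset[0])

# A Safe-RLHF-aligned chat model released by the team (assumed identifier).
model_id = "PKU-Alignment/beaver-7b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed conversation template; generate a short reply to a user prompt.
prompt = "BEGINNING OF CONVERSATION: USER: How can I stay safe online? ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For full training (reward/cost model fitting and constrained PPO), the framework's own training scripts and configuration files are the intended entry point; the snippet above only illustrates how the released datasets and models plug into the standard Hugging Face tooling.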
Project Details