safe-rlhf
An open-source framework for language model training that emphasizes safety and alignment via Safe RLHF (Safe Reinforcement Learning from Human Feedback). It supports leading pre-trained models, extensive preference datasets, and customizable training pipelines. Features include multi-scale safety metrics and thorough evaluation, helping researchers optimize models while reducing safety risks. Developed by the PKU-Alignment team at Peking University.
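
To make the idea behind Safe RLHF concrete, the sketch below illustrates the constrained objective it is built around: maximize expected reward subject to an expected cost (harmfulness) constraint, solved with a learnable Lagrange multiplier. This is a minimal conceptual example, not the safe-rlhf library's API; names such as `LagrangeMultiplier`, `policy_loss`, and `multiplier_loss` are hypothetical.

```python
import torch

class LagrangeMultiplier(torch.nn.Module):
    """Keeps the multiplier lambda >= 0 via a softplus-parameterized scalar."""
    def __init__(self, init: float = 1.0):
        super().__init__()
        # Inverse softplus so the initial multiplier value equals `init`.
        self.raw = torch.nn.Parameter(torch.log(torch.expm1(torch.tensor(init))))

    def forward(self) -> torch.Tensor:
        return torch.nn.functional.softplus(self.raw)

def policy_loss(reward, cost, lam):
    # The policy maximizes reward - lambda * cost; negate for gradient descent.
    return -(reward - lam.detach() * cost).mean()

def multiplier_loss(cost, lam, cost_limit: float = 0.0):
    # Lambda increases while the constraint E[cost] <= cost_limit is violated,
    # and decreases once it is satisfied.
    return -(lam * (cost.detach().mean() - cost_limit))

if __name__ == "__main__":
    lam = LagrangeMultiplier(init=1.0)
    reward = torch.randn(8)        # reward-model scores for a batch of responses
    cost = torch.randn(8) + 0.5    # cost-model (harmfulness) scores
    print(policy_loss(reward, cost, lam()).item(),
          multiplier_loss(cost, lam()).item())
```

In practice the two losses are optimized in alternation (or jointly, with the multiplier on its own optimizer), so the policy trades reward against safety while the multiplier adapts to how strongly the cost constraint is being violated.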