SPIN
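SPIN (Self-Play Fine-Tuning) improves a supervised fine-tuned (SFT) language model through self-play, so the model strengthens itself over successive iterations. At each iteration, the current model generates its own training data by responding to prompts from the SFT dataset. A new model is then trained to distinguish these self-generated responses from the human-written ones, which refines its policy. Because the synthetic responses come from the model itself, SPIN needs no human-annotated data beyond what supervised fine-tuning already requires. The method is theoretically grounded: the global optimum of its training objective is reached only when the model's output distribution matches the target (human) data distribution. Across multiple benchmarks, SPIN improves on the SFT model and can even outperform models trained with direct preference optimization (DPO) on additional GPT-4 preference data. Detailed setup guides and an open-source implementation support replication and further exploration.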
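The self-play step can be made concrete through its per-example loss. The sketch below follows the logistic-loss form of the SPIN objective: for a prompt x, a human response y and a response y' sampled from the previous-iteration model θ_t, the new model θ is penalized by ℓ(λ·[log p_θ(y|x) − log p_θt(y|x)] − λ·[log p_θ(y'|x) − log p_θt(y'|x)]), where ℓ(t) = log(1 + e^(−t)). It works on precomputed sequence log-probabilities rather than a real model, and the function names and the default λ = 0.1 are illustrative assumptions, not the reference implementation.

```python
import math


def logistic_loss(t: float) -> float:
    """l(t) = log(1 + exp(-t)), evaluated without overflow for large |t|."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    # For negative t, rewrite as -t + log(1 + exp(t)) to avoid exp overflow.
    return -t + math.log1p(math.exp(t))


def spin_loss(
    logp_real_new: float,   # log p_theta(y | x): new model on the human response
    logp_real_old: float,   # log p_theta_t(y | x): previous model on the human response
    logp_synth_new: float,  # log p_theta(y' | x): new model on the self-generated response
    logp_synth_old: float,  # log p_theta_t(y' | x): previous model on the self-generated response
    lam: float = 0.1,       # hypothetical default for the regularization weight lambda
) -> float:
    """Per-example SPIN loss (sketch, assuming sequence-level log-probs are given)."""
    # Positive margin: relative to theta_t, the new model shifts probability
    # toward the human response and away from its own earlier output.
    margin = lam * ((logp_real_new - logp_real_old) - (logp_synth_new - logp_synth_old))
    return logistic_loss(margin)


def spin_batch_loss(batch, lam: float = 0.1) -> float:
    """Mean SPIN loss over tuples of the four log-probs taken by spin_loss."""
    return sum(spin_loss(*example, lam=lam) for example in batch) / len(batch)


# When the new model equals the previous one, every margin is 0 and the loss is log 2.
print(spin_batch_loss([(-5.0, -5.0, -7.0, -7.0), (-3.0, -3.0, -4.0, -4.0)]))
```

When θ equals θ_t, every margin is zero and the loss sits at log 2, so training only lowers it by moving probability toward the human responses and away from the model's own earlier outputs. Once the model's outputs become indistinguishable from the human data, this signal vanishes, which matches the intuition behind the distribution-alignment result.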