InternLM-Math: A Pioneer in Math Reasoning
InternLM-Math is a project that provides a series of state-of-the-art bilingual open-source models designed for mathematical reasoning. The models serve multiple roles, as solvers, provers, verifiers, and augmentors, making them versatile tools for AI-driven math problem solving. The project is notable for its advances in both formal and informal mathematical reasoning.
Recent Developments and News
- InternLM2.5-Step-Prover Models (October 2024): Released together with 14,000 proofs discovered for Lean-Workbook problems, these models advance automated theorem proving.
- Lean-Github and InternLM2-Step-Prover (July 2024): This release contributed roughly 29,000 theorems compiled from open-source Lean 4 repositories, along with a prover that raised performance on key theorem-proving benchmarks.
- Lean-Workbook Dataset (June 2024): A dataset of 57,000 math problems formalized in Lean 4, released alongside tools for automated proving.
- InternLM2-Math-Plus (May 2024): An updated model series in 1.8B, 7B, 20B, and 8x22B parameter sizes, with strong results on both informal and formal mathematical reasoning tasks.
Key Features
- Enhanced Pre-training: InternLM-Math models are further pre-trained on roughly 100 billion tokens of high-quality math-related data and fine-tuned on about 2 million bilingual supervised math examples, giving them a strong foundation for tackling diverse math problems.
- Lean Integration for Formal Math: Lean, a formal proof language, is supported so the models can formalize math problems and assist with theorem proving (see the Lean sketch after this list).
- Multifaceted Reward Model: The models support a reward scheme covering both outcome and process evaluation, including converting reasoning chains into Lean code (a sketch of the outcome/process distinction follows this list).
- Math Augmentation and Code Interpretation: The models can augment math problems and drive a code interpreter to accelerate data synthesis and problem solving.
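
To make the Lean integration concrete, here is a toy Lean 4 formalization (assuming Mathlib) of the kind a solver/prover model would be asked to produce; the statement and theorem name are illustrative, not drawn from InternLM-Math's data:

```lean
import Mathlib

/- Informal problem: "Show that for every natural number n,
   n ^ 2 + n is even." -/
theorem sq_add_self_even (n : ℕ) : Even (n ^ 2 + n) := by
  -- Rewrite n ^ 2 + n as the product of consecutive numbers n * (n + 1).
  have h : n ^ 2 + n = n * (n + 1) := by ring
  rw [h]
  -- A product of two consecutive naturals is always even.
  exact Nat.even_mul_succ_self n
```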
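The outcome/process distinction in the reward model can be sketched in a few lines of Python. This is a hypothetical illustration of the two scoring modes, not InternLM-Math's actual reward implementation; the `step_scorer` callback stands in for whatever model or rule assigns a score to a single reasoning step:

```python
# Hypothetical sketch: outcome vs. process scoring of a chain-of-thought
# solution. Illustrative only, not InternLM-Math's reward code.

def outcome_reward(solution_steps: list[str], reference_answer: str) -> float:
    """Score only the final result: 1.0 if the last step contains the answer."""
    return 1.0 if reference_answer in solution_steps[-1] else 0.0

def process_reward(solution_steps: list[str], step_scorer) -> float:
    """Score every intermediate step and average, so flawed reasoning that
    luckily reaches the right answer is still penalized."""
    scores = [step_scorer(step) for step in solution_steps]
    return sum(scores) / len(scores)

# Example usage with a trivial stand-in scorer.
steps = ["2 + 3 = 5", "5 * 4 = 20", "The answer is 20"]
print(outcome_reward(steps, "20"))  # 1.0
print(process_reward(steps, lambda s: 1.0 if "=" in s or "answer" in s else 0.0))
```

An outcome reward cannot distinguish lucky guesses from sound derivations, which is why process-level evaluation matters for math reasoning.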
Performance in Math Reasoning
Formal Math Reasoning
InternLM2-Math-Plus models show marked improvements on formal benchmarks such as MiniF2F-test, where the 7B model scores 43.4, leading its peer group.
Informal Math Reasoning
On informal benchmarks such as MATH and GSM8K, InternLM2-Math models perform strongly: the 1.8B model outperforms MiniCPM-2B, and the larger Mixtral8x22B variant reaches 68.5 on MATH when utilizing Python.
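
Here, "utilizing Python" refers to the code-interpreter setting, in which the model writes and executes a short program rather than doing the arithmetic in natural language. As an illustrative sketch (the problem below is made up, not taken from the MATH benchmark), a tool call might look like:

```python
# Code-interpreter style solution for a MATH-style problem:
# "How many positive divisors does 2024 have?"
# The model emits code like this and reports the executed result.

def count_divisors(n: int) -> int:
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            # d and n // d form a divisor pair; count once if they coincide.
            count += 2 if d * d != n else 1
        d += 1
    return count

print(count_divisors(2024))  # 2024 = 2^3 * 11 * 23, so (3+1)*(1+1)*(1+1) = 16
```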
Performance on MathBench-A
InternLM-Math also holds its ground on MathBench-A, a benchmark spanning arithmetic through college-level mathematics; the Mixtral8x22B variant delivers results competitive with leading general-purpose language models.
Conclusion
InternLM-Math marks a substantial step toward language models that handle both formal and informal math reasoning. Through its steady cadence of model, dataset, and tooling releases, the InternLM team keeps the project at the forefront of AI-driven mathematical problem solving, making it a valuable resource for practicing mathematicians and for researchers applying AI in math-focused domains.