Introduction to LLaMA Pro: Progressive LLaMA with Block Expansion
Overview
LLaMA-Pro, short for Progressive LLaMA with Block Expansion, is an advanced model released by researchers at Tencent ARC. It enhances the capabilities of previous LLaMA iterations through block expansion, a post-pretraining technique that interleaves newly added transformer blocks into a pretrained model and trains only those blocks on additional data, so the model gains new skills (such as code and mathematical reasoning) without losing its original general abilities.
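To make the idea concrete, here is a minimal PyTorch sketch of block expansion. It is an illustration rather than the official implementation: it assumes the decoder blocks follow Hugging Face's LlamaDecoderLayer attribute naming (self_attn.o_proj and mlp.down_proj), and the helper expand_blocks is a name introduced only for this example.

```python
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Sketch of block expansion: split the original decoder blocks into
    `num_new` groups and append one identity-initialized copy after each group."""
    group_size = len(blocks) // num_new
    expanded = []
    for i, block in enumerate(blocks):
        block.requires_grad_(False)           # original blocks stay frozen
        expanded.append(block)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(block)
            # Zero the output projections (attribute names assume Hugging Face's
            # LlamaDecoderLayer); with the residual connections, this makes the
            # new block an identity map at initialization.
            nn.init.zeros_(new_block.self_attn.o_proj.weight)
            nn.init.zeros_(new_block.mlp.down_proj.weight)
            new_block.requires_grad_(True)    # only the new blocks are trained
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```

Because the zeroed output projections contribute nothing through the residual connections, the expanded model initially behaves exactly like the original, and post-pretraining updates only the new blocks. Expanding 32 original layers with num_new=8 yields 40 layers, which matches the scale of expansion used to build LLaMA-Pro-8B from LLaMA2-7B.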
Recent Developments
The project has undergone significant advancements, as evidenced by several key updates:
- Open-Source Release: On January 6, 2024, LLaMA-Pro was made publicly available through its GitHub repository, along with a demo and the model weights.
- Local Demo with Gradio: On January 7, 2024, the team published instructions for running a local Gradio demo, letting users experiment with the model on their own hardware (a minimal sketch follows this list).
- Training Code Availability: Training code was added to the open-instruct platform on January 18, 2024, facilitating further development and experimentation by users.
- Mistral-Pro-8B-v0.1 Release: On February 23, 2024, the Mistral-Pro-8B-v0.1 model was released, extending Mistral-7B through block expansion and showing strong performance across a range of benchmarks.
- MetaMath-Mistral-Pro Release: Released the same day, the MetaMath-Mistral-Pro model surpassed previous open-source 7B models on math benchmarks such as GSM8k and MATH.
- Cosmopedia Integration: On May 8, 2024, a pre-train example script was provided for Cosmopedia, enriching the resources available to users.
- Conference Recognition: LLaMA Pro's paper was accepted to the main conference of ACL 2024, indicating significant academic recognition.
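As referenced above, a local demo can be assembled with only a few lines of Gradio and transformers code. The sketch below is an illustrative stand-in for the official demo script, not a copy of it: the checkpoint name TencentARC/LLaMA-Pro-8B-Instruct is assumed, and the generation settings are kept deliberately simple.

```python
# Minimal local demo sketch (not the official script). Assumes the
# instruction-tuned checkpoint "TencentARC/LLaMA-Pro-8B-Instruct" on the
# Hugging Face Hub and a GPU with enough memory to hold it in fp16.
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TencentARC/LLaMA-Pro-8B-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def generate(prompt: str) -> str:
    # Tokenize the prompt, generate a continuation, and decode only the
    # newly generated tokens.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text",
             title="LLaMA-Pro local demo (sketch)").launch()
```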
Performance Highlights
LLaMA-Pro and its associated models achieve strong results on mathematical reasoning and language understanding benchmarks. On the GSM8k and MATH benchmarks (Pass@1):
- MetaMath-Mistral-Pro achieved the highest scores, with GSM8k Pass@1 at 78.4 and MATH Pass@1 at 30.3.
Appreciation and Acknowledgements
The team thanks collaborators and hosting partners such as Hugging Face and WiseModel for their support in sharing the model and resources with a broader audience. The instruction-tuning pipeline builds on the open-instruct implementation.
Citation
The approach behind LLaMA-Pro is documented in a published paper. Those who use this work are encouraged to cite it as follows:
@article{wu2024llama,
  title={Llama pro: Progressive llama with block expansion},
  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},
  journal={arXiv preprint arXiv:2401.02415},
  year={2024}
}
Conclusion
LLaMA-Pro stands as a notable example of open-source innovation in machine learning, built on continual improvement through community collaboration. By extending a pretrained model's capabilities with new blocks rather than retraining from scratch, it sets a useful reference point for future projects in this fast-moving field.