Introduction to JetMoE: Cost-Effective Performance in AI Models
JetMoE is an open AI project that aims to deliver high performance at a fraction of the usual cost. Its flagship model, JetMoE-8B, achieves results comparable to Meta AI's LLaMA2-7B with less than $100,000 in training costs, challenging the assumption that training Large Language Models (LLMs) requires multi-million-dollar budgets.
Key Features of JetMoE
- Cost Efficiency: JetMoE-8B was trained for roughly $0.08 million on a 96×H100 GPU cluster over two weeks. Despite this modest budget, it surpasses LLaMA2-7B in performance.
- Open Source and Academic-Friendly: The model is fully open-sourced and trained only on publicly available datasets. This removes the need for proprietary resources, making it accessible to academic institutions with limited budgets.
- Low Computational Demand: JetMoE-8B activates only 2.2 billion parameters per token during inference, which sharply lowers computational cost; compared with models of similar inference compute, such as Gemma-2B, it consistently delivers better performance (see the sketch after this list).
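The low active-parameter count comes from sparse expert activation: each token is routed to only a few of a layer's experts, so only those experts' weights are exercised for that token. The following PyTorch snippet is a minimal, illustrative sketch of top-k mixture-of-experts routing; the dimensions, expert count, and routing details are placeholder choices and do not reproduce JetMoE's actual configuration.

# Illustrative sketch of a sparsely-gated mixture-of-experts feed-forward layer
# with top-k routing. Only the k selected experts run per token, so the "active"
# parameter count per token is a small fraction of the layer's total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    def __init__(self, d_model=1024, d_hidden=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SparseMoEFFN()
y = layer(torch.randn(4, 1024))                          # runs only the routed experts
total = sum(p.numel() for p in layer.parameters())
active = sum(p.numel() for p in layer.experts[0].parameters()) * layer.top_k \
         + sum(p.numel() for p in layer.router.parameters())
print(f"total params: {total:,}  active per token: {active:,}")

Printing the two counts makes the gap concrete: the layer stores all experts' weights, but each token only exercises the router plus its top-k experts.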
Achievements
JetMoE-8B outperforms several leading models across standard benchmarks:
- Performance Benchmarks: It surpasses LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B on key benchmarks such as ARC, HellaSwag, and MMLU.
- MT-Bench Score: JetMoE-8B-chat, the dialogue-tuned version of the model, achieves a competitive MT-Bench score, surpassing Llama-2-7b-chat and closely trailing larger models such as versions of ChatGPT.
Usage
JetMoE is not only cost-effective but also simple to use. After installing the required packages with pip (the transformers library and the jetmoe package), the model can be loaded with a few lines of Python and integrated into applications or research projects.
from transformers import AutoTokenizer, AutoModelForCausalLM
from jetmoe import JetMoEForCausalLM  # JetMoE model implementation from the project's Python package
# Download the tokenizer and the 8B base checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
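Once loaded, the model behaves like any other transformers causal language model. The prompt and sampling settings below are illustrative choices, not defaults recommended by the JetMoE authors:

# Minimal generation sketch; prompt and sampling settings are illustrative
prompt = "The key advantage of sparsely activated language models is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))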
Collaboration Opportunities
JetMoE is developed by a team including Yikang Shen, Zhen Guo, Tianle Cai, and Zengyi Qin. The project welcomes collaboration, inviting researchers and developers with innovative ideas but limited resources to engage through MyShell.ai.
Conclusion
JetMoE represents a significant leap forward in making powerful AI tools accessible to broader communities. By drastically lowering the financial and computational barriers to entry, it paves the way for more democratized access to cutting-edge AI technology. For more detailed technical insights, readers are encouraged to consult the project's technical report or visit the project's online platforms for demos and further resources.