
llama2.mojo

Boost Llama2 Inference Efficiency Using Mojo's SIMD and Vectorization

Product Description

This project accelerates Llama2 model inference using Mojo's SIMD and vectorization primitives, achieving roughly a 250x speedup over the pure-Python baseline. In multithreaded CPU inference it outperforms llama2.c by about 30% and llama.cpp by about 20%. Supported models include the Stories checkpoints (260K to 110M parameters) and Tinyllama-1.1B-Chat-v0.2, with benchmark results measured on an Apple M1 Max. It is a useful reference for developers exploring efficient transformer inference in Mojo.
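Most of the speedup comes from vectorizing and parallelizing the matmul inner loops that dominate transformer inference. The following is a minimal sketch of that pattern, written against the 2023-era Mojo standard library that llama2.mojo targeted (DTypePointer, simd_load, vectorize, parallelize); Mojo's API has evolved since, and the function and variable names here are illustrative, not the project's exact code.

```mojo
from algorithm import vectorize, parallelize
from memory import DTypePointer
from sys.info import simdwidthof

# Number of float32 lanes in one SIMD register on the target CPU.
alias nelts = simdwidthof[DType.float32]()

# z = W @ x, with W stored row-major as rows x cols.
fn matmul(z: DTypePointer[DType.float32],
          x: DTypePointer[DType.float32],
          w: DTypePointer[DType.float32],
          rows: Int, cols: Int):
    @parameter
    fn calc_row(i: Int):
        var acc = SIMD[DType.float32, nelts](0)

        @parameter
        fn dot[width: Int](j: Int):
            if width < nelts:
                # Tail elements that don't fill a full register.
                acc[0] += (x.simd_load[width](j)
                           * w.simd_load[width](i * cols + j)).reduce_add()
            else:
                # Full-width multiply-accumulate across the row.
                acc += x.simd_load[nelts](j) * w.simd_load[nelts](i * cols + j)

        vectorize[nelts, dot](cols)
        z.store(i, acc.reduce_add())

    # Compute each output row on its own worker.
    parallelize[calc_row](rows)
```

Vectorizing the dot product keeps the CPU's SIMD units saturated, while parallelizing across output rows spreads the work over cores; together these account for the gains over single-threaded or scalar baselines.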
Project Details