grok-1

JAX Example Code for the Grok-1 Model with 314B Parameters

Product Description

This repository provides JAX example code for loading and running the Grok-1 model, a Mixture-of-Experts architecture with 8 experts and 314 billion parameters. Running the model requires a machine with substantial GPU memory. The network comprises 64 layers and 48 attention heads, uses rotary embeddings, and supports activation sharding and 8-bit quantization, with a maximum sequence length of 8,192 tokens. Tokenization is handled by a SentencePiece tokenizer with a vocabulary of 131,072 tokens. The weights can be downloaded via a magnet link or from the HuggingFace Hub and are released under the Apache 2.0 license.
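For orientation, here is a condensed sketch of a launch script in the style of the repository's run.py. It assumes the `model` and `runners` helper modules from the official checkout and weights already downloaded to ./checkpoints/; the configuration values mirror the figures above, while the helper names and arguments follow the upstream example and may differ between revisions.

```python
# Sketch of a Grok-1 inference script (assumes the `model` and `runners`
# modules shipped with the grok-1 checkout; names follow the upstream
# run.py example and may change between revisions).
from model import LanguageModelConfig, TransformerConfig
from runners import InferenceRunner, ModelRunner, sample_from_model

CKPT_PATH = "./checkpoints/"  # weights fetched via magnet link or HuggingFace Hub

grok_1_model = LanguageModelConfig(
    vocab_size=128 * 1024,   # 131,072-token SentencePiece vocabulary
    pad_token=0,
    eos_token=2,
    sequence_len=8192,       # maximum sequence length
    embedding_init_scale=1.0,
    output_multiplier_scale=0.5773502691896257,
    embedding_multiplier_scale=78.38367176906169,
    model=TransformerConfig(
        emb_size=48 * 128,
        widening_factor=8,
        key_size=128,
        num_q_heads=48,      # 48 attention heads for queries
        num_kv_heads=8,
        num_layers=64,       # 64 transformer layers
        attn_output_multiplier=0.08838834764831845,
        # Mixture of 8 Experts.
        num_experts=8,
        num_selected_experts=2,
        # Activation sharding across the device mesh.
        shard_activations=True,
        data_axis="data",
        model_axis="model",
    ),
)

inference_runner = InferenceRunner(
    pad_sizes=(1024,),
    runner=ModelRunner(
        model=grok_1_model,
        bs_per_device=0.125,
        checkpoint_path=CKPT_PATH,
    ),
    name="local",
    load=CKPT_PATH,
    tokenizer_path="./tokenizer.model",
    local_mesh_config=(1, 8),      # single host with 8 accelerators
    between_hosts_config=(1, 1),
)
inference_runner.initialize()
gen = inference_runner.run()

prompt = "The answer to life the universe and everything is of course"
print(sample_from_model(gen, prompt, max_len=100, temperature=0.01))
```

Note the MoE settings: the upstream configuration selects 2 of the 8 experts per token, so only a fraction of the 314B parameters is active on each forward pass, and activation sharding spreads the remaining memory load across the device mesh.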
Project Details