Introduction to Gemma in PyTorch
Gemma is a family of lightweight, state-of-the-art open models built from the research and technology behind Google's Gemini models. They are text-to-text, decoder-only large language models, available in English, and released with open weights so that they can be studied and built upon. Both pre-trained and instruction-tuned variants are available.
Further details are available in the official Gemma documentation and the Gemma technical report.
PyTorch Implementation
The Gemma models have an official PyTorch implementation. It supports both native PyTorch and its extension, PyTorch/XLA, so inference can be run on CPUs, GPUs, and TPUs.
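The repository's own entry point for inference is scripts/run.py, shown in the docker commands later in this document. As a quick, self-contained alternative, the sketch below uses the Hugging Face transformers API; the model id google/gemma-2b is only an example, and downloading it requires accepting the Gemma license and authenticating with Hugging Face.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # example variant; substitute the checkpoint you want
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note that this route loads the transformers-format weights from the Hugging Face Hub rather than the native checkpoint files consumed by scripts/run.py.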
Model Updates
The Gemma project provides regular updates to its models, ensuring they remain cutting-edge:
- June 26th, 2024: Introduction of Gemma v2. Details and model checkpoints can be accessed on Kaggle and Hugging Face.
- April 9th, 2024: Launch of CodeGemma. Checkpoints are available on Kaggle and Hugging Face.
- April 5th, 2024: Release of Gemma v1.1. Access the checkpoints on Kaggle and Hugging Face.
Downloading Gemma Checkpoints
Model checkpoints can be essential for reproducing results or building upon existing models. They are available on Kaggle and on the Hugging Face Hub:
- Kaggle: Gemma Model Checkpoints
- Hugging Face: Gemma Models
Users can choose from several Gemma model variants: 2B, 2B V2, 7B, 7B int8 quantized, 9B, and 27B.
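For the Hugging Face route, checkpoints can also be fetched programmatically. The sketch below assumes the huggingface_hub package is installed, the Gemma license has been accepted, and you are authenticated; the repository id is an example, since the exact id depends on the variant and checkpoint format you need.

from huggingface_hub import snapshot_download

# Download a Gemma checkpoint snapshot into the local cache and return its path.
# "google/gemma-2b" is an example repository id; gated repositories require
# accepting the license and logging in (for instance with `huggingface-cli login`).
ckpt_dir = snapshot_download(repo_id="google/gemma-2b")
print(ckpt_dir)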
Experimenting with Gemma
Free Online Access on Colab
Users can try out the Gemma models for free on Google Colab by following the official tutorial notebook.
Local Usage via PyTorch
To use Gemma models locally, users should first ensure they have configured docker permissions:
sudo usermod -aG docker $USER
newgrp docker
Running Inference:
The commands below assume that the Gemma docker image has been built and tagged as ${DOCKER_URI}, that ${CKPT_PATH} points to the downloaded checkpoint directory, and that ${VARIANT} names the chosen model variant (for example, 2b or 7b).
- On CPU:
  PROMPT="The meaning of life is"
  docker run -t --rm \
      -v ${CKPT_PATH}:/tmp/ckpt \
      ${DOCKER_URI} \
      python scripts/run.py \
      --ckpt=/tmp/ckpt \
      --variant="${VARIANT}" \
      --prompt="${PROMPT}"
- On GPU:
  PROMPT="The meaning of life is"
  docker run -t --rm \
      --gpus all \
      -v ${CKPT_PATH}:/tmp/ckpt \
      ${DOCKER_URI} \
      python scripts/run.py \
      --device=cuda \
      --ckpt=/tmp/ckpt \
      --variant="${VARIANT}" \
      --prompt="${PROMPT}"
Using PyTorch/XLA
This extension is used for running models on TPU or CPU with specialized docker images:
- Build Docker Image:
  DOCKER_URI=gemma_xla:${USER}
  docker build -f docker/xla.Dockerfile ./ -t ${DOCKER_URI}
- Run Inference on TPU:
  docker run -t --rm \
      --shm-size 4gb \
      -e PJRT_DEVICE=TPU \
      -v ${CKPT_PATH}:/tmp/ckpt \
      ${DOCKER_URI} \
      python scripts/run_xla.py \
      --ckpt=/tmp/ckpt \
      --variant="${VARIANT}"
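For readers new to PyTorch/XLA itself, the generic sketch below (not the repository's run_xla.py) shows how a computation is placed on an XLA device; it assumes the torch_xla package is available in the environment.

import torch
import torch_xla.core.xla_model as xm

# Acquire the XLA device: a TPU core when PJRT_DEVICE=TPU, otherwise a CPU fallback.
device = xm.xla_device()

x = torch.randn(2, 2, device=device)  # tensor created directly on the XLA device
y = (x @ x).sum()

xm.mark_step()  # execute the lazily traced graph
print(y.item())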
Tokenizer Considerations
The model's tokenizer reserves 99 unused tokens to facilitate training and fine-tuning. These are denoted as <unused[0-98]> and have token IDs ranging from 7 to 105.
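As an illustrative check (a sketch assuming the Hugging Face tokenizer for an example checkpoint such as google/gemma-2b), the reserved tokens can be inspected directly:

from transformers import AutoTokenizer

# Example checkpoint id; access to the Gemma tokenizer requires accepting the license.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

# The unused tokens are named <unused0> ... <unused98>; their ids are expected
# to fall in the 7-105 range described above.
print(tokenizer.convert_tokens_to_ids("<unused0>"))
print(tokenizer.convert_tokens_to_ids("<unused98>"))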
Conclusion
By integrating robust, flexible models into a widely used deep learning framework, Gemma in PyTorch offers an accessible yet powerful tool for natural language processing tasks. Whether users want to explore the existing models or build new applications on top of them, it provides a versatile platform for innovation in language modeling.