Understanding Code Llama
Overview
Code Llama is a family of large language models specialized for coding tasks. Built on the Llama 2 architecture, it delivers state-of-the-art performance among openly available models. With support for code infilling, large input contexts, and zero-shot instruction following for programming tasks, Code Llama is a versatile tool for a wide range of applications.
Several variants of Code Llama cater to different needs: the foundation models (simply called Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct). Each comes in 7B, 13B, 34B, and 70B parameter sizes to match specific use cases. The models are trained on sequences of up to 16,000 tokens and show improvements on inputs of up to 100,000 tokens. The 7B and 13B versions support infilling based on surrounding content. All models are fine-tuned from Llama 2 with a heavier weighting of code data, and incorporate safety mitigations.
Access and Utilization
With this release, Code Llama is accessible to everyone—individuals, creators, researchers, and companies of all sizes—allowing them to innovate and scale their ideas responsibly. The release includes pre-trained and fine-tuned model weights, ranging from 7B to 70B parameters.
The repository is intended as a minimal example showing how to load Code Llama models and run inference.
Downloading the Models
To download the model weights and tokenizers, visit the Meta website, request a download, and agree to the license. After approval, a link is sent via email; the correct URL starts with "https://download.llamameta.net". With wget and md5sum installed, run the download script:
bash download.sh
The models vary in size:
- 7B model is approximately 12.55GB
- 13B model is about 24GB
- 34B model requires around 63GB
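The download script verifies each file against published checksums with md5sum. The same verification can be sketched in Python; the filename and contents below are illustrative stand-ins, not the actual distribution layout:

```python
import hashlib

# Stand-in for a downloaded checkpoint shard (illustrative filename).
data = b"example weights"
with open("consolidated.00.pth", "wb") as f:
    f.write(data)

# Checksum the publisher would ship alongside the file.
expected = hashlib.md5(data).hexdigest()

# Re-hash the downloaded file and compare; a corrupted download fails here.
with open("consolidated.00.pth", "rb") as f:
    actual = hashlib.md5(f.read()).hexdigest()

print("OK" if actual == expected else "FAILED")
```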
Implementation
For setup, using a conda environment with PyTorch and CUDA, clone the repository and execute in the top-level directory:
pip install -e .
Model Utilization
Model-Parallel (MP) Values
Every model requires a different level of model parallelism:
- 7B model needs MP of 1
- 13B model needs MP of 2
- 34B model needs MP of 4
- 70B model needs MP of 8
All models except the 70B Python and Instruct versions support sequences of up to 100,000 tokens, but the cache is allocated up front according to max_seq_len and max_batch_size, so set these values based on your hardware and workload.
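A back-of-the-envelope sketch of why the cache cost scales with max_seq_len and max_batch_size, assuming Llama-2-7B-style dimensions (32 layers, 32 heads, head dimension 128, fp16). These dimensions are assumptions for illustration, not taken from this document:

```python
def kv_cache_bytes(max_batch_size, max_seq_len,
                   n_layers=32, n_heads=32, head_dim=128, bytes_per_elem=2):
    """Rough key/value cache size: two tensors (K and V) per layer, each of
    shape (batch, seq_len, n_heads, head_dim), at fp16 (2 bytes/element)."""
    return (2 * n_layers * max_batch_size * max_seq_len
            * n_heads * head_dim * bytes_per_elem)

# fp16 cache for batch 4 at 16,384 tokens, with the assumed 7B dimensions:
gib = kv_cache_bytes(4, 16384) / 2**30
print(f"{gib:.0f} GiB")
```

Doubling either max_seq_len or max_batch_size doubles the estimate, which is why both knobs matter on memory-constrained hardware.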
Pretrained Code Models
The foundational Code Llama and Code Llama - Python models are not fine-tuned to follow instructions; they should be prompted so that the expected answer is the natural continuation of the prompt. An example run with the CodeLlama-7b model:
torchrun --nproc_per_node 1 example_completion.py \
--ckpt_dir CodeLlama-7b/ \
--tokenizer_path CodeLlama-7b/tokenizer.model \
--max_seq_len 128 --max_batch_size 4
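The example script processes at most max_batch_size prompts per forward pass. A sketch of splitting a longer prompt list into conforming batches; the helper and the prompts are illustrative, not part of the repository's API:

```python
def batched(prompts, max_batch_size):
    """Yield successive slices of at most max_batch_size prompts,
    matching the --max_batch_size limit passed on the command line."""
    for i in range(0, len(prompts), max_batch_size):
        yield prompts[i:i + max_batch_size]

prompts = [
    "import socket\n\ndef ping_exponential_backoff(host: str):",
    "def fibonacci(n: int) -> int:",
    "class LRUCache:",
    "def quicksort(xs):",
    "# Parse a CSV file into a list of dicts\n",
]
batches = list(batched(prompts, max_batch_size=4))
print([len(b) for b in batches])  # → [4, 1]
```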
Code Infilling
The 7B and 13B Code Llama and Code Llama - Instruct models can perform code infilling based on the surrounding content:
torchrun --nproc_per_node 1 example_infilling.py \
--ckpt_dir CodeLlama-7b/ \
--tokenizer_path CodeLlama-7b/tokenizer.model \
--max_seq_len 192 --max_batch_size 4
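Infilling prompts arrange the known code around the gap in prefix-suffix-middle order, marked by the model's special tokens. A sketch with plain-text stand-ins for those tokens; the real tokenizer inserts dedicated token IDs, so treat this as an illustration of the layout only:

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Arrange an infilling prompt in prefix-suffix-middle (PSM) order.
    <PRE>/<SUF>/<MID> are plain-text stand-ins for the model's special
    tokens; the generation emitted after <MID> fills the gap."""
    return f"<PRE>{prefix}<SUF>{suffix}<MID>"

prefix = "def remove_non_ascii(s: str) -> str:\n    "
suffix = "\n    return result"
prompt = build_infill_prompt(prefix, suffix)
print(prompt)
```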
Fine-Tuned Instruction Models
The Code Llama - Instruct variants are fine-tuned to follow instructions. To get the expected behavior, prompts must match the formatting defined in the chat_completion() function, including its required tags and token placements. This applies to models up to 34B; the 70B variant uses a different prompt format. Calling chat_completion() directly ensures the prompt is formatted correctly.
torchrun --nproc_per_node 1 example_instructions.py \
--ckpt_dir CodeLlama-7b-Instruct/ \
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 4
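For the models up to 34B, chat_completion() applies the Llama 2 chat template: the user turn is wrapped in [INST] tags, with an optional system message in a <<SYS>> block. A single-turn sketch of that template, written out by hand for illustration (the function name here is hypothetical; the repository's own chat_completion() handles this for you):

```python
def format_instruct_prompt(user_msg, system_msg=None):
    """Single-turn Llama-2-style chat prompt. A system message, when
    present, is wrapped in <<SYS>> tags inside the first [INST] block."""
    if system_msg is not None:
        user_msg = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg}"
    return f"[INST] {user_msg} [/INST]"

prompt = format_instruct_prompt(
    "Write a function that reverses a string.",
    system_msg="Answer with Python code only.",
)
print(prompt)
```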
Responsible Use
Code Llama carries potential risks, so users are encouraged to consult the Responsible Use Guide. For questions about the project, its safety, or related concerns, Meta provides reporting channels linked from the project documentation.
Conclusion
Code Llama enriches the landscape of language models with its specialized focus on coding. It empowers various sectors with capabilities to innovate responsibly, supported by detailed guidelines for safe and effective utilization.