Llama3-Chinese Project Overview
Introduction
Llama3-Chinese is a language model designed to process and understand Chinese and English text. It is built on the Meta-Llama-3-8B base model and fine-tuned with the DoRA and LoRA+ training techniques. The training data comprises 500,000 high-quality Chinese multi-turn dialogues, 100,000 English multi-turn dialogues, and 2,000 single-turn self-cognition examples.
The project's GitHub repository: https://github.com/seanzhang-zhichen/llama3-chinese
Model Download
Several versions of the Llama3-Chinese model are available for download, so users can choose the one that best fits their needs:
- Meta-Llama-3-8B: Available on both HuggingFace and ModelScope
- Llama3-Chinese-Lora: Accessible via HuggingFace and ModelScope
- Llama3-Chinese (merged model): Can be downloaded from HuggingFace and ModelScope
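If you prefer a programmatic download, the `huggingface_hub` library can fetch a full model snapshot. A minimal sketch, assuming the merged model's repo id `zhichen/Llama3-Chinese` (the same id used in the inference example below):

```python
# Download a full model snapshot from the HuggingFace Hub.
# Assumes the merged model lives at zhichen/Llama3-Chinese; swap in the
# LoRA repo id if you only need the adapter weights.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="zhichen/Llama3-Chinese")
print(f"Model files downloaded to: {local_dir}")
```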
Merging the LoRA Model
If you downloaded the LoRA weights rather than the merged model, merge them into the base model as follows:
- Download Meta-Llama-3-8B from ModelScope:

  ```bash
  git clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3-8B.git
  ```
- Download Llama3-Chinese-Lora from either ModelScope or HuggingFace:

  ```bash
  # From ModelScope
  git lfs install
  git clone https://www.modelscope.cn/seanzhang/Llama3-Chinese-Lora.git

  # From HuggingFace
  git lfs install
  git clone https://huggingface.co/zhichen/Llama3-Chinese-Lora
  ```
- Merge the models using the provided script (a sketch of what such a merge involves follows this list):

  ```bash
  python merge_lora.py \
    --base_model path/to/Meta-Llama-3-8B \
    --lora_model path/to/lora/Llama3-Chinese-Lora \
    --output_dir ./Llama3-Chinese
  ```
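The repository ships its own `merge_lora.py`; for reference, here is a minimal sketch of what a LoRA merge typically does, assuming the adapter is a standard PEFT checkpoint (the paths below are placeholders):

```python
# Minimal LoRA merge sketch using peft; the repo's merge_lora.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "path/to/Meta-Llama-3-8B"            # placeholder path
lora_path = "path/to/lora/Llama3-Chinese-Lora"   # placeholder path
output_dir = "./Llama3-Chinese"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, lora_path)

# Fold the low-rank adapter weights into the base weights and drop the adapter.
merged = model.merge_and_unload()
merged.save_pretrained(output_dir)

# Save the tokenizer alongside so the output directory is self-contained.
AutoTokenizer.from_pretrained(base_path).save_pretrained(output_dir)
```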
Inference and Demonstration
Once the model is set up, you can run inference from a Python script, the command line, or a web interface. Examples of each follow.
Running an Inference
In a Python script, use the `transformers` library to tokenize the prompt and generate a response:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "zhichen/Llama3-Chinese"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]

# Build the Llama 3 chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

# Strip the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
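For interactive use, token-by-token streaming often feels more responsive than waiting for the full completion. A sketch using transformers' built-in `TextStreamer`, reusing `model`, `tokenizer`, and `input_ids` from the snippet above:

```python
# Stream generated tokens to stdout as they are produced.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    streamer=streamer,
)
```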
Command Line and Web Interface Demos
- CLI demo (a bare-bones do-it-yourself loop is sketched after this list):

  ```bash
  python cli_demo.py --model_path zhichen/Llama3-Chinese
  ```
- Web demo:

  ```bash
  python web_demo.py --model_path zhichen/Llama3-Chinese
  ```
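If you prefer not to use the repo's scripts, a minimal multi-turn chat loop can be built directly on the inference code above. This is a sketch, not the repo's `cli_demo.py`, and it assumes `model` and `tokenizer` are already loaded as shown earlier:

```python
# Minimal multi-turn chat loop; keeps the dialogue history in `messages`.
messages = [{"role": "system", "content": "You are a helpful assistant."}]
while True:
    user_input = input("User: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user_input})
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        input_ids, max_new_tokens=1024, do_sample=True,
        temperature=0.7, top_p=0.95,
    )
    reply = tokenizer.decode(
        outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(f"Assistant: {reply}")
    # Append the reply so later turns see the full conversation.
    messages.append({"role": "assistant", "content": reply})
```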
Using vLLM for Deployment
To deploy the model with vLLM, follow these steps:
- Start an OpenAI-compatible API server with vLLM (see the request example after this list):

  ```bash
  python -m vllm.entrypoints.openai.api_server --served-model-name Llama3-Chinese --model ./Llama3-Chinese
  ```
- Run the vLLM web demo:

  ```bash
  python vllm_web_demo.py --model Llama3-Chinese
  ```
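The api_server from the first step exposes an OpenAI-compatible endpoint (on `localhost:8000` by default). A minimal sketch of a chat request using the `requests` library, assuming the default host and port:

```python
# Query the vLLM OpenAI-compatible server started above.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Llama3-Chinese",  # must match --served-model-name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "你好"},
        ],
        "temperature": 0.7,
        "max_tokens": 512,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```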
Datasets and Licensing
The training dataset, deepctrl-sft-data, is available on ModelScope.
The project is released under the Apache License 2.0: the code is free for commercial use, while the model weights and data are restricted to research purposes only. More details can be found in the DISCLAIMER.
Acknowledgements and Citation
Llama3-Chinese credits contributions from projects like meta-llama/llama3 and hiyouga/LLaMA-Factory.
For academic references, please use the following citation:
```bibtex
@misc{Llama3-Chinese,
  title={Llama3-Chinese},
  author={Zhichen Zhang and Xin LU and Long Chen},
  year={2024},
  howpublished={\url{https://github.com/seanzhang-zhichen/llama3-chinese}},
}
```
This overview provides the essentials for understanding, downloading, and using Llama3-Chinese, and for contributing to the project.