Introduction to the YAYI Large Model
YAYI, developed by Wenge Research, is an advanced large-scale model fine-tuned on millions of high-quality, structured domain data samples spanning fields such as media outreach, public opinion analysis, public safety, financial risk control, and urban governance. Through comprehensive pre-training and subsequent iteration, YAYI's foundational and analytical capabilities in Chinese have been progressively strengthened. The model also supports multi-turn dialogue and plugin capabilities, and continuous feedback from hundreds of users during internal testing has further improved its performance and safety.
The open-source release of the YAYI model seeks to bolster the open-source community within the Chinese pre-training model landscape, encouraging collaboration and ecosystem growth among its partners.
Model Release
Available Models
- YAYI-7B
  - Model Identifier: wenge-research/yayi-7b
  - Download
- YAYI-7B-Llama2
  - Model Identifier: wenge-research/yayi-7b-llama2
  - Download
- YAYI-13B-Llama2
  - Model Identifier: wenge-research/yayi-13b-llama2
  - Download
Model Deployment
Environment Setup
- Clone the repository to your server:
  git clone https://github.com/wenge-research/YAYI.git
  cd YAYI
- Create a conda environment:
  conda create --name yayi python=3.8
  conda activate yayi
- Install dependencies:
  pip install -r requirements.txt
Inference
The model weights for the yayi-7b version are available in the Hugging Face model repository. The following code snippet runs inference on a single GPU such as an A100, A800, or 3090 and uses roughly 20 GB of GPU memory at half precision (the snippet loads the weights in bfloat16):
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch
yayi_7b_path = "wenge-research/yayi-7b"
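# Load the tokenizer and model; device_map="auto" places the weights on the available GPU(s)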
tokenizer = AutoTokenizer.from_pretrained(yayi_7b_path)
model = AutoModelForCausalLM.from_pretrained(yayi_7b_path, device_map="auto", torch_dtype=torch.bfloat16)
prompt = "Hello"
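# Wrap the user prompt in YAYI's chat template: a system message followed by human and assistant turns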
formatted_prompt = f"<|System|>:\nA chat between a human and an AI assistant named YaYi.\nYaYi is a helpful and harmless language model developed by Beijing Wenge Technology Co.,Ltd.\n\n<|Human|>:\n{prompt}\n\n<|YaYi|>:"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
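# The <|End|> token marks the end of a YAYI response; its id is used to stop generation and as the pad token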
eos_token_id = tokenizer("<|End|>").input_ids[0]
generation_config = GenerationConfig(
    eos_token_id=eos_token_id,
    pad_token_id=eos_token_id,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.3,
    repetition_penalty=1.1,
    no_repeat_ngram_size=0
)
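# Generate a response with sampling and decode it back to text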
response = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(response[0]))
Fine-Tuning
The project leverages the deepspeed framework for model training. Once the environment is configured, the corresponding script can be executed to start training; a hedged example launch command is sketched after the list below. Three modes are supported: full-parameter fine-tuning on instruction data, LoRA fine-tuning, and fine-tuning on multi-round dialogue data.
- Instruction Data Full-Parameter Fine-Tuning: uses JSON-formatted data with "instruction", "input", and "output" fields. Running the specified command starts fine-tuning, and multi-GPU configurations are supported; a hedged sketch of the data format follows this list.
- LoRA Fine-Tuning: a resource-efficient approach that runs on a single GPU and makes training larger models practical by adjusting the lora-dim and lora-module-name settings; see the LoRA launch sketch after this list.
- Multi-Round Dialogue Data Fine-Tuning: also organized in JSON format, this mode targets multi-turn conversation data and supports efficient multi-GPU training; an illustrative format sketch follows this list.
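As a hedged sketch of the instruction data format, one record might look like the following; the three field names come from the description above, while the surrounding layout (a JSON array of records) and the example content are assumptions rather than the repository's exact specification:

[
  {
    "instruction": "Summarize the main risk points in the following announcement.",
    "input": "Example text of a financial announcement to be analyzed.",
    "output": "An example summary of the announcement's main risk points."
  }
]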
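For the launch command itself, this section only states that training runs through deepspeed; the script name, data path, and argument names below are hypothetical placeholders meant to show the general shape of a multi-GPU full-parameter run, not the repository's actual entry point:

# Hypothetical full-parameter fine-tuning launch; the script and flag names are illustrative placeholders.
deepspeed --num_gpus=8 train.py \
    --model_name_or_path wenge-research/yayi-7b \
    --data_path data/instruction_data.json \
    --output_dir output/yayi-7b-sft \
    --deepspeed ds_config.json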
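For LoRA fine-tuning, only the lora-dim and lora-module-name settings are named above; the flag spellings and values below are assumptions that mirror those names, shown as a single-GPU variant of the same hypothetical launch:

# Hypothetical single-GPU LoRA launch; --lora_dim and --lora_module_name mirror the settings named above
# but are not verified against the repository, and query_key_value is only an example target module.
deepspeed --num_gpus=1 train.py \
    --model_name_or_path wenge-research/yayi-7b \
    --data_path data/instruction_data.json \
    --output_dir output/yayi-7b-lora \
    --deepspeed ds_config.json \
    --lora_dim 16 \
    --lora_module_name query_key_value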
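The multi-round dialogue data is also JSON, but its schema is not spelled out here; the structure below is purely illustrative, and every field name in it ("system", "conversations", "role", "content") is an assumption:

[
  {
    "system": "A chat between a human and the AI assistant YaYi.",
    "conversations": [
      {"role": "human", "content": "Example first user turn."},
      {"role": "yayi", "content": "Example first assistant reply."},
      {"role": "human", "content": "Example follow-up question."},
      {"role": "yayi", "content": "Example second assistant reply."}
    ]
  }
]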
Training Dataset
YAYI is trained on a dataset of hundreds of thousands of domain-specific instructions, primarily covering finance, security, public opinion, and media, and including safety-enhancement and plugin-capability data.
Licensing & Limitations
Limitations
Current limitations of YAYI include possible inaccuracies when responding to fact-based instructions, inadequate identification of harmful instructions, and weaker performance in logical reasoning, code generation, and scientific computation.
Disclaimer
The YAYI model is open-sourced for research purposes only and must not be used for commercial purposes or any activities that might harm society. The code, data, and models associated with the YAYI project are freely available under the Apache-2.0 license (code), CC BY-NC 4.0 (data), and a dedicated model license (model weights), respectively.
Acknowledgments
The YAYI initiative utilizes components from BigScience's bloomz-7b1-mt and Meta's Llama 2, as well as training codebases such as Databricks' dolly and Hugging Face transformers, alongside distributed training tools like Microsoft's DeepSpeed.