Introduction to Baichuan-13B
Baichuan-13B is a large-scale open-source language model developed by Baichuan Intelligence, following the previously introduced Baichuan-7B model. With 13 billion parameters, Baichuan-13B is designed for commercial use and has achieved the best performance among models of its size on both Chinese and English benchmarks. The release includes two versions: a pre-trained model (Baichuan-13B-Base) and an alignment model (Baichuan-13B-Chat) that is capable of engaging in conversations. Here are the notable features of Baichuan-13B:
- Larger Size and More Data: Building on Baichuan-7B, Baichuan-13B increases the parameter count to 13 billion. It was trained on 1.4 trillion tokens of high-quality data, 40% more than LLaMA-13B, making it the most data-rich open-source model of its size. The model supports both Chinese and English, uses ALiBi positional encoding, and has a context window of 4,096 tokens.
- Simultaneous Release of Pre-trained and Alignment Models: The pre-trained model serves as a foundation for developers to build on, while many users want a model with conversational capabilities out of the box. Both versions are therefore available, with Baichuan-13B-Chat providing strong dialogue abilities and deployable with minimal code.
- Efficient Inference: To accommodate a broader range of hardware, Baichuan-13B offers int8 and int4 quantized versions that significantly reduce deployment resource requirements with minimal performance loss, allowing deployment on consumer-grade GPUs such as the NVIDIA RTX 3090.
- Open Source, Free, and Commercially Usable: Baichuan-13B is completely open for academic research, and developers can also use the model commercially for free after obtaining official permission via email.
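To make the quantization benefit concrete, here is a back-of-the-envelope estimate of weight memory at each precision. The figures cover weights only (no activations or KV cache), so treat them as rough lower bounds rather than measured requirements:

```python
# Approximate weight storage for Baichuan-13B at different precisions.
# Weights only; activations and KV cache add further memory on top.

PARAMS = 13.26e9  # total parameter count from the Model Details section

def weight_gb(bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16: {weight_gb(16):.1f} GB")  # ~26.5 GB: beyond a 24 GB RTX 3090
print(f"int8: {weight_gb(8):.1f} GB")   # ~13.3 GB: fits on a 3090
print(f"int4: {weight_gb(4):.1f} GB")   # ~6.6 GB: fits comfortably
```

This is why int8/int4 quantization is what brings the model within reach of a single 24 GB consumer GPU.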
Benchmark Results
Baichuan-13B's capabilities have been evaluated using the 5-shot method on various authoritative benchmarks for Chinese and English.
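As a concrete illustration, 5-shot evaluation simply prepends five solved exemplars to each test question before asking the model to answer. The sketch below uses a generic question/answer template; the exact formatting used by C-Eval, MMLU, and CMMLU differs:

```python
# Minimal sketch of 5-shot prompt construction for benchmark evaluation.
# The template here is illustrative, not the official benchmark format.

def build_few_shot_prompt(exemplars, question, k=5):
    """Join k solved (question, answer) exemplars and the unanswered question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars[:k]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Toy exemplars standing in for real benchmark items.
exemplars = [(f"What is {i} + {i}?", str(2 * i)) for i in range(1, 6)]
prompt = build_few_shot_prompt(exemplars, "What is 7 + 7?")
```

The model's completion after the final "Answer:" is then compared against the reference answer.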
C-Eval
In the C-Eval benchmark, Baichuan-13B-Base and Baichuan-13B-Chat performed strongly across the STEM, Social Sciences, Humanities, and Others categories, achieving average scores of 52.4 and 51.5, respectively.
MMLU
On the MMLU benchmark, Baichuan-13B-Base and Baichuan-13B-Chat achieved averages of 51.6 and 52.1 respectively, surpassing many models of comparable size.
CMMLU
Baichuan-13B-Base and Baichuan-13B-Chat led the CMMLU benchmark, which evaluates knowledge and reasoning abilities in a Chinese context, with averages of 55.3 and 55.8 respectively.
Model Details
The Baichuan-13B model has the following specifications:
- Hidden Layer Dimension: 5120
- Number of Layers: 40
- Attention Heads: 40
- Vocabulary Size: 64,000
- Total Parameters: Approximately 13.26 billion
- Training Data: 1.4 trillion tokens
- Positional Encoding: ALiBi
- Maximum Sequence Length: 4096
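The ~13.26 billion total can be cross-checked from the listed specifications. One dimension is assumed rather than stated above: a SwiGLU feed-forward network with intermediate size 13696, taken from the released model configuration, so verify against the actual config before relying on this:

```python
# Cross-check of the ~13.26B parameter count from the listed specs.
# ASSUMPTION: SwiGLU FFN with intermediate size 13696 (from the released
# config, not stated in the table above).

hidden, layers, vocab, ffn = 5120, 40, 64000, 13696

embed = vocab * hidden          # input token embedding
head = vocab * hidden           # untied output projection
attn = 4 * hidden * hidden      # Q, K, V, O projections per layer
mlp = 3 * hidden * ffn          # SwiGLU gate/up/down per layer
norms = 2 * hidden              # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + head + layers * per_layer + hidden  # + final norm
print(total / 1e9)  # ≈ 13.26
```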
Inference and Deployment
Model weights, source code, and configurations for inference have been released on Hugging Face for both Baichuan-13B-Base and Baichuan-13B-Chat. Several inference options are available:
Using Python Code
Users can load the model and run inference in a few lines of Python, for example to generate text or hold a dialogue.
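A minimal sketch of loading the chat model with Hugging Face transformers is shown below. The repository id and the `chat` helper follow the published model card, but both are assumptions worth verifying against the official README; running this requires downloading the weights and a GPU with sufficient memory (roughly 26 GB in fp16, less for the quantized variants):

```python
# Hedged sketch: loading Baichuan-13B-Chat via transformers.
# Repo id and chat API per the model card; verify before use.

def chat_with_baichuan(prompt: str) -> str:
    # Imports kept local so the sketch reads without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat", trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )
    # The chat variant exposes a `chat` helper taking a message list
    # (an assumption based on the model card's usage example).
    messages = [{"role": "user", "content": prompt}]
    return model.chat(tokenizer, messages)
```

Calling `chat_with_baichuan("...")` then returns the model's reply as a string.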
Command Line Tool
A command-line tool is available for simpler, terminal-based interaction with the model.
Web Demo
Baichuan offers a web service demo via streamlit, allowing users to interact with the model through a browser-based interface by launching a local web service.
Baichuan-13B thus stands out as a robust, scalable, and user-friendly solution for those looking to harness the power of advanced language modeling for both research and commercial purposes.