Introduction to the Huozi Project
The Huozi Project, developed by the Harbin Institute of Technology Social Computing and Information Retrieval Research Center (HIT-SCIR), is an open, general-purpose large language model. Building on recent advances in large language models, it is intended to serve both research and practical natural language processing applications.
Model Updates and Evolutions
The Huozi model has gone through several iterations, the most recent being Huozi 3.5. It builds on its predecessors, Huozi 3.0 and Chinese-Mixtral-8x7B, and offers a 32K context window, strong multilingual knowledge, solid mathematical reasoning and code generation, and improved instruction following and content safety.
Model Architecture
Huozi 3.5 is a sparse mixture-of-experts (SMoE) model. Each layer contains eight expert feed-forward networks (FFNs), but only the top-2 experts selected by the router are activated for each token during the forward pass. Of its 46.7 billion total parameters, only about 13 billion are active for any given token at inference time, which keeps computation and latency close to those of a much smaller dense model.
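To make the top-2 routing concrete, the sketch below shows the general pattern of such a layer: a router scores all eight experts for each token, only the two highest-scoring experts are run, and their outputs are mixed with the normalized router weights. This is an illustrative simplification, not the actual Mixtral/Huozi implementation; the class name and dimensions are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative sparse MoE layer: 8 expert FFNs, top-2 routing per token."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)       # normalize weights over the 2 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(Top2MoELayer()(tokens).shape)            # torch.Size([4, 512])
```

Because only two of the eight expert FFNs run per token, the per-token compute stays close to that of a ~13B-parameter dense model even though the full parameter count is 46.7B.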
Training Process
The development of Huozi 3.5 involves a rigorous multi-step training process:
- Chinese Vocabulary Pre-training: The vocabulary of the base Mixtral-8x7B model was first expanded with Chinese tokens and the model was further pre-trained, overcoming its limited Chinese coverage and strengthening it for Chinese scenarios (a minimal sketch of this step follows the list).
- Training Huozi 3.0: The resulting Chinese-Mixtral-8x7B model was fine-tuned on roughly 300,000 instruction-following examples, improving its performance on tasks such as mathematical reasoning and code generation and yielding Huozi 3.0.
- Intermediate Checkpoint 1: Trained with the instruction data from Huozi 1.0, this checkpoint scored well on English and Chinese knowledge benchmarks but fell short in instruction following and safety.
- Enhanced Instruction-Following Abilities: Additional instruction data and training techniques such as BPE-Dropout were used to strengthen instruction following, producing Intermediate Checkpoint 2 (see the BPE-Dropout sketch after this list).
- Model Fusion: Intermediate Checkpoint 1, Intermediate Checkpoint 2, and Huozi 3.0 were fused into Intermediate Checkpoint 3, combining the strengths of the individual models.
- Post-Fusion Training: A final round of instruction fine-tuning after the fusion produced Huozi 3.5, improving its overall proficiency across these dimensions.
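For the vocabulary pre-training step, the usual recipe is to add new Chinese tokens to the base tokenizer and resize the model's embedding matrices before continuing pre-training on Chinese text. The sketch below shows that recipe with the Hugging Face transformers API; the token list is a made-up placeholder, not the vocabulary actually added to Mixtral-8x7B, and loading the full model requires substantial memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mixtral-8x7B-v0.1"  # base model named in the text
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)  # illustrative; needs large GPU memory

# Hypothetical whole-word Chinese tokens mined from a Chinese corpus.
new_tokens = ["你好", "自然语言处理", "哈尔滨工业大学"]
added = tokenizer.add_tokens(new_tokens)

# Grow the input/output embedding matrices so the new token ids have rows to train.
model.resize_token_embeddings(len(tokenizer))
print(f"added {added} tokens; vocabulary size is now {len(tokenizer)}")
# Continued pre-training on Chinese text would follow to learn the new embeddings.
```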
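BPE-Dropout, mentioned in the instruction-following step, randomly skips merge operations during tokenization so the same text is segmented differently across passes, acting as a regularizer. The toy example below demonstrates the effect with the Hugging Face tokenizers library on a throwaway corpus; it is not the Huozi training setup.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy corpus; the real instruction-tuning data is of course much larger.
corpus = ["the model follows instructions", "instruction tuning improves following"] * 50

# dropout=0.1 means each BPE merge is skipped with 10% probability at encoding time,
# so identical text can be segmented differently from call to call.
tokenizer = Tokenizer(BPE(unk_token="[UNK]", dropout=0.1))
tokenizer.pre_tokenizer = Whitespace()
tokenizer.train_from_iterator(corpus, BpeTrainer(special_tokens=["[UNK]"]))

for _ in range(3):
    print(tokenizer.encode("instruction following").tokens)  # segmentation varies per call
```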
Model Availability and Deployment
The Huozi models are available for download, with multiple checkpoints providing flexibility for research and application. Huozi 3.5 can be deployed with frameworks such as Transformers and vLLM, and can be served through an OpenAI-compatible API for broader accessibility.
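As a quick local-usage example, the snippet below loads a checkpoint with the standard Transformers chat-template API. The repository identifier is an assumption (confirm the exact name on the project's download page), and a model of this size generally needs multiple GPUs or quantization to run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed identifier; check the project page for the exact checkpoint name.
model_id = "HIT-SCIR/huozi3.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map requires `accelerate`
)

messages = [{"role": "user", "content": "请用一句话介绍哈尔滨工业大学。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For higher-throughput serving, the same checkpoint can be hosted with vLLM, which also exposes an OpenAI-compatible endpoint.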
Cautionary Note
Outputs of the Huozi models should be handled with care, as they may occasionally contain factual errors or biased content. Users should carefully evaluate and validate model-generated content before using or distributing it.
With its innovative approach and sustained development efforts, the Huozi Project aims to further extend the capabilities of large-scale language models, contributing significantly to the fields of natural language processing and artificial intelligence at large.