Introduction to the phi3-Chinese Project
The phi3-Chinese project centers on phi3, a compact model designed to punch above its weight: at just 3.8 billion parameters, less than half the size of llama3 8b, it is reported to surpass that model's performance. This makes deploying advanced AI models on mobile devices far more feasible.
This repository collects the various phi3 training variants scattered across the open-source community, so that more users can discover interesting weights that might otherwise go unnoticed. It also provides simplified tutorials on phi-related training, inference, and deployment to help users get the most out of these models.
Chat Model Downloads
Phi-3-Chinese Models
- Phi-3-mini-128k-instruct-Chinese
  - Incremental SFT Version:
  - Direct DPO Version:
  - Extended Vocabulary Version: planned for a future release
Hugging Face Models (Original English Version)
ModelScope Models (Original English Version)
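To quickly try any of the weights above, a minimal generation sketch using Hugging Face transformers is shown below. The local model path and the use of the bundled chat template are assumptions; substitute whichever weight you actually downloaded.

```python
# Minimal inference sketch. Assumptions: the downloaded weights follow the
# standard Phi-3 instruct format and load with Hugging Face transformers;
# the local path below is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./Phi-3-mini-128k-instruct-Chinese"  # path of the weight you downloaded

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# Build a chat-style prompt using the chat template shipped with the weights.
messages = [{"role": "user", "content": "你好，请用中文介绍一下你自己。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```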
Web Page Deployment
To deploy a phi3-Chinese model as a web page, run:

```bash
streamlit run deploy/streamlit_for_instruct.py ./Phi-3-mini-128k-instruct-Chinese
```
An illustrative deployment image is shown below:
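For orientation, the sketch below shows what a Streamlit chat page of this kind typically contains. It is not the repository's actual deploy/streamlit_for_instruct.py; the file name, widget layout, and the assumption that the model path arrives as the script's first command-line argument (as in the command above) are illustrative.

```python
# Minimal Streamlit chat sketch (illustrative, not the repo's actual script).
import sys
import streamlit as st
from transformers import AutoModelForCausalLM, AutoTokenizer

@st.cache_resource  # load the model once per server process, not on every rerun
def load_model(path):
    tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    mdl = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    return tok, mdl

# Assumption: streamlit passes the trailing path argument through sys.argv.
model_path = sys.argv[1] if len(sys.argv) > 1 else "./Phi-3-mini-128k-instruct-Chinese"
tokenizer, model = load_model(model_path)

st.title("Phi-3 Chinese Chat")
if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far.
for msg in st.session_state.history:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("输入你的问题"):
    st.session_state.history.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    input_ids = tokenizer.apply_chat_template(
        st.session_state.history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    st.session_state.history.append({"role": "assistant", "content": reply})
    st.chat_message("assistant").write(reply)
```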
Current Issues
Despite the promising technology, several challenges remain:
- Performance Discrepancies: In practice, phi3-mini does not fully live up to its headline benchmark results, particularly in English, and there are suspicions of benchmark gaming, which limits its practical usefulness. Stacking additional transformer blocks (depth up-scaling) might unlock more of its potential.
- Limited Vocabulary: The model's vocabulary is small, with relatively few Chinese tokens; a single Chinese character is frequently split into multiple tokens, so its Chinese encoding efficiency is even lower than that of llama3 8b. Vocabulary expansion followed by incremental pre-training is needed (see the token-count sketch after this list).
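The efficiency gap can be checked directly by counting the tokens both tokenizers produce for the same Chinese sentence. The sketch below assumes you can download the public microsoft/Phi-3-mini-128k-instruct and meta-llama/Meta-Llama-3-8B tokenizers from the Hugging Face Hub (the latter is gated behind Meta's license).

```python
# Token-count comparison sketch. Assumption: both tokenizers are available
# from the Hugging Face Hub in your environment.
from transformers import AutoTokenizer

text = "今天天气真不错，我们一起去公园散步吧。"

for name in ["microsoft/Phi-3-mini-128k-instruct", "meta-llama/Meta-Llama-3-8B"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok.encode(text, add_special_tokens=False)
    # Fewer tokens per character indicates a more Chinese-friendly vocabulary.
    print(f"{name}: {len(ids)} tokens for {len(text)} characters")
```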
Despite these concerns, the 3.8b phi3-mini still has considerable potential for lightweight downstream vertical tasks. Even if it cannot reach the level of GPT-3.5 on its own, building a mixture-of-experts (MoE) version on top of it might significantly boost its performance.