Expert-Specialized Fine-Tuning (ESFT)
The Expert-Specialized Fine-Tuning (ESFT) project focuses on efficiently customizing Large Language Models (LLMs) built on a Mixture-of-Experts (MoE) architecture. Instead of updating the full model, ESFT fine-tunes only the experts relevant to a given task while leaving the rest of the model untouched. This improves efficiency and task performance while reducing the compute and storage needed for customization.
Key Developments
- EMNLP 2024 Acceptance: ESFT has been accepted for presentation at the EMNLP 2024 Main Conference.
- Release of Training Code: The team has made the ESFT training code available, allowing users to experiment with their own models and datasets.
Getting Started
To get started with ESFT, you can quickly set up the environment by following these steps:
- Clone the repository:
  git clone https://github.com/deepseek-ai/ESFT.git
  cd ESFT
- Install the required dependencies (a quick sanity check is sketched after this list):
  pip install transformers torch safetensors accelerate
- Download the necessary adapters:
  bash scripts/download_adapters.sh
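As a quick sanity check after installing the dependencies (this one-liner is just an illustrative check, not part of the ESFT tooling), confirm that the core packages import cleanly:

python -c "import torch, transformers, safetensors, accelerate; print('torch', torch.__version__, '| transformers', transformers.__version__)"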
Key Scripts and Their Functions
- eval_multigpu.py: Evaluates the model across the benchmark datasets; its options are configured in scripts/eval.sh (an illustrative invocation is sketched after this list).
- get_expert_scores.py: Computes relevance scores for each expert on the evaluation datasets, showing which experts a given task actually relies on.
- generate_expert_config.py: Converts those scores into a configuration that selects the task-relevant experts to fine-tune, so the MoE model can be specialized efficiently.
- train.py and train_ep.py: Train the model using the generated configuration; train_ep.py is the multi-GPU variant and uses expert parallelism to speed up training (example commands for the full workflow follow this list).
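As a rough illustration of how eval_multigpu.py is typically launched, the sketch below evaluates one adapter on one dataset. The flag names, model ID, and paths are assumptions modeled on the repository's example usage, not a verbatim copy of its interface; scripts/eval.sh remains the authoritative reference.

# Illustrative only: flags and paths are assumptions; check scripts/eval.sh
python eval_multigpu.py \
  --eval_dataset=translation \
  --base_model_path=deepseek-ai/ESFT-vanilla-lite \
  --adapter_dir=all_models/adapters/token \
  --output_path=results/completions/token/translation.jsonl \
  --max_new_tokens=512 \
  --eval_batch_size=2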
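To specialize the model for your own task, the workflow behind the two middle scripts is: score the experts on task data, then turn the scores into an expert configuration. A minimal sketch, assuming a task named intent and default-style output locations (flag names and values are assumptions, not the repository's exact interface):

# 1. Score how strongly each expert is activated by the task data (illustrative flags)
python get_expert_scores.py \
  --eval_dataset=intent \
  --base_model_path=deepseek-ai/ESFT-vanilla-lite \
  --output_dir=results/expert_scores/intent

# 2. Turn the scores into a config that picks the task-relevant experts (illustrative flags)
python generate_expert_config.py \
  --expert_scores_dir=results/expert_scores/intent \
  --output_path=results/expert_configs/intent.json \
  --score_function=token \
  --top_p=0.2

Here score_function and top_p are stand-ins for the paper's expert-selection criterion, in which experts are added in order of relevance until a cumulative-score threshold is reached.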
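Training then consumes the generated configuration; train.py runs in a single process, while train_ep.py is launched under torchrun to use expert parallelism across GPUs. As before, the flags and paths are illustrative assumptions:

# Single-process fine-tuning of the selected experts (illustrative flags)
python train.py \
  --base_model_path=deepseek-ai/ESFT-vanilla-lite \
  --expert_config=results/expert_configs/intent.json \
  --train_dataset=intent \
  --output_dir=results/checkpoints/intent

# Expert-parallel fine-tuning on 8 GPUs (assumed count)
torchrun --nproc_per_node=8 train_ep.py \
  --base_model_path=deepseek-ai/ESFT-vanilla-lite \
  --expert_config=results/expert_configs/intent.json \
  --train_dataset=intent \
  --output_dir=results/checkpoints/intent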
Contact and Support
For bug reports or feature requests, please open an issue on the ESFT GitHub repository; including detailed reproduction information helps the team resolve problems more quickly.
Future Plans
The team plans to continue updating the released models, evaluation scripts, and training methods, along with further enhancements to the system.
How to Cite
If the ESFT project or paper proves useful in your work, you are encouraged to cite it as follows:
@article{wang2024letexpertsticklast,
title={Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models},
author={Zihan Wang and Deli Chen and Damai Dai and Runxin Xu and Zhuoshu Li and Y. Wu},
year={2024},
eprint={2407.01906},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.01906},
}
The ESFT project represents a significant advancement in efficiently customizing large-scale language models, balancing performance with resource optimization. With its already impressive achievements and future plans, ESFT is poised to make a substantial impact in the field of natural language processing.