PULSE - Advanced NLP for Chinese Medical Fields Including Health Education and Diagnosis Support

Introducing the PULSE Project

PULSE is a project that develops large language models specialized for Chinese medical text. It applies natural language processing (NLP) to healthcare and is designed to handle a range of tasks within the medical domain, from health education to diagnosis support.

Main Features of the PULSE Model

PULSE is a Chinese medical large language model fine-tuned on approximately 4 million instructions drawn from both the medical and general Chinese domains. This training supports a range of medical NLP tasks, including:

Health education
Passing medical exams
Evaluating and interpreting medical reports
Structuring medical records
Simulating diagnostic and treatment processes

Model Availability

PULSE is available in two model sizes:

PULSE-7b is fine-tuned from the bloomz-7b1-mt base model.
PULSE-20b is fine-tuned from the InternLM-20B base model.

A quantized version of the model will be available soon, intended for users who want to run the larger-parameter models. The team welcomes partnerships for further expansion.

Limitations

PULSE is intended primarily for research use in the medical field. Its output should not replace professional medical advice, because the model's information may be incomplete or inaccurate. Consult a professional healthcare provider for diagnosis and treatment.

Elo Evaluation

PULSE is evaluated with an Elo rating tournament, in which models are compared pairwise and ranked by the resulting ratings. This method helps balance the cost of model assessment. PULSE has shown competitive results on several benchmark datasets, including MedQA and PromptCBLUE.
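
To make the rating mechanism concrete, here is a minimal sketch of the standard Elo update applied to pairwise model comparisons. The starting rating of 1000, the K-factor of 32, the 400-point scale, and the model names are conventional or hypothetical choices for illustration; the source does not state which settings or judging procedure the PULSE evaluation uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A's answer is preferred, 0.0 if B's is, 0.5 for a tie.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical outcomes: (model_a, model_b, score for model_a).
matches = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
ratings = {"model_x": 1000.0, "model_y": 1000.0, "model_z": 1000.0}
for a, b, score in matches:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score)
```

Because each comparison only needs two answers and a preference, a tournament can rank many models without scoring every model on every question, which is one plausible reading of the cost argument above.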

Additional Resources

PULSE can be fine-tuned further using capabilities integrated from the LLaMA-Factory project, and guidance is available for quantizing the model with LMDeploy.

Inference Requirements

Deploying PULSE locally for inference with a batch size of 1 requires the following amounts of GPU memory:

7B Model: 14GB for FP16 or 6GB for INT4
20B Model: 40GB for FP16 or 12GB for INT4
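
The figures above can be turned into a small helper for choosing a precision. This is only a sketch: the function and variable names are hypothetical, and the table simply restates the numbers quoted in this section. The weights-only estimate is plain arithmetic (parameters times bits per parameter); it reproduces the FP16 figures exactly, while the quoted INT4 figures are higher than the weights alone, presumably because of runtime overhead (that interpretation is an assumption).

```python
# Memory in GB quoted in this section for batch size 1.
REQUIRED_GB = {
    "7B": {"FP16": 14, "INT4": 6},
    "20B": {"FP16": 40, "INT4": 12},
}


def weights_only_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB (ignores activations and cache)."""
    return params_billion * bits_per_param / 8


def pick_precision(model_size: str, available_gb: float):
    """Return the highest precision whose quoted requirement fits, else None."""
    for precision in ("FP16", "INT4"):
        if available_gb >= REQUIRED_GB[model_size][precision]:
            return precision
    return None


if __name__ == "__main__":
    for size in REQUIRED_GB:
        print(size, "on a 24 GB card:", pick_precision(size, 24))
```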

Download and Installation

To begin using PULSE, follow these steps:

Clone the repository:

```
git clone https://github.com/openmedlab/PULSE
cd PULSE
```

Set up a conda environment and install dependencies:
```
conda env create -f llm.yml
conda activate llm
```

Usage Examples

PULSE can be used to perform various tasks, including:

Health education demonstrations
Medical examination question answering
Interpreting medical reports
Structuring medical records
Simulating consultations

It also handles non-medical inquiries safely.
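
As a rough illustration of how such a query might be issued, the sketch below loads a causal language model with the Hugging Face Transformers library and generates a reply. It is not the project's own inference script: `generate_reply` and the weights path are hypothetical, the prompt is passed as raw text because the repository's prompt template is not reproduced here, and the InternLM-based PULSE-20b may need different loading options.

```python
def generate_reply(model_path: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Load a causal LM from model_path and return its continuation of prompt."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # FP16 on GPU keeps memory use down; fall back to FP32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (hypothetical local path to downloaded weights):
# reply = generate_reply("/path/to/pulse-weights", "高血压患者在饮食上应注意什么？")
# The prompt asks what patients with hypertension should watch for in their diet.
```

Any reply produced this way is subject to the research-use limits described in the Limitations section.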

Related Projects

The PULSE project is connected with several related initiatives. Some explore applications such as X-ray analysis with language models and COVID-19-specific adaptations; others improve medical terminology normalization or provide libraries for healthcare chatbot development.

Acknowledgments

PULSE was developed with contributions from notable organizations, including the Shanghai Artificial Intelligence Laboratory and universities specializing in natural language processing and data mining.

License

The project's code is available under the Apache 2.0 license, and the model weights are shared under the GNU AGPL 3.0 license.

Taken together, PULSE offers a Chinese medical language model that covers a broad range of healthcare NLP applications, from health education to simulated consultations.