Introduction to QiZhenGPT Project
QiZhenGPT: An Overview
QiZhenGPT is an open-source Chinese medical large language model that aims to improve the accuracy of medical question answering in Chinese. The project used the QiZhen Medical Knowledge Base to assemble a Chinese medical instruction dataset, then fine-tuned Chinese-LLaMA-Plus-7B, CaMA-13B, and ChatGLM-6B on that dataset to improve their performance in medical contexts.
The project initially focused on drug-related knowledge and released a dataset for evaluating it. Planned work extends these improvements to diseases, surgery, and laboratory diagnostics, and continues to optimize applications such as doctor-patient interaction and automatic medical report generation.
MedCopilot: A Smart Medical Assistant
MedCopilot is a medical assistant built on the QiZhen Medical Large Model and its associated databases. It aims to improve patient care, assist doctors, and streamline hospital management by combining AI technology, medical knowledge, and clinical data. MedCopilot has already been deployed at Zhejiang University's Second Affiliated Hospital.
Updates and Releases
- August 9, 2024: Updated information about MedCopilot.
- June 27, 2023: Released an open-source trial version of the QiZhen Medical Model, focusing on accurate disease and drug knowledge Q&A.
- June 9, 2023: Released another trial version to improve drug knowledge Q&A accuracy.
- June 2, 2023: Released a trial version aimed at enhancing drug knowledge Q&A.
- May 30, 2023: Released a 20k training dataset sourced from real-world doctor-patient interactions and drug-related text from the QiZhen Medical Knowledge Base.
- May 25, 2023: Opened a dataset for drug-related indications.
- May 24, 2023: Released a trial version aimed at improving drug knowledge Q&A accuracy.
- May 23, 2023: Released an earlier version focusing on drug knowledge.
Features of MedCopilot
- Task List Assistant: Integrates with hospital systems to give doctors a summary of important daily tasks, such as patient admissions, surgeries, consultations, documentation, and key patient statuses.
- Diagnostic Assistance: Offers customized diagnostic and treatment suggestions by combining the medical knowledge base with patient data, helping doctors make informed decisions.
- Quality Assurance: Monitors medical processes in real time against national quality standards to identify and address potential issues, thereby improving healthcare delivery.
- Medical Documentation: Automatically generates standardized medical documents by analyzing patient data, reducing repetitive work for doctors.
- Additional Functions: Includes a research assistant for paper interpretation and a health assistant for report analysis and chronic disease management.
QiZhenGPT Details
Instruction Dataset Construction
QiZhenGPT distinguishes itself by using carefully curated data to construct its instruction datasets:
- The instruction dataset includes 560k entries drawn from real doctor-patient interactions, covering diseases, drugs, tests, surgeries, and more.
- Drug knowledge accounts for 180k entries, built from questions such as a drug's indications.
- Disease knowledge accounts for 298k entries, built from questions such as a disease's typical symptoms.
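As a rough illustration of how knowledge-base entries of the kinds listed above might be turned into instruction records, here is a minimal Python sketch. It is not the project's actual pipeline: the question templates, the `instruction`/`input`/`output` field names, and the `build_record` and `to_jsonl` helpers are all assumptions made for illustration, and the example values are placeholders rather than real medical data.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InstructionRecord:
    # Field names follow a common instruction-tuning layout; this is an
    # assumption, not the confirmed QiZhenGPT schema.
    instruction: str
    input: str
    output: str


# Hypothetical question templates, one per knowledge type.
TEMPLATES = {
    "drug_indication": "What are the indications for {name}?",
    "disease_symptom": "What are the typical symptoms of {name}?",
}


def build_record(kind: str, name: str, answer: str) -> InstructionRecord:
    """Fill the template for `kind` and pair it with the knowledge-base answer."""
    question = TEMPLATES[kind].format(name=name)
    return InstructionRecord(instruction=question, input="", output=answer)


def to_jsonl(records) -> str:
    """Serialize records as one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Placeholder values only; real entries would come from the knowledge base.
    records = [
        build_record("drug_indication", "DrugX", "<indication text from the knowledge base>"),
        build_record("disease_symptom", "DiseaseY", "<symptom text from the knowledge base>"),
    ]
    print(to_jsonl(records))
```

Keeping the templates in a dictionary keyed by knowledge type means new question types (for example, tests or surgeries) could be added without changing the record-building logic.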
Training Details
Models are fine-tuned over several stages:
- QiZhen-Chinese-LLaMA-7B: Fine-tuned on multiple A800 GPUs, with checkpoints released at successive training stages, each reported to improve on the previous one.
- QiZhen-ChatGLM-6B: Fine-tuned in the same staged manner, with checkpoints aimed at improving drug knowledge.
- QiZhen-CaMA-13B: Fine-tuned on larger datasets, with incremental stage checkpoints reported to show continued improvement.
Model Downloads and Quick Start
QiZhenGPT provides downloadable model versions for each of the base models it was trained on. Each version comes with installation instructions and scripts for setup and running a demo, so users can integrate the models into their own applications.
Future Prospects
The project follows a "data + knowledge" strategy that combines large model technology with medical knowledge bases. This approach supports ongoing research in medical data management and clinical decision support, keeps the work aligned with the practical needs of healthcare, and prepares the ground for applying large models in real-world medical scenarios.