# Chinese NLP
## Awesome-Chinese-LLM
A curated collection of more than 100 Chinese language models, applications, datasets, and tutorials. It highlights notable models such as ChatGLM, LLaMA, and Qwen, and serves as a collaborative hub for sharing open-source models and applications, covering the field from development and deployment to learning materials.
## Chinese-LLaMA-Alpaca
This project provides enhanced Chinese LLaMA and Alpaca models designed to improve language understanding and instruction following. By extending the tokenizer with additional Chinese vocabulary, the models achieve better semantic comprehension and more efficient decoding than the original LLaMA, and instruction tuning further refines the Alpaca variants. The project includes pre-training tools and supports platforms such as 🤗transformers and LlamaChat, easing integration and deployment on personal systems. Models in 7B, 13B, and 33B sizes are available to suit different requirements.
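The vocabulary-extension idea can be illustrated with a small sketch. This is not the project's actual code; the base vocabulary and new tokens below are hypothetical, and a real tokenizer would also require resizing the model's embedding matrix.

```python
# Toy sketch of vocabulary extension: new Chinese tokens are appended to a
# base vocabulary at fresh IDs, mirroring in spirit how Chinese-LLaMA
# extends LLaMA's tokenizer (the real project operates on SentencePiece).

def extend_vocab(base_vocab, new_tokens):
    """Return a copy of base_vocab with unseen tokens appended at fresh IDs."""
    vocab = dict(base_vocab)
    next_id = max(vocab.values()) + 1
    for token in new_tokens:
        if token not in vocab:  # skip tokens the base model already knows
            vocab[token] = next_id
            next_id += 1
    return vocab

base = {"<s>": 0, "</s>": 1, "hello": 2}
extended = extend_vocab(base, ["你好", "世界", "hello"])
print(extended)  # '你好' and '世界' receive IDs 3 and 4; 'hello' keeps ID 2
```

With more Chinese tokens in the vocabulary, common words are encoded as single units rather than many byte-level pieces, which is what shortens sequences and speeds up decoding.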
## uniem
An open-source Chinese text-embedding project hosted on HuggingFace. uniem integrates with sentence-transformers and text2vec and draws on tools such as SGPT; the 0.3.0 release expands its fine-tuning options. The companion MTEB-zh benchmark covers Chinese text classification and retrieval, and the project welcomes community contributions under the Apache-2.0 license.
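Embedding models like these map sentences to vectors that are compared by cosine similarity. The sketch below uses only the standard library and made-up 4-dimensional vectors; a real model such as those in uniem produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: a query and two candidate documents.
query = [0.1, 0.9, 0.2, 0.0]
doc_a = [0.1, 0.8, 0.3, 0.1]
doc_b = [0.9, 0.1, 0.0, 0.2]

# The document whose vector points in the same direction as the query ranks higher.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

Retrieval benchmarks such as MTEB-zh score a model by how often this similarity ranking surfaces the truly relevant document first.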
## BianQue
BianQue, from the South China University of Technology, is a Chinese healthcare model built to improve medical dialogue systems through proactive health inquiry. Trained on large-scale Chinese health data and optimized for medical consultation, BianQue supports chronic disease management and psychological care. The BianQue 2.0 iteration extends its capabilities to medication guidance and more thorough health advice in multi-turn dialogues, opening broader collaboration opportunities in digital health.
## nlp_chinese_corpus
This project offers a wide range of Chinese corpora for natural language processing, including structured Wikipedia articles, news reports, and community question-and-answer datasets, addressing the scarcity of large-scale Chinese text data for researchers and developers. It focuses on building extensive, high-quality corpora for pre-training language models and for NLP tasks such as word-vector training and question answering. Recent updates add community Q&A and translation datasets, further enriching the resources for building Chinese NLP models.
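Corpora like these are commonly distributed as JSON-lines files, one record per line. The sketch below shows how such a Q&A file can be consumed with the standard library; the field names and sample records are illustrative, not the project's exact schema.

```python
import json
from io import StringIO

# Hypothetical JSON-lines Q&A records (field names are illustrative).
raw = StringIO(
    '{"question": "什么是自然语言处理?", "answer": "让计算机理解人类语言的技术。"}\n'
    '{"question": "什么是词向量?", "answer": "词的稠密数值表示。"}\n'
)

def load_qa_pairs(fp):
    """Yield (question, answer) tuples from a JSON-lines file object."""
    for line in fp:
        record = json.loads(line)
        yield record["question"], record["answer"]

pairs = list(load_qa_pairs(raw))
print(len(pairs))  # 2
```

Streaming line by line like this keeps memory use flat even for multi-gigabyte corpora, since only one record is parsed at a time.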
## ltp
The Language Technology Platform (LTP) offers tools for Chinese text processing, including word segmentation, part-of-speech tagging, and syntactic parsing. Its multi-task framework shares a single pre-trained encoder across tasks, improving efficiency by letting the tasks benefit from shared representations. The 4.2.0 release brings improved model structures, faster Rust implementations, and support for hosting models on Huggingface. LTP suits researchers and developers who need production-ready Chinese NLP components.
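LTP's segmentation is neural, but the task itself is easy to see with the classic dictionary-based forward-maximum-matching baseline: Chinese has no spaces, so a segmenter must decide where words begin and end. This toy sketch is a baseline for illustration only, not LTP's algorithm, and the tiny dictionary is made up.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward-maximum-matching segmentation: at each position take
    the longest dictionary word that matches; fall back to a single char."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

vocab = {"南京", "市长", "南京市", "长江", "大桥", "长江大桥"}
print(forward_max_match("南京市长江大桥", vocab))
# ['南京市', '长江大桥'] — the greedy longest match at each step
```

Ambiguous strings like this one ("Nanjing Yangtze River Bridge" vs. "Nanjing mayor...") are exactly where neural segmenters such as LTP's outperform greedy dictionary matching.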
## Llama-Chinese
The Llama Chinese Community focuses on optimizing Llama models for Chinese applications, backed by an experienced NLP engineering team. The community continually improves the models' Chinese capabilities and facilitates collaboration among developers worldwide, offering resources, networking, and technical exchange. It recently added support for the Llama 3.1 models along with tools for testing and deployment, and participants can join online events and collaborative activities to advance Chinese NLP.
## awesome_Chinese_medical_NLP
This project compiles Chinese medical NLP resources, including terminologies, corpora, word vectors, pre-trained models, and knowledge graphs, along with tools for named entity recognition, question answering, and information extraction. Highlighting resources such as the CBLUE benchmark, it supports the growth of the Chinese medical NLP community and serves as an essential reference for researchers and practitioners working with Chinese medical texts, from basic terminologies to advanced language models.
## JioNLP
JioNLP is a Python library for Chinese NLP preprocessing and parsing, designed for ease of use and accurate results. Key features include parsers for time semantics and vehicle license plates, keyphrase extraction, regex-based parsing, text data augmentation, and data cleaning. It suits developers who want efficient, ready-made tools for Chinese NLP pipelines, and installs with a simple `pip install jionlp`.
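The kind of data cleaning JioNLP automates can be approximated with standard-library regexes. The sketch below is a toy stand-in, not JioNLP's implementation; the patterns and sample text are illustrative.

```python
import re

def clean_text(text):
    """Toy cleaner in the spirit of JioNLP's data-cleaning helpers:
    strips HTML tags and URLs, then collapses runs of whitespace."""
    text = re.sub(r"<[^>]+>", "", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
    return text

dirty = "<p>欢迎访问  https://example.com  了解更多</p>"
print(clean_text(dirty))  # 欢迎访问 了解更多
```

A library like JioNLP bundles many such rules (plus exceptions the naive regexes above miss) behind single function calls, which is the main draw over hand-rolled cleaning scripts.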
## awesome-pretrained-chinese-nlp-models
This repository curates Chinese pretrained language models, from foundational encoders such as BERT and GPT-style models to large language models, dialogue models, and multimodal conversation models, covering both general-purpose and domain-specific needs. Regular updates keep the catalog current, and it also collects evaluation benchmarks, online model demos, and open datasets, making it a valuable reference for NLP researchers and practitioners.
## GPT2-Chinese
GPT2-Chinese provides a toolkit for training Chinese language models with the GPT-2 architecture. It supports the BERT tokenizer and BPE models, enabling generation of varied text such as poems and novels, and offers diverse pre-trained models ranging from classical Chinese to lyrical styles. The project handles large training corpora and encourages community collaboration through discussions and contributed models, helping developers build practical NLP experience.
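The varied outputs such models produce come from sampling the next token from the model's probability distribution rather than always taking the most likely one. The sketch below shows temperature sampling over a hypothetical 4-token vocabulary with hand-picked logits; it is a conceptual illustration, not GPT2-Chinese's generation code.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample a token index from raw logits after temperature scaling.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    max_l = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - max_l) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical logits over a tiny 4-token vocabulary.
rng = random.Random(0)
picks = [sample_next_token([2.0, 1.0, 0.5, 0.1], temperature=0.7, rng=rng)
         for _ in range(1000)]
print(picks.count(0) > picks.count(3))  # the high-logit token dominates
```

Tuning the temperature is how generation projects trade coherence (low values) against the variety needed for poems and fiction (higher values).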
## Pretrained-Language-Model
Huawei Noah's Ark Lab collects its Chinese language models and optimization techniques in this repository. Key components include PanGu-α with up to 200 billion parameters, NEZHA with strong results on Chinese NLP benchmarks, and the compact TinyBERT model. The repository also covers adaptive methods such as DynaBERT, byte-level vocabularies via BBPE, and memory-efficient optimizers such as CAME. The code is compatible with MindSpore, TensorFlow, and PyTorch, serving a wide range of application needs.
Feedback Email: [email protected]