#Named Entity Recognition
flair
Flair is a natural language processing library with state-of-the-art models for tasks like named entity recognition, sentiment analysis, and part-of-speech tagging. It is developed by Humboldt University of Berlin and collaborators, supports a growing number of languages, and includes dedicated support for biomedical text. The library provides simple interfaces for using and combining different word and document embeddings, and it is built on PyTorch, which makes it easy to train custom models. Tutorials cover both tagging with pretrained models and training new ones, and many Flair models are available through Hugging Face.
awesome_Chinese_medical_NLP
This project compiles extensive Chinese medical NLP resources, such as terminologies, corpora, word vectors, pre-trained models, and knowledge graphs. It includes tools for named entity recognition, QA systems, and information extraction. Highlighting resources like the CBLUE dataset, the project supports the growth of Chinese medical NLP technology and community. It is an essential source for researchers and practitioners focusing on Chinese medical texts, offering comprehensive tools from basic terminologies to advanced language models.
Awesome-LLM4IE-Papers
Explore a comprehensive range of academic papers on generative information extraction using Large Language Models (LLMs). This curated collection includes recent studies on topics such as named entity recognition, relation extraction, and event extraction. Access innovative methodologies like supervised fine-tuning, few-shot, and zero-shot learning, along with data augmentation and constrained decoding. The repository invites contributions from academics and offers a detailed survey of LLMs in generative information extraction. Keep current with the latest papers and access useful datasets to advance research in the information extraction domain.
SpanMarkerNER
SpanMarker is a framework for training Named Entity Recognition models on top of encoders such as BERT, RoBERTa, and ELECTRA. Because it is built on the Hugging Face Transformers library, it inherits features like model saving and loading, hyperparameter tuning, and mixed precision training. SpanMarker supports several annotation schemes and integrates with the Hugging Face Hub, including its free hosted inference API for quick deployment. It is suitable for developers who want to train or use high-performance NER models on datasets like Few-NERD and OntoNotes 5.
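To make "annotation schemes" concrete: in the common IOB2 scheme, each token is tagged `B-X` (begins an entity of type X), `I-X` (continues it), or `O` (outside any entity). The following is a minimal, library-independent sketch of decoding such tags into spans. The function name `iob2_to_spans` is hypothetical and this is not SpanMarker's own code.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode IOB2 tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                # Close the span that was open before this token.
                spans.append((label, start, i))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            # A stray "I-" with no matching open span is ignored here.
    if label is not None:
        # Close a span that runs to the end of the sentence.
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Angela", "Merkel", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
spans = iob2_to_spans(tags)
print(spans)
```

Other schemes (for example BIOES or BILOU) add explicit end and single-token tags, so a decoder for them would need extra cases.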
prodigy-recipes
The repository contains customizable recipes for Prodigy, a scriptable annotation tool for text and images. A Prodigy license is required to run these scripts, which are commented and, in some cases, simplified versions of the built-in recipes, edited to make them easier to follow. It includes installation guidance, usage instructions, and customization tips for tasks such as Named Entity Recognition and image annotation, along with community-contributed and tutorial recipes.
NLP-Interview-Notes
This resource offers curated study notes and materials for natural language processing (NLP) interview preparation. It covers interview questions across many NLP areas and explains sequence models such as the Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM), and Conditional Random Field (CRF). Aimed at both novices and experienced practitioners, the project addresses topics like named entity recognition, relation extraction, event extraction, and text representation methods such as TF-IDF and Word2Vec. Each section presents typical interview questions with explanations and solutions, forming a reference for anyone preparing for technical NLP interviews.
GLiNER
GLiNER is a lightweight model that can identify arbitrary entity types using a BERT-like bidirectional transformer encoder. It is an alternative both to traditional NER models, which are limited to a predefined set of entity types, and to large language models, which are flexible but often too resource-intensive. GLiNER aims to combine the flexibility of the latter with the efficiency of the former. It is easy to install, and pretrained models can be used directly for entity prediction. Example notebooks cover finetuning and model conversion for use in research and industry settings.
entity-recognition-datasets
The repository collects annotated datasets for named entity recognition and related entity recognition tasks, covering domains such as news, medical, and finance. Although updates stopped in 2020, it remains a useful source of English-language datasets and includes code for converting them to the CoNLL 2003 format. It also links to datasets in other languages, which makes it a starting point for multilingual NER work. Contributions through issues or pull requests are accepted.
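As an illustration of what conversion to a CoNLL-style format involves, here is a minimal sketch that writes one sentence as one token per line with its entity tag, followed by a blank line as the sentence separator. It makes two simplifying assumptions: it uses a two-column layout (the original CoNLL 2003 files also carry part-of-speech and chunk columns) and IOB2 tags. The function name `to_conll` is hypothetical and is not taken from the repository.

```python
from typing import List, Tuple


def to_conll(tokens: List[str], spans: List[Tuple[str, int, int]]) -> str:
    """Render tokens and (label, start, end) spans as two-column CoNLL-style text.

    `end` is exclusive. Assumes spans do not overlap.
    """
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    lines = [f"{tok} {tag}" for tok, tag in zip(tokens, tags)]
    # A blank line marks the end of the sentence.
    return "\n".join(lines) + "\n\n"


tokens = ["EU", "rejects", "German", "call"]
spans = [("ORG", 0, 1), ("MISC", 2, 3)]
conll_text = to_conll(tokens, spans)
print(conll_text, end="")
```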
Few-NERD
Few-NERD is a large, fine-grained dataset for named entity recognition with 8 coarse-grained categories and 66 fine-grained entity types. It supports supervised and few-shot learning through three benchmark tasks and contains 188,200 sentences and around 500,000 entities. Easy BERT integration facilitates training, and regular updates keep it relevant for researchers working on complex natural language processing problems.
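The two-level type hierarchy can be sketched as follows. This example assumes fine-grained labels are written as `coarse-fine` strings (for example `person-actor`) and that `O` marks tokens outside any entity; the helper `coarse_type` and the sample labels are illustrative, not part of the dataset's tooling.

```python
from collections import Counter


def coarse_type(label: str) -> str:
    """Map a fine-grained label such as 'person-actor' to its coarse type."""
    if label == "O":
        return "O"
    # Split only on the first hyphen so the fine-grained part stays intact.
    return label.split("-", 1)[0]


# Illustrative token labels for one short sentence.
labels = ["person-actor", "O", "location-GPE", "person-politician", "O"]

# Count entity tokens per coarse category, skipping non-entity tokens.
coarse_counts = Counter(coarse_type(l) for l in labels if l != "O")
print(coarse_counts)
```

Grouping by coarse type in this way is one simple route to evaluating a model at both levels of granularity.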