#Natural Language Processing

spaCy
Explore spaCy's robust NLP platform supporting over 70 languages using state-of-the-art neural networks. Access pretrained pipelines for essential tasks like tokenization, named entity recognition, and text classification. Leverage multi-task learning with BERT transformers, ensuring easy deployment and production-readiness. Enhance projects with custom models in frameworks like PyTorch or TensorFlow, and utilize powerful visualizers for syntax and NER. This open-source software, under the MIT license, offers high accuracy and extensibility for all your NLP needs.
stanza
Stanza is a Python NLP library that offers comprehensive support for processing over 60 languages, including named entity recognition and syntactic analysis. It integrates with Java Stanford CoreNLP for efficient text processing and dependency graph manipulation. The library now includes specialized biomedical and clinical models for advanced text analysis. Stanza is easy to install using pip or Anaconda and provides interactive learning options through Google Colab. Users can also train custom models, thus enhancing the adaptability of NLP tasks.
courses
This extensive repository of AI courses provides free educational materials for learners at all levels, including topics such as Generative AI, Natural Language Processing, and Deep Learning. Featuring resources from renowned institutions like MIT, Stanford, and Harvard, it serves as a valuable tool for anyone looking to deepen their understanding of artificial intelligence. Contributions are welcome to continually expand this growing collection.
WebGLM
WebGLM is a web-enhanced question-answering system that combines web search and retrieval with a 10-billion-parameter General Language Model (GLM). It features an LLM-augmented retriever for more relevant web content, a generator for coherent responses, and quality estimation aligned with human preferences, aiming for cost-effective, high-performing real-world deployments.
sql-translator
SQL Translator bridges SQL and natural language, helping users understand and write SQL. The open-source tool includes features like dark mode, syntax highlighting, and query history, and runs locally via Docker or npm. Contributions are welcome on GitHub, and future expansions are planned.
nlp.js
NLP.js supports language detection, sentiment analysis, and named entity recognition in 40 languages natively and 104 with BERT integration. Version 4 introduces modular packages and a flexible plugin system, making it well suited to building multilingual chatbots. With connectors such as Microsoft Bot Framework, NLP.js covers operations from tokenization to sentiment analysis and integrates smoothly into a variety of systems.
opennlp
Apache OpenNLP is a machine learning toolkit for natural language processing, fully written in Java. It offers features such as tokenization, sentence segmentation, and entity extraction, making it useful for advanced text processing. The project provides tools and pre-built models for multiple languages and supports integration with systems like Apache Spark and Flink. Explore its documentation, use demo models, and join the active open-source community.
course
Explore the capabilities of Transformers in natural language processing with this detailed course. Gain hands-on experience with Hugging Face tools, including 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate. The course is free and open-source, available in multiple languages, and welcomes translation contributions. It is also a good starting point for deep learning applications beyond NLP, in a collaborative environment.
azure-openai-samples
Explore the resources available for understanding GPT basics and its applications with Azure's offerings. Learn to integrate GPT with services such as Synapse Analytics for NLP and Business Process Automation. Access practical samples including serverless SQL and OpenAI-powered semantic search. Stay informed about the latest advancements including GPT-4 and contribute to the ongoing development. This is ideal for developers and organizations looking to leverage AI in diverse sectors such as chatbots, customer service, and content creation.
nlp-paper
This repository curates pivotal NLP papers, providing detailed reviews and implementation insights across topics like text similarity, dialogue systems, and deep learning. It includes thorough reading notes on classic and innovative research, aiding scholars in understanding NLP advancements. The platform enhances accessibility with search tools and code repositories, offering a vital resource updated regularly for NLP research enthusiasts.
How-to-use-Transformers
Accompanying the 'Transformers Library Quick Start' tutorial, this code repository offers structured modules and datasets for practical NLP learning. Focused on core concepts of Transformer models, it covers applications like sentiment analysis, sequence labeling, and text summarization. The tutorial also delves into large language model integration, with ongoing updates to introduce the latest NLP advancements.
BERT-Relation-Extraction
This open-source project implements relation extraction models in PyTorch, based on BERT and its variants ALBERT and BioBERT. Drawing on the 'Matching the Blanks' methodology, it supports pre-training and fine-tuning on datasets such as CNN and SemEval2010 Task 8. It uses spaCy for entity recognition and provides inference to predict relationships in annotated text. Although unofficial, the project follows the referenced paper for relation classification and reports benchmark results on FewRel and SemEval2010 Task 8.
FlowTest
FlowTestAI provides a streamlined solution for developers and QA teams to enhance their API testing workflows. This tool's features include natural language processing, drag-and-drop interfaces, and support for major language models like OpenAI and AWS Bedrock. FlowTestAI integrates with OpenAPI specifications and offers cross-platform compatibility, functioning both as a desktop application and within CI/CD pipelines. Its advanced analytics provide deep insights into API performance.
awesome-nlp
Discover an extensive collection of tools, libraries, and resources for Natural Language Processing (NLP), covering research summaries, trends, leading research labs, tutorials, and programming libraries. This guide provides resources for multiple languages, services, and annotation tools, supporting the advancement of NLP projects. Suitable for new learners and experienced researchers, it offers insights into deep learning techniques, theories, and practical implementations. Stay informed with the latest progress and enhance understanding through books, courses, and informative blogs.
PyTorch-Tutorial-2nd
Discover extensive deep learning applications and inference deployment frameworks in this updated resource. This tutorial builds upon the first edition, covering foundational concepts and guiding readers from basic knowledge to industry applications in computer vision, NLP, and large language models. It details PyTorch fundamentals and projects covering image processing, text generation, and model deployment with ONNX and TensorRT, allowing learners to apply theory in practice. It is designed for AI learners, students, and professionals aiming to extend their understanding of and practical skills in PyTorch.
AiLearning-Theory-Applying
The project offers an in-depth look into AI concepts ranging from basic to advanced levels, covering areas like machine learning, deep learning, and BERT-based natural language processing. It includes extensive tutorials and datasets, making it suitable for learners at different stages. The curriculum spans key areas such as foundational mathematics, machine learning competitions, the basics of deep learning, and a user-friendly Transformer guide. The materials are regularly refreshed to reflect the latest in AI development, providing a clear and thorough understanding of AI models.
LLM-eval-survey
This resource provides an in-depth review of diverse evaluation methods for large language models (LLMs), covering aspects like natural language processing and reasoning abilities. It features academic papers and projects assessing the robustness, ethics, and trustworthiness of LLMs. Regular updates ensure the most recent insights, with an open invitation for contributions to further refine the survey.
Legal-Text-Analytics
This repository provides a comprehensive overview of resources, methods, and tools focused on Legal Text Analytics, including tasks like Optical Character Recognition and Legal Norm Classification. It features libraries such as spaCy and NLTK, as well as datasets for various legal applications. Suitable for professionals and researchers aiming to advance NLP capabilities in the legal field, it also covers the latest in Legal Tech using Large Language Models and GPT. Opportunities for community collaboration and contributions are available.
rust-bert
Rust-bert offers Rust-native NLP model implementations supporting translation, summarization, sentiment analysis, and more. It ports Hugging Face's Transformers to Rust using tch-rs or onnxruntime bindings, and includes multi-threaded tokenization and GPU inference with easy-to-use pipelines.
mindnlp
MindNLP is an open-source library based on MindSpore, aiming to facilitate natural language processing using advanced models. It supports over 250 pretrained models with APIs comparable to Hugging Face, ensuring easy integration. MindNLP is proficient in data processing and offers a configurable model toolset for seamless customization. It allows deployment across platforms such as Ascend, Orange Pi, GPU, and CPU, with features like parallel inference and quantization. MindNLP enables various applications, including chatbots and speech recognition, bolstering the NLP research and development workflow.
hardware-aware-transformers
Explore HAT (Hardware-Aware Transformers), a project for making natural language processing more efficient on specific hardware. It offers PyTorch code and 50 pre-trained models that help locate optimized architectures for distinct hardware, cutting search costs by over 10,000 times. HAT provides up to a 3x speedup and a 3.7-fold reduction in model size with no loss in performance. Using latency feedback from hardware such as Raspberry Pi and Intel Xeon, HAT optimizes machine translation models for a range of devices.
Agently
Agently offers a framework for quickly creating AI-agent-native applications. It enables direct integration of AI agents into code, simplifying the automation and execution of business processes. Tutorials cover model switching, workflow development, and using AgenticRequest, and code examples address tasks like SQL generation and feedback collection. The framework suits a wide range of applications while reducing code complexity, and plugin enhancements allow ongoing improvements without rebuilding agents.
spago
Spago is a Go-based machine learning library that supports neural architectures in NLP with a lightweight computational graph. It ensures easy dynamic execution and includes feed-forward, recurrent, and attention layers, plus numerous gradient descent optimizers. By eliminating Python dependencies, it produces standalone executables for production use. Although currently on hold, Spago remains a valuable resource with Gob compatibility and the Cybertron package for NLP applications.
sentiment-analysis
Understand a range of methods used in Chinese sentiment analysis, including techniques based on sentiment dictionaries, traditional machine learning such as Bayes, and advanced deep learning with models like ALBERT. The project explores both unsupervised and supervised approaches for text data sentiment classification, emphasizing the integration of unknown tokens like emojis to improve sentiment semantic analysis. This overview presents distinctive attributes and practical implementations of each method.
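The dictionary-based approach mentioned above can be sketched in a few lines. This is a minimal illustration, not the project's code: the lexicon, the negation list, and the function names are hypothetical, and the tokens are English for readability (a Chinese pipeline would first need word segmentation). Emojis are treated as ordinary lexicon entries so they contribute to the score instead of being dropped as unknown tokens.

```python
# Toy sentiment lexicon (hypothetical values); emojis are ordinary entries.
LEXICON = {
    "good": 1.0, "great": 2.0,
    "bad": -1.0, "terrible": -2.0,
    "😊": 1.5, "😡": -1.5,
}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(tokens):
    """Sum lexicon values; a negation flips the sign of the next sentiment word."""
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(tok, 0.0)
        if value:
            score += -value if negate else value
            negate = False  # the negation is consumed by this sentiment word
    return score


def classify(tokens):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("the food was not good 😡".split()))
```

A supervised model would replace the hand-written lexicon with learned weights, but the scoring loop shows why covering emojis in the vocabulary matters for short social-media text.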
NLP_Quickbook
This guide provides engineers with practical insights into natural language processing using Python, grounded in both classic and contemporary academic works. It offers code-first methods for tasks such as text classification, cleaning, and spell correction across seven thematic areas. Tools for named entity recognition, question and answer generation, and vector representations using word2vec and gensim are included. Discover text classification and ensemble techniques, and delve into deep learning methodologies with PyTorch, alongside a quick-start guide for chatbot creation.
ltp
The Language Technology Platform (LTP) offers a range of tools for Chinese text processing such as word segmentation, part-of-speech tagging, and syntactic parsing. Using a multi-task framework with a shared pre-trained model, LTP improves efficiency by sharing knowledge across tasks. The 4.2.0 update brings enhanced model structures, improved performance through Rust implementations, and support for uploading models to Hugging Face. It is suitable for researchers and developers who need advanced Chinese NLP functions.
awesome-artificial-intelligence
Explore a rich collection of AI resources spanning tools, courses, and literature, aimed at providing professionals and enthusiasts with a deep dive into key AI fields like machine learning and natural language processing. The guide offers segmentations into chat, image, and video creation tools, supported by educational material from renowned institutions to facilitate comprehensive understanding and application of AI innovations. Discover ongoing insights through curated journals and blogs, further enriched by free access to a variety of content.
PaddleHub
Access a wide range of AI models for computer vision, NLP, speech, and cross-modal tasks. Models are deployable with just three lines of code and are compatible with Linux, Windows, and macOS. The newest features include ERNIE-ViLG, Disco Diffusion, and Stable Diffusion. Use models as a service, and explore interactive demos of the available pre-trained open-source models on Hugging Face Spaces.
spark-nlp
Utilize an efficient NLP library offering scalable annotations across 200+ languages, suitable for tasks such as tokenization and language translation. It integrates state-of-the-art transformers like BERT and GPT-2 and supports Python, R, and JVM platforms. This library facilitates model imports from frameworks including TensorFlow and ONNX, enhancing compatibility in distributed machine learning systems.
huozi
Huozi offers notable advancements in language processing with its sparse mixture of experts (SMoE) architecture, enabling efficient handling of extended contexts. Designed for use in both academic and industrial settings, it features enhancements such as multilingual knowledge integration and refined reasoning capabilities. The model's release comes with various checkpoints and broad platform support, allowing comprehensive deployment and performance acceleration across systems like Transformers and ModelScope.
OpenGPT
OpenGPT 3.5/4 features intuitive APIs that simplify NLP integration across diverse applications, offering developers an accessible path to implementing natural language processing. This open-source project provides free access to AI models and is designed to integrate with larger systems. Community collaboration and contributions are encouraged.
php-text-analysis
PHP Text Analysis is a reliable library offering Information Retrieval and NLP tools specifically for PHP. It includes features such as document classification, sentiment analysis, and frequency analysis. Additionally, it offers support for tokenization, stemming, n-gram generation, and keyword extraction with the Rake algorithm. Customization options for tokenizers and stemmers allow developers to adapt the library to their needs. The accompanying documentation provides useful guidance for implementation, aiding developers in adding robust text analysis capabilities to their PHP projects.
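To make the tokenization, n-gram generation, and frequency analysis steps concrete, here is a language-neutral sketch in Python. It does not use or mirror the PHP library's API; the function names and the regular expression are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The quick brown fox jumps over the lazy dog")
bigrams = ngrams(tokens, 2)
frequencies = Counter(tokens)

print(bigrams[:3])
print(frequencies.most_common(1))
```

Swapping in a different `tokenize` function (for example, one that applies a stemmer) changes the n-grams and counts without touching the rest of the pipeline, which is the kind of customization the library description refers to.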
pytextrank
Discover PyTextRank, a Python library that extends spaCy with graph-based algorithms such as TextRank for tasks like phrase extraction and summarization. It turns unstructured text into structured output. Installation is easy via PyPI or Conda, and it plugs into spaCy pipelines as a component. Tutorials and documentation are available for use in research or software development.
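The idea behind TextRank can be sketched without any dependencies: build a co-occurrence graph over words, then run a PageRank-style iteration and keep the highest-scoring nodes. The code below is a simplified, standard-library illustration of that technique, not PyTextRank's implementation or API; the stopword list, window size, and function name are assumptions.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "with", "on"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected edges between words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the TextRank update: each node shares its score among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


sample = "graph ranking uses graph edges; graph nodes link graph words"
print(textrank_keywords(sample))
```

A full implementation would typically work on lemmas and noun chunks from a parser rather than raw words, which is where the spaCy integration comes in.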
llm-hub
This repository features a collection of advanced language model applications, including GitHub repositories and tutorials utilizing models like GPT-3. It serves as a resource for those interested in developing and understanding AI applications, with examples from text generation to question answering, and tutorials that support creating personalized AI solutions. The repository also provides additional learning materials for comprehensive exploration of AI capabilities.
awesome-bangla
This collection offers diverse tools, datasets, and resources for Bangla computing, catering to those working on Natural Language Processing (NLP) in Bangla (Bengali). It features various typing tools, phonetic parsing libraries, language-processing datasets, and NLP resources such as POS taggers and sentiment analysis. Additionally, it includes machine translation, OCR, and TTS resources, advancing capabilities in the field. These resources encourage innovation and community participation within the Bangla language technology sphere.
cltk
The Classical Language Toolkit (CLTK) is a Python library designed for NLP tasks in pre-modern languages. It features a modular architecture that supports diverse algorithms with pre-configured defaults across 20 languages. By leveraging techniques from established NLP frameworks, CLTK addresses specific needs often overlooked by modern language tools. The library is easy to install and comes with comprehensive documentation, making it a valuable asset for researchers in digital humanities focused on ancient texts.
HanLP
HanLP is a versatile, open-source multilingual natural language processing toolkit powered by PyTorch and TensorFlow 2.x. It is built for production-grade environments and supports a wide array of languages and tasks, including tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. With both RESTful and native APIs, HanLP guarantees semantic consistency and is optimized for high accuracy and efficiency, backed by continuous updates from extensive multilingual corpora.
chatGPT-cheatsheet
This guide provides a foundational understanding of ChatGPT and its AI applications, including detailed instructions for using its prompt portal and API. It includes practical examples on prompt engineering, error management, and application development. The guide clearly explains AI concepts such as machine learning and NLP, making it suitable for beginners and seasoned developers alike. It also highlights the rationale for avoiding the sharing of sensitive information with ChatGPT, and offers insights into optimizing its use for generating human-like text.
learning
Develop essential software engineering skills with an emphasis on Python and generative AI. Updated monthly, this project covers key competencies in areas like data structures, algorithms, Linux, version control, database management, backend development, system design, frontend basics, and specialized fields such as machine learning and NLP. It is designed for individuals who want to build expertise in adjacent technologies methodically, from Python data tools to advanced AI techniques.
awesome-project-ideas
Explore a curated selection of over 30 deep learning and machine learning project ideas suitable for academic and industry contexts. These projects cover skill levels from beginner to advanced research, featuring domains like natural language processing, time series forecasting, and recommendation systems. Discover innovative approaches in image and video processing, music and audio analysis. Engage in hackathon opportunities and explore advanced topics such as semantic search and knowledge base QA. A valuable resource for students, researchers, and developers seeking to broaden their understanding of AI and machine learning.
From-0-to-Research-Scientist-resources-guide
Designed for individuals with a computer science background or basic programming skills, this guide provides resources for transitioning to research roles in Deep Learning and NLP. It covers crucial mathematical foundations like Linear Algebra and Probability and key areas including Machine Learning, Deep Learning, and Reinforcement Learning. It offers flexibility by supporting both bottom-up and top-down learning approaches.
ML-YouTube-Courses
This repository features a curated selection of machine learning courses from YouTube, spanning topics like basics, deep learning, NLP, computer vision, and reinforcement learning. Compiled by DAIR.AI, it includes courses from prestigious institutions such as Caltech, Stanford, and MIT, offering educational resources for professionals and enthusiasts. It provides access to advanced courses on modern techniques and practical applications, serving both beginners and experienced learners in AI and machine learning.
tock
Tock is an open conversational AI platform for building bots with natural language processing capabilities across tools like OpenNLP, Stanford, and Rasa. It includes Tock Studio for creating stories and analytics, along with a conversational DSL for Kotlin, Node.js, Python, and REST API. The platform integrates with channels like Messenger, WhatsApp, Google Assistant, Alexa, and Twitter, and offers React and Flutter toolkits for custom web/mobile apps. Deployable via cloud or on-premises using Docker, Tock provides comprehensive documentation and a live demo.
wink-nlp
Explore detailed NLP enhancements with WinkNLP, a JavaScript library focused on efficient and rapid development. Utilize word embeddings for advanced text analysis and enjoy a lightweight, dependency-free framework. WinkNLP supports multilingual tokenization, sentiment analysis, and named entity recognition across Node.js and browser environments, achieving 650,000 tokens per second. It includes robust features such as negation handling and POS tagging, complemented by comprehensive test coverage and adherence to security best practices, making it ideal for developing reliable NLP solutions.
nltk
NLTK provides an extensive collection of open-source Python modules, datasets, and tutorials tailored for natural language processing research and development. Supporting Python versions 3.8 to 3.12, it serves as a crucial tool for both academic and practical purposes. Comprehensive documentation is available, and contributions to continuous development are supported. Donations via PayPal aid future advancement. Works citing NLTK should reference its NLP book. Redistribution is allowed under license terms, ensuring accessibility for educational and innovative endeavors.
hazm
Hazm is a fundamental tool for Persian text processing, offering features such as normalization, tokenization, and lemmatization. It includes robust tools for POS tagging, dependency parsing, and effective word embedding. The toolkit supports linguists and developers with comprehensive pre-trained models and easy access to documentation, providing a scalable solution for both research and practical applications in Persian NLP.
WordGPT
WordGPT is a Microsoft Office add-in that uses OpenAI's text-davinci-003 model to generate text directly in Microsoft Word. The add-in supports straightforward sideloading on Windows and macOS. As an open-source project under the MIT License, it welcomes user contributions.
ML-ProjectKart
Explore a diverse collection of open-source machine learning projects, crafted to enhance expertise in ML, deep learning, computer vision, and natural language processing. This repository includes projects suitable for beginners and advanced users, ideal for mastering algorithms and model construction. Engage with a vibrant community, follow detailed contribution guidelines, and collaborate on projects such as advertisement prediction, air quality indexing, and brain tumor detection. ML-ProjectKart serves as a valuable resource for advancing in the ML/AI field.
clause
Clause offers an open-source solution for semantic understanding in chatbot development using deep learning, NLP, and search engine technologies. It allows for the management of multiple bots, custom intent creation, and supports integration with various programming language interfaces, making it ideal for customer service and intelligent QA systems.
natasha
Natasha provides solutions for Russian NLP tasks like tokenization, sentence segmentation, and NER, focusing on high performance and compact models. It integrates libraries such as Razdel, Navec, Slovnet, and Yargy under a unified API, optimized for news article processing, ensuring accuracy and efficiency.