spacy-llm: Revolutionizing NLP with Large Language Models
The spacy-llm package is an innovative tool that brings the power of Large Language Models (LLMs) directly into the renowned spaCy library. This integration provides a modular system for rapid prototyping and for structuring unstructured LLM responses into well-defined outputs for various Natural Language Processing (NLP) tasks, without requiring training data.
Key Features
- LLM Component Integration: spacy-llm offers a serializable llm component that seamlessly integrates prompts into a spaCy pipeline, enabling efficient language model utilization.
- Modular Task and Model Functions: Users can define tasks (prompting and parsing) alongside models, offering flexibility and customization.
- API Interfaces: Direct interaction with APIs from major platforms like OpenAI, Cohere, Anthropic, Google PaLM, and Microsoft Azure AI.
- Open-source LLM Support: Incorporates models from Hugging Face, including Falcon, Dolly, Llama 2, OpenLLaMA, StableLM, and Mistral.
- LangChain Integration: Supports features from all langchain models, expanding its capabilities.
- Prebuilt Tasks: Comes equipped with ready-to-use tasks like Named Entity Recognition, Text Classification, Lemmatization, Relationship Extraction, and more.
- Custom Function Implementation: Through spaCy's registry, users can easily create custom prompting, parsing, and model integrations.
- Map-Reduce Approach: Efficiently handles prompts that exceed LLM context windows by splitting and merging them.
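The prompting-and-parsing split behind custom tasks can be sketched in plain Python. This is a toy illustration only: the class and method names below are hypothetical, and the actual protocol is defined through spacy-llm's registry decorators.

```python
from typing import Iterable, List


class SentimentTask:
    """Toy task: build prompts and parse raw LLM replies into labels.

    Hypothetical sketch of the prompt/parse split; not the real
    spacy-llm task interface.
    """

    def __init__(self, labels: List[str]):
        self.labels = labels

    def generate_prompts(self, texts: Iterable[str]) -> Iterable[str]:
        # One prompt per input text, listing the allowed labels.
        label_list = ", ".join(self.labels)
        for text in texts:
            yield (
                f"Classify the text as one of [{label_list}].\n"
                f"Text: {text}\nLabel:"
            )

    def parse_responses(self, responses: Iterable[str]) -> Iterable[str]:
        # Normalize each raw reply; fall back to the first label
        # if the model answered with something unexpected.
        for response in responses:
            answer = response.strip().upper()
            yield answer if answer in self.labels else self.labels[0]


task = SentimentTask(["COMPLIMENT", "INSULT"])
prompts = list(task.generate_prompts(["You look gorgeous!"]))
labels = list(task.parse_responses([" compliment \n"]))
```

In spacy-llm itself, an equivalent pair of functions is registered via spaCy's registry so the pipeline can resolve it from a config file.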
Motivation Behind spacy-llm
LLMs possess exceptional natural language understanding abilities. They can execute tasks like text categorization, entity recognition, and information extraction with minimal to no examples.
Traditionally, spaCy excels at supervised learning and rule-based operations. While LLMs can expedite prototyping, supervised learning often remains superior for production due to accuracy, efficiency, reliability, and control. The flexibility of spacy-llm allows users to combine the strengths of both, starting with LLM-powered components and swapping in traditional approaches as the project evolves.
Quick Installation and Setup
Though spacy-llm will be included in future spaCy versions, users can install it now within their current environment using:

```shell
python -m pip install spacy-llm
```
Note: As an experimental package, interface modifications may occur in minor updates.
Quickstart Guide
For a glance at its capabilities, users can try out text classification with an OpenAI GPT model. Note that the default model behind this component calls the OpenAI API, so the OPENAI_API_KEY environment variable must be set. Here's a simple example in Python:

```python
import spacy

nlp = spacy.blank("en")
llm = nlp.add_pipe("llm_textcat")
llm.add_label("INSULT")
llm.add_label("COMPLIMENT")
doc = nlp("You look gorgeous!")
print(doc.cats)
# {"COMPLIMENT": 1.0, "INSULT": 0.0}
```
To manage pipeline parameters, users can create a config.cfg file to maintain control over tasks and models.
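A minimal config along these lines might look as follows. The task and model entries follow the pattern from the spacy-llm documentation, but the exact registered names and version suffixes (e.g. spacy.TextCat.v3, spacy.GPT-4.v2) depend on the installed version, so treat this as a sketch:

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.TextCat.v3"
labels = ["COMPLIMENT", "INSULT"]

[components.llm.model]
@llm_models = "spacy.GPT-4.v2"
```

The pipeline can then be loaded from this file, e.g. with spacy_llm's assemble helper (spacy_llm.util.assemble("config.cfg")).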
Future Developments
The team behind spacy-llm is committed to expanding its functionality by adding more example tasks and supporting a wider range of models. They are open to contributions from the community through pull requests.
Getting Support
For questions or feedback, users are encouraged to engage with the community via the discussion board or report issues on the spaCy issue tracker.
In summary, spacy-llm bridges the capabilities of LLMs and spaCy, enabling users to leverage advanced language models in structured NLP workflows effectively.