kor - Streamlined Data Extraction from Text Using LLMs

Introduction to Kor

Kor is a tool for extracting structured data from text using Large Language Models (LLMs). The project is a wrapper around LLMs: it provides its own abstractions and simplifies extracting specific data points from unstructured text.

How Kor Works

Kor is built around a schema that describes the data to extract. Once you provide the schema and relevant examples, Kor generates a prompt and sends it to the chosen LLM. The model processes the prompt and attempts to extract data that matches the schema. This makes Kor useful for turning text documents into structured information.
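The schema-plus-examples flow described above can be sketched in plain Python. This is a minimal illustration of the idea, not Kor's actual API: the `Field` and `Schema` classes, the `build_prompt` function, and the prompt wording are all hypothetical names and choices made for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Field:
    """One attribute to extract (hypothetical helper, not part of Kor)."""
    name: str
    description: str


@dataclass
class Schema:
    """A named group of fields plus worked examples (hypothetical helper)."""
    name: str
    description: str
    fields: list
    # Each example is an (input text, expected records) pair.
    examples: list = field(default_factory=list)


def build_prompt(schema: Schema, text: str) -> str:
    """Render the schema, the examples, and the user text into one prompt."""
    lines = [
        f"Extract '{schema.name}' records from the text. {schema.description}",
        "Return a JSON list of objects with these keys:",
    ]
    for attribute in schema.fields:
        lines.append(f"- {attribute.name}: {attribute.description}")
    for sample_text, expected in schema.examples:
        lines.append(f"Input: {sample_text}")
        lines.append(f"Output: {json.dumps(expected)}")
    # The real input goes last, so the model's reply follows the examples.
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


person = Schema(
    name="person",
    description="Personal details mentioned in the text.",
    fields=[
        Field("first_name", "The person's first name"),
        Field("age", "The person's age in years"),
    ],
    examples=[("Alice is 30 years old.", [{"first_name": "Alice", "age": 30}])],
)

prompt = build_prompt(person, "Bob turned 25 last week.")
print(prompt)
```

The resulting string is what would be sent to the LLM. The field descriptions and the worked example carry most of the guidance, which is why the quality of both matters so much.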

Comparison with LangChain

Kor overlaps with the LangChain framework, which is also used for data extraction with LLMs. Kor differs in that it focuses on the parsing approach, which works with any sufficiently capable LLM. Unlike methods that rely on function calling or JSON mode, the parsing approach depends mainly on well-documented schemas and high-quality reference examples.
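To illustrate what a parsing approach involves, the sketch below takes a model's raw text reply and recovers structured records from it, without relying on function calling or a JSON mode. It is an assumption-laden example, not Kor's parser: the `parse_reply` function is hypothetical, and it assumes the prompt asked the model to answer with a JSON array.

```python
import json
import re


def parse_reply(reply: str, required_keys: set) -> tuple:
    """Pull a JSON array out of a raw model reply and check its keys.

    Returns (records, errors). Hypothetical helper, not Kor's parser.
    """
    # Models often wrap the payload in prose, so search for the array
    # (first "[" to last "]") instead of parsing the whole reply.
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if match is None:
        return [], ["no JSON array found in reply"]
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]

    records, errors = [], []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"item {index}: expected an object")
            continue
        missing = required_keys - set(item)
        if missing:
            errors.append(f"item {index}: missing {sorted(missing)}")
        else:
            records.append(item)
    return records, errors


reply = 'Sure! Here is the data:\n[{"first_name": "Bob", "age": 25}]\nLet me know.'
records, errors = parse_reply(reply, {"first_name", "age"})
print(records, errors)
```

Because the model is free to reply with anything, the parser reports problems instead of raising, leaving the caller to decide whether to retry or discard the reply.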

Release and Compatibility

As of version 1.0.0, Kor is compatible with both pydantic v1 and v2. Note that pydantic v2 validates more strictly and may catch errors that went undetected under v1. Serialization support for pydantic v2 is still under development. Kor is tested against Python 3.8 through 3.11.

Practical Applications

Kor has several practical applications:

Data Extraction: Extracts structured data from text according to a predefined schema.
AI Assistance: Interprets user requests so that an AI assistant can act on them.
API Interaction: Provides a natural language interface to existing application programming interfaces (APIs).

Limitations and Challenges

Kor has limitations. It can be error-prone and slow, particularly with large prompts and many examples. In addition, long text inputs can exceed the LLM's context window.
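One common way to work around context window limits is to split a long input into overlapping chunks, run extraction on each chunk, and merge the results. The sketch below shows that idea under stated assumptions: `split_text` and `extract_from_long_text` are hypothetical helpers, chunk sizes are measured in characters instead of tokens for simplicity, and `extract` stands in for whatever function sends one chunk to the LLM and returns a list of records.

```python
import json


def split_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list:
    """Split text into chunks of at most max_chars, sharing `overlap` characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def extract_from_long_text(text, extract, max_chars=2000, overlap=200):
    """Run `extract` on every chunk and merge the records, dropping duplicates."""
    seen, merged = set(), []
    for chunk in split_text(text, max_chars, overlap):
        for record in extract(chunk):
            # Overlapping chunks can yield the same record twice.
            key = json.dumps(record, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


def fake_extract(chunk):
    """Stand-in for an LLM call, used only to demonstrate the merge step."""
    return [{"name": "Bob"}] if "Bob" in chunk else []


print(split_text("abcdefghij", max_chars=4, overlap=1))
print(extract_from_long_text("..Bob..", fake_extract, max_chars=5, overlap=3))
```

The overlap reduces the chance that a fact is cut in half at a chunk boundary, at the cost of extra LLM calls and duplicate records that must be merged afterwards.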

Installation and Contribution

Installing Kor is straightforward and can be done via pip:

pip install kor

The project welcomes contributions and ideas. Users are encouraged to report issues or propose new features through the project's issue tracker.

Summary

Kor offers a schema-driven, parsing-based approach to structured data extraction with LLMs. It is still a prototype with rough edges, and contributions are welcome. It is worth exploring for developers who want to use LLMs for information extraction.