Instructor: Unlocking the Potential of Structured Outputs
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). With 600,000 downloads each month, it simplifies the parsing and validation of data generated by LLMs. Instructor is built on Pydantic and provides a small API for response validation, retries, and streaming outputs. Here is a closer look at what the library offers.
Key Features of Instructor
- Response Models: Users can utilize Pydantic models to clearly define the structure of expected LLM outputs.
- Retry Management: Configure the number of retry attempts for a request, so a failed attempt does not immediately fail the call.
- Validation: Integrated Pydantic validation checks each output against the constraints declared on the response model.
- Streaming Support: The library supports handling partial responses and lists efficiently.
- Flexible Backends: Instructor integrates smoothly with diverse LLM providers, not just OpenAI.
- Multi-language Support: It's available in various programming languages such as Python, TypeScript, Ruby, Go, and Elixir.
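The retry and validation features listed above combine into a common pattern: parse the model's reply, validate it, and ask again with the error message if validation fails. The sketch below illustrates that general idea using only the Python standard library. It is not Instructor's implementation: `parse_user`, `extract_with_retries`, and `fake_llm` are hypothetical names, and a dataclass with hand-written checks stands in for a Pydantic model.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UserInfo:
    name: str
    age: int


def parse_user(raw: str) -> UserInfo:
    """Parse a JSON string and validate it; raise ValueError on bad data."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        raise ValueError("age must be a non-negative integer")
    return UserInfo(name=name, age=age)


def extract_with_retries(
    ask: Callable[[str], str], prompt: str, max_retries: int = 2
) -> UserInfo:
    """Call `ask`, validate the reply, and re-ask with the error on failure."""
    last_error: Optional[ValueError] = None
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw = ask(current_prompt)
        try:
            return parse_user(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the validation error back so the next attempt can correct it.
            current_prompt = f"{prompt}\nPrevious reply was rejected: {exc}"
    raise ValueError(f"no valid reply after {max_retries + 1} attempts: {last_error}")


# Hypothetical stand-in for a model: the first reply is invalid, the second is valid.
replies = iter(
    ['{"name": "John Doe", "age": "thirty"}', '{"name": "John Doe", "age": 30}']
)
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return next(replies)


user = extract_with_retries(fake_llm, "John Doe is 30 years old.")
print(user)  # UserInfo(name='John Doe', age=30)
```

In this sketch the first reply fails the age check, so the second prompt carries the rejection message and the second reply is accepted. With Instructor, the response model and the retry count are passed to the client instead of being written by hand.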
Quick Start Guide
Getting started takes a single command in the terminal:
pip install -U instructor
An example use case involves defining a UserInfo model to extract structured data from natural language, using the OpenAI client patched with Instructor.
import instructor
from pydantic import BaseModel
from openai import OpenAI


# Pydantic model describing the structure expected from the LLM
class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client so it accepts a response_model
client = instructor.from_openai(OpenAI())

user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)  # Output: John Doe
print(user_info.age)  # Output: 30
Advanced Usage with Hooks
Instructor also provides a hooks system. Hooks let you attach handlers to different stages of the LLM interaction, for example to log request arguments and errors, which is useful for debugging and monitoring:
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


client = instructor.from_openai(OpenAI())


# Handler for the keyword arguments of each completion call
def log_kwargs(**kwargs):
    print(f"Function called with kwargs: {kwargs}")


# Handler for exceptions raised during a completion call
def log_exception(exception: Exception):
    print(f"An exception occurred: {str(exception)}")


# Register the handlers for the corresponding hook events
client.on("completion:kwargs", log_kwargs)
client.on("completion:error", log_exception)

user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract the user name: 'John is 20 years old'"}],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
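To make the hook mechanism more concrete, here is a minimal sketch of how an event registry with an `on(event, handler)` method might dispatch to handlers. It uses only the standard library and runs without an API key. `HookRegistry`, `emit`, and `run_request` are hypothetical names chosen for illustration, and the simulated failure is an assumption; Instructor's internals may differ.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple


class HookRegistry:
    """Maps event names to lists of handler callables."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., Any]) -> None:
        """Register a handler for the given event name."""
        self._handlers[event].append(handler)

    def emit(self, event: str, *args: Any, **kwargs: Any) -> None:
        """Call every handler registered for the event, in registration order."""
        for handler in list(self._handlers.get(event, [])):
            handler(*args, **kwargs)


hooks = HookRegistry()
seen: List[Tuple[str, str]] = []

# Handlers mirror the shapes used above: kwargs for requests, an exception for errors.
hooks.on("completion:kwargs", lambda **kw: seen.append(("kwargs", kw["model"])))
hooks.on("completion:error", lambda exc: seen.append(("error", str(exc))))


def run_request(**kwargs: Any) -> None:
    """Hypothetical request wrapper that fires hooks around a simulated failure."""
    hooks.emit("completion:kwargs", **kwargs)
    try:
        raise TimeoutError("simulated failure")  # stands in for a failed API call
    except TimeoutError as exc:
        hooks.emit("completion:error", exc)


run_request(model="gpt-4o-mini")
print(seen)  # [('kwargs', 'gpt-4o-mini'), ('error', 'simulated failure')]
```

Keeping handlers in a per-event list means several loggers or metrics collectors can observe the same stage without knowing about each other.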
Broad Compatibility with Models
Instructor works with providers and client libraries beyond OpenAI, including Anthropic, Cohere, Gemini, and LiteLLM. Each integration lets the user extract structured data in the same way.
Smart Code Inference
Instructor infers return types from the response model and supports asynchronous clients. Methods such as create, await create, and create_with_completion return correctly typed results, so editors and type checkers can follow the output.
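One way a library can give editors and type checkers this information is with a generic function whose return type is tied to the class passed in. The sketch below shows that technique with the standard library only. The functions `create`, `acreate`, and `create_with_raw` are hypothetical stand-ins that parse a JSON string instead of calling a model; they are not Instructor's code, and a dataclass replaces the Pydantic model.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass
class UserInfo:
    name: str
    age: int


def create(response_model: Type[T], raw_json: str) -> T:
    """Return an instance of whatever class was passed as response_model."""
    return response_model(**json.loads(raw_json))


async def acreate(response_model: Type[T], raw_json: str) -> T:
    """Async variant with the same typed return value."""
    await asyncio.sleep(0)  # stands in for awaiting a network call
    return create(response_model, raw_json)


def create_with_raw(response_model: Type[T], raw_json: str) -> Tuple[T, str]:
    """Return the parsed object together with the raw payload it came from."""
    return create(response_model, raw_json), raw_json


payload = '{"name": "John Doe", "age": 30}'

user = create(UserInfo, payload)  # a type checker infers UserInfo here
async_user = asyncio.run(acreate(UserInfo, payload))
parsed, raw = create_with_raw(UserInfo, payload)

print(user.name, user.age)  # John Doe 30
```

Because `T` is bound by the `response_model` argument, a type checker can treat `user.age` as an `int` without a cast or annotation at the call site.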
Contributing to Development
Instructor is a community-driven project that welcomes contributions. Potential contributors can explore issues labeled good-first-issue or help-wanted to find an entry point.
Command-Line Interface (CLI)
Instructor includes a CLI for managing jobs and files and for monitoring usage, without having to visit the OpenAI website.
Conclusion
With its feature set and growing popularity, Instructor is a practical tool for managing structured data from LLMs, suited to developers who want reliability and ease of use. Its Pydantic foundation and its integrations with multiple providers and languages make it a strong fit for modern data processing workflows.