OntoGPT: An Introduction
OntoGPT is a powerful Python package designed to extract structured information from text using large language models (LLMs). By leveraging instruction prompts and ontology-based grounding, OntoGPT provides a sophisticated approach to understanding and organizing textual data.
Getting Started with OntoGPT
OntoGPT primarily operates through the command line, although there is a simple web app available for those who prefer a graphical interface.
To get started with OntoGPT, follow these steps:
- Python Installation: Ensure you have Python version 3.9 or higher installed on your system.
- Package Installation: Use the pip command to install OntoGPT:

pip install ontogpt

- API Key Setup: Set your OpenAI API key for connecting to language models:

runoak set-apikey -e openai <your openai api key>

- Explore OntoGPT Commands: Check out the available commands by typing:

ontogpt --help

- Example Usage: To see OntoGPT in action, run a basic extraction from a text file:

echo "One treatment for high blood pressure is carvedilol." > example.txt
ontogpt extract -i example.txt -t drug
This will process the text and retrieve relevant ontology-based information, displaying the output on the command line.
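As a variation on the step above, the extraction can also be written to a file rather than printed to the terminal. The sketch below assumes the `-o` output flag behaves as shown in your installed version; confirm the exact flags with `ontogpt extract --help`:

```shell
# Run the same extraction as above, but save the grounded results
# to a file instead of displaying them on the command line.
# (The -o flag and file name here are illustrative; check
# `ontogpt extract --help` for your installed version.)
ontogpt extract -i example.txt -t drug -o results.yaml
```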
Using the Web Application
For those who prefer using a web interface, OntoGPT offers a basic web application. To run it, install the necessary dependencies:
pip install ontogpt[web]
Then start the application with:
web-ontogpt
Please note that public hosting without authentication is not recommended.
Integration with Model APIs
OntoGPT interacts with a range of APIs using the litellm package. This enables compatibility with platforms such as OpenAI, Azure, Anthropic, Mistral, Replicate, and others. To use these services, model-specific API keys need to be configured.
For example, configuring a key for Anthropic requires:
runoak set-apikey -e anthropic-key <your anthropic api key>
Additional settings for Azure services can be specified similarly, either directly via commands or as environment variables.
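For Azure, litellm conventionally reads its settings from environment variables. A sketch of that approach follows; the variable names follow litellm's conventions, and all values are placeholders to be replaced with your own deployment details:

```shell
# Placeholder values; substitute your own Azure OpenAI deployment details.
export AZURE_API_KEY="<your azure api key>"
export AZURE_API_BASE="https://<your-resource>.openai.azure.com/"
export AZURE_API_VERSION="<api version for your deployment>"
```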
Open Models
OntoGPT also supports the use of open language models through the ollama package. This requires installation and setup of the ollama software, after which models can be retrieved and used within OntoGPT.
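A minimal sketch of that workflow, assuming ollama is already installed, and assuming the model name and the `-m ollama/...` flag syntax match your installed versions of ollama and OntoGPT:

```shell
# Start the ollama server if it is not already running
# (on many systems it runs automatically as a background service).
ollama serve &

# Retrieve an open model locally; the model name is illustrative.
ollama pull llama3

# Point OntoGPT at the local model via the ollama/ prefix.
ontogpt extract -i example.txt -t drug -m ollama/llama3
```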
Performance Evaluations
OntoGPT's capabilities have been rigorously evaluated on test data to ensure accuracy and reliability. Full documentation is available for those interested in understanding these evaluation methodologies and results.
Related Tools
OntoGPT is integrated with the TALISMAN project, which focuses on creating summaries of gene set functions. TALISMAN uses OntoGPT to interact with large language models effectively.
Educational Resources
OntoGPT has been featured in various presentations, providing deeper insights into its applications. Notable talks include:
- "Staying grounded: assembling structured biological knowledge with help from large language models" by Harry Caufield
- "Transforming unstructured biomedical texts with large language models" presented at ISMB/ECCB 2023
- "OntoGPT: A framework for working with ontologies and large language models" by Chris Mungall
Links to presentation slides and videos are available for those interested in learning more.
Citation
For academic referencing, OntoGPT's core method, SPIRES, is detailed in a paper by Caufield et al., titled "Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning." It was published in Bioinformatics in March 2024 and can be accessed via its DOI.
Acknowledgements
OntoGPT is developed as part of the Monarch Initiative, with support from Bosch Research, contributing to its success and development.