OntoGPT: An Introduction
OntoGPT is a powerful Python package designed to extract structured information from text using large language models (LLMs). By leveraging instruction prompts and ontology-based grounding, OntoGPT provides a sophisticated approach to understanding and organizing textual data.
Getting Started with OntoGPT
OntoGPT primarily operates through the command line, although there is a simple web app available for those who prefer a graphical interface.
To get started with OntoGPT, follow these steps:
- Python Installation: Ensure you have Python version 3.9 or higher installed on your system.
- Package Installation: Use the pip command to install OntoGPT:

pip install ontogpt

- API Key Setup: Set your OpenAI API key for connecting to language models:

runoak set-apikey -e openai <your openai api key>

- Explore OntoGPT Commands: Check out the available commands by typing:

ontogpt --help

- Example Usage: To see OntoGPT in action, run a basic extraction from a text file:

echo "One treatment for high blood pressure is carvedilol." > example.txt
ontogpt extract -i example.txt -t drug
This will process the text and retrieve relevant ontology-based information, displaying the output on the command line.
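As a variation on the step above, the extraction can also be written to a file rather than printed to the terminal. The sketch below assumes the `-o` output flag behaves as shown in your installed version; confirm the exact flags with `ontogpt extract --help`:

```shell
# Run the same extraction as above, but save the grounded results
# to a file instead of displaying them on the command line.
# (The -o flag and file name here are illustrative; check
# `ontogpt extract --help` for your installed version.)
ontogpt extract -i example.txt -t drug -o results.yaml
```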
Using the Web Application
For those who prefer using a web interface, OntoGPT offers a basic web application. To run it, install the necessary dependencies:
pip install ontogpt[web]
Then start the application with:
web-ontogpt
Please note that public hosting without authentication is not recommended.
Integration with Model APIs
OntoGPT interacts with a range of APIs using the litellm package. This enables compatibility with platforms such as OpenAI, Azure, Anthropic, Mistral, Replicate, and others. To use these services, model-specific API keys need to be configured.
For example, configuring a key for Anthropic requires:
runoak set-apikey -e anthropic-key <your anthropic api key>
Additional settings for Azure services can be specified similarly, either directly via commands or as environment variables.
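For Azure, litellm conventionally reads its settings from environment variables. A sketch of that approach follows; the variable names follow litellm's conventions, and all values are placeholders to be replaced with your own deployment details:

```shell
# Placeholder values; substitute your own Azure OpenAI deployment details.
export AZURE_API_KEY="<your azure api key>"
export AZURE_API_BASE="https://<your-resource>.openai.azure.com/"
export AZURE_API_VERSION="<api version for your deployment>"
```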
Open Models
OntoGPT also supports the use of open language models through the ollama package. This requires installation and setup of the ollama software, after which models can be retrieved and used within OntoGPT.
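A minimal sketch of that workflow, assuming ollama is already installed, and assuming the model name and the `-m ollama/...` flag syntax match your installed versions of ollama and OntoGPT:

```shell
# Start the ollama server if it is not already running
# (on many systems it runs automatically as a background service).
ollama serve &

# Retrieve an open model locally; the model name is illustrative.
ollama pull llama3

# Point OntoGPT at the local model via the ollama/ prefix.
ontogpt extract -i example.txt -t drug -m ollama/llama3
```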
Performance Evaluations
OntoGPT's capabilities have been rigorously evaluated on test data to ensure accuracy and reliability. Full documentation is available for those interested in understanding these evaluation methodologies and results.
Related Tools
OntoGPT is integrated with the TALISMAN project, which focuses on creating summaries of gene set functions. TALISMAN uses OntoGPT to interact with large language models effectively.
Educational Resources
OntoGPT has been featured in various presentations, providing deeper insights into its applications. Notable talks include:
- "Staying grounded: assembling structured biological knowledge with help from large language models" by Harry Caufield
- "Transforming unstructured biomedical texts with large language models" presented at ISMB/ECCB 2023
- "OntoGPT: A framework for working with ontologies and large language models" by Chris Mungall
Links to presentation slides and videos are available for those interested in learning more.
Citation
For academic referencing, OntoGPT's core method, SPIRES, is detailed in a paper by Caufield et al., titled "Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning." It was published in Bioinformatics in March 2024 and can be accessed via its DOI.
Acknowledgements
OntoGPT is developed as part of the Monarch Initiative, with support from Bosch Research, contributing to its success and development.