llmgraph - Generate Knowledge Maps from Wikipedia with Large Language Models

Introduction to llmgraph

llmgraph is a tool for creating knowledge graphs with large language models (LLMs) such as ChatGPT. A knowledge graph represents entities as nodes and the relationships between them as edges. Starting from a Wikipedia page, llmgraph extracts these connections and writes the resulting graph in GraphML, GEXF, and HTML formats.

Key Features of llmgraph

Knowledge Graph Creation: Starting with a source entity from a Wikipedia page, llmgraph uses language models to extract related world knowledge and presents it as a graph.
Multiple Formats Supported: Knowledge graphs can be generated in HTML, GraphML, and GEXF formats, making the tool versatile for different user needs.
Entity Types and Relationships: A wide range of entity types and their relationships are supported, thanks to customizable prompts that tailor the extraction to specific needs.
Efficient Processing with Caching: Results are cached, so users can build upon previously generated knowledge graphs without repeating work.
Token Usage Transparency: The tool reports a tally of the tokens used during processing, giving users insight into the cost of language model API calls.
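
The caching and token tally described above can be pictured with a minimal sketch. It is not llmgraph's actual implementation: the CachedLLM class, the fake_model function, and the token counts are hypothetical names and values introduced only for illustration.

```python
import hashlib


class CachedLLM:
    """Hypothetical wrapper that caches responses and tallies tokens."""

    def __init__(self, call_model):
        # call_model(prompt) -> (text, tokens_used); stands in for a real API call
        self._call_model = call_model
        self._cache = {}
        self.tokens_used = 0

    def ask(self, prompt):
        # Key the cache on a hash of the prompt text
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]  # cache hit: no tokens are spent
        text, tokens = self._call_model(prompt)
        self.tokens_used += tokens  # only real calls add to the tally
        self._cache[key] = text
        return text


def fake_model(prompt):
    # Stand-in for a language model; the token count is made up (one per word)
    return f"related to: {prompt}", len(prompt.split())


llm = CachedLLM(fake_model)
first = llm.ask("Artificial intelligence")
second = llm.ask("Artificial intelligence")  # served from the cache
```

In this sketch the second call returns the stored answer, so the tally grows only when the model is actually called.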

Installation and Usage

Install llmgraph with Python's pip package manager, preferably inside a virtual environment. Example notebooks are available, including one that runs in Google Colab, to demonstrate llmgraph's capabilities hands-on.

pip install llmgraph

To generate a knowledge graph, specify an entity type and a Wikipedia source URL. For example, the following command explores connections around Artificial Intelligence using the machine-learning entity type:

llmgraph machine-learning "https://en.wikipedia.org/wiki/Artificial_intelligence" --levels 3

This command produces a three-level graph centered on Artificial Intelligence, with related concepts extracted by the language model starting from the source page.
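
To make the idea of levels concrete, the expansion can be sketched as a breadth-first traversal. This is a sketch under stated assumptions, not llmgraph's own code: build_graph, related, and TOY_LINKS are hypothetical names, the toy lookup table stands in for the language model, and the source entity is assumed to count as level 1.

```python
from collections import deque


def build_graph(source, related, levels):
    """Expand breadth-first from source; return a set of (parent, child) edges."""
    edges = set()
    seen = {source}
    queue = deque([(source, 1)])  # assumption: the source entity is level 1
    while queue:
        entity, level = queue.popleft()
        if level >= levels:
            continue  # entities on the last level are not expanded further
        for child in related(entity):
            edges.add((entity, child))
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return edges


# Hypothetical stand-in for the language model: a fixed lookup table
TOY_LINKS = {
    "Artificial intelligence": ["Machine learning", "Robotics"],
    "Machine learning": ["Deep learning"],
    "Deep learning": ["Transformer"],
}


def related(entity):
    return TOY_LINKS.get(entity, [])


edges = build_graph("Artificial intelligence", related, levels=3)
```

With three levels, the toy graph stops at "Deep learning": its own links are never requested, which is why a larger levels value means more model calls.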

Outputs and Examples

llmgraph can output interactive HTML graphs. The available examples range from concepts such as 'Artificial Intelligence' to works such as 'Inception' and historical figures such as John von Neumann, illustrating the breadth of possible applications.
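
Because GraphML is an XML format, a generated graph can also be inspected outside the HTML view. The following is a rough sketch using only Python's standard library. The SAMPLE document is hand-written for illustration and is not actual llmgraph output, and read_graphml is a hypothetical helper; real files may carry extra attributes that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, needed to match element names
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample, not produced by llmgraph
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="Artificial intelligence"/>
    <node id="Machine learning"/>
    <edge source="Artificial intelligence" target="Machine learning"/>
  </graph>
</graphml>
"""


def read_graphml(text):
    """Return (node ids, (source, target) pairs) from a GraphML string."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.findall(".//g:node", NS)]
    edges = [(e.get("source"), e.get("target"))
             for e in root.findall(".//g:edge", NS)]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
```

For larger graphs, a dedicated graph library or a viewer that reads GraphML or GEXF is likely a better fit than manual parsing.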

Customization and Advanced Usage

llmgraph uses OpenAI's gpt-4o-mini by default, but other LLMs can be selected, including local models such as Llama2. Users can also modify the prompts for specific entity types via configuration files to tailor the graph generation process.

Future Enhancements

Potential improvements to llmgraph include comparing outputs from different language models, improving the quality and readability of graph outputs, and developing new entity prompts. The project also aims to explore additional data sources and to add parallel processing for faster graph construction.

Contribution and Community

Contributions are welcome, and guidelines are provided for those interested in improving the tool. Ongoing development and refinement rely on voluntary contributions from the community.

In summary, llmgraph builds knowledge graphs from Wikipedia pages using language models. With customizable prompts, a choice of models, and several output formats, it suits users who want to understand and represent complex relationships between entities clearly.