LLM Graph Builder Project
Overview
The LLM Graph Builder application is a powerful tool designed to convert unstructured data, such as PDFs, documents, text files, YouTube videos, and web pages, into an organized, structured knowledge graph stored in a Neo4j database. By harnessing the capabilities of large language models (LLMs) from providers such as OpenAI or Gemini, the app extracts nodes, relationships, and properties from text, creating a coherent knowledge graph through the Langchain framework. Users can upload files from various sources, choose their preferred LLM model, and generate a knowledge graph.
Key Features
- Knowledge Graph Creation: Transform unstructured data into structured knowledge graphs utilizing LLMs.
- Customizable Schema: Users can input their own custom schemas or utilize existing ones in settings to generate graphs.
- Visualization: Visualize graphs from a specific source, or from several sources at once, in the Bloom visualization tool.
- Interactive Queries: Engage in conversational queries with the data in a Neo4j database, and retrieve metadata about response sources.
Getting Started
To use the Knowledge Graph Builder, a Neo4j Database version 5.15 or later with APOC installed is required. The app is compatible with Neo4j Aura databases. Users of Neo4j Desktop should follow separate instructions for backend and frontend deployment.
Deployment Options
Local Deployment
- Docker-Compose: Primarily supports OpenAI and Diffbot. Users can customize the available LLM models via the `VITE_LLM_MODELS_PROD` environment variable. Set the API keys for OpenAI and Diffbot in a `.env` file, and launch using `docker-compose up --build`.
- Input Sources Configuration: By default, the app supports local files, YouTube, Wikipedia, AWS S3, and web pages. Google GCS integration can be added with a client ID adjustment.
- Chat Modes: Several modes, such as vector, graph, and entity vector, are available. Users set these through the `VITE_CHAT_MODES` variable.
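As a minimal sketch of the Docker-Compose setup, the `.env` file and launch command might look like the following. The variable names `VITE_LLM_MODELS_PROD` and `VITE_CHAT_MODES` come from this guide; the API-key variable names and model identifiers are illustrative assumptions, not the app's authoritative list:

```shell
# .env (values are placeholders; key names are assumptions)
OPENAI_API_KEY="sk-..."                         # assumed name for the OpenAI key
DIFFBOT_API_KEY="..."                           # assumed name for the Diffbot key
VITE_LLM_MODELS_PROD="openai_gpt_4o,diffbot"    # models offered in the UI; exact values may differ
VITE_CHAT_MODES="vector,graph"                  # optionally restrict the available chat modes

# Build the images and start all services
docker-compose up --build
```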
Backend and Frontend Deployment
- Frontend: After creating a `.env` file from `frontend/example.env`, launch with Yarn.
- Backend: Mirror the frontend setup: create a virtual environment, install requirements, and deploy using Uvicorn.
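The steps above can be sketched as follows. The directory names and the Uvicorn module path are assumptions based on a typical project layout, not the repository's confirmed structure:

```shell
# Frontend: copy the example env file and start the dev server
cd frontend
cp example.env .env
yarn install
yarn run dev

# Backend: virtual environment, dependencies, then Uvicorn
cd ../backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload     # the module:app path is an assumption
```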
Cloud Deployment
Deploy the app on Google Cloud Platform using Google Cloud Run commands for both frontend and backend, configuring environment variables accordingly.
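A Cloud Run deployment might look like the sketch below. The service name, region, and environment-variable values are placeholders for illustration, not the project's actual configuration:

```shell
# Build from source and deploy the backend to Cloud Run (placeholders throughout)
gcloud run deploy llm-graph-backend \
  --source ./backend \
  --region us-central1 \
  --set-env-vars "NEO4J_URI=neo4j+s://<instance>.databases.neo4j.io,NEO4J_USERNAME=neo4j"
```

The frontend follows the same pattern with its own service name and `VITE_*` variables.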
Environment Variables
A range of environment variables allow deeper customization of the app's settings, such as embedding models, chat modes, database credentials, and cloud logging settings.
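For illustration, such settings are typically collected in a `.env` file. The variable names below are plausible placeholders chosen to match the categories listed above, not the app's authoritative variable list:

```shell
# Database credentials (placeholder values)
NEO4J_URI="neo4j+s://<instance>.databases.neo4j.io"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="<password>"

# Embedding model and chat-mode customization (names are assumptions)
EMBEDDING_MODEL="all-MiniLM-L6-v2"
VITE_CHAT_MODES="vector,graph"
```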
Usage
- Database Connection: Users connect to a Neo4j Aura Instance using credentials.
- Data Source Selection: Choose from various unstructured data sources for graph creation.
- LLM Selection: Optionally change the LLM for graph generation via a dropdown menu.
- Schema Definition: Define nodes and relationship labels within entity graph extraction settings if necessary.
- Graph Generation: Generate graphs for selected files or all files in a 'New' status.
- Graph Preview: View graphs for individual files or multiple files via the 'Preview Graph' function.
- Interactive Querying: Query the processed data and obtain detailed information about LLM-generated answers.
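Behind steps like "generate graphs for all files in a 'New' status," the app queries Neo4j for documents matching a given status. The sketch below shows how such a parameterized Cypher query could be composed; the `Document` label, property names, and helper function are hypothetical illustrations, not the app's actual implementation:

```python
# Hypothetical helper: compose a parameterized Cypher query that selects
# documents by processing status. Label and property names are assumptions.
def build_status_query(statuses):
    """Return a Cypher query string and its parameter dict."""
    if not statuses:
        raise ValueError("at least one status is required")
    query = (
        "MATCH (d:Document) "
        "WHERE d.status IN $statuses "
        "RETURN d.fileName AS fileName, d.status AS status"
    )
    return query, {"statuses": list(statuses)}

query, params = build_status_query(["New"])
```

Passing values via `$statuses` rather than string concatenation keeps the query safe and lets the database cache the plan.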
Support
For assistance or inquiries, users are encouraged to raise issues on the GitHub repository.
Conclusion
With the LLM Graph Builder, creating knowledge graphs from diverse unstructured data sources becomes a streamlined and dynamic process. Whether it’s through local deployment or cloud usage, this tool supports a broad range of configurations to meet user needs in data organization and interaction. Enjoy building comprehensive and insightful graphs!