knowledge-gpt - Streamlined Data Collection and Contextual Prompt Crafting Using OpenAI Models

Overview of knowledgegpt

Knowledgegpt gathers information from the internet and from local data and uses it to build prompts. These prompts are passed to OpenAI’s GPT-3 model to generate answers grounded in the collected material. The answers are then stored in a database so they can be reused for future queries.

How It Works

Knowledgegpt begins by transforming any given text into a fixed-size vector. This is achieved using either open-source or OpenAI models. When a user submits a query, this query is converted into a vector format and compared against stored knowledge embeddings. The most relevant information is then selected and used to create a context for generating the response.
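
The retrieval step described above can be illustrated with a minimal sketch. It is not knowledgegpt's implementation: the embed function is a toy hashed bag-of-words stand-in for a real embedding model, cosine similarity is an assumed comparison metric, and DIM, build_context, and the sample chunks are hypothetical names chosen for illustration.

```python
import hashlib
import math
import re

DIM = 256  # fixed vector size; an arbitrary choice for this sketch


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, scaled to unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash sends the same token to the same slot on every run.
        slot = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def build_context(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Rank stored chunks against the query and join the best matches into a context."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return "\n".join(ranked[:top_k])


chunks = [
    "A bombard is a large-calibre medieval cannon used in sieges.",
    "Uvicorn is an ASGI web server for Python.",
    "Bombards fired heavy stone balls at fortification walls.",
]
question = "What is a bombard?"
context = build_context(question, chunks, top_k=1)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

In a real pipeline the chunk vectors would be computed once and stored, so that only the query needs to be embedded at question time.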

Sources of Information

Knowledgegpt supports several kinds of information sources:

Websites: It utilizes the vast content available on the internet.
PDFs, PowerPoint Files (PPTX), and Word Documents (DOCX): Text is extracted from these local file formats and used to produce information-rich responses.
YouTube Subtitles and Audio: Subtitles are used as text, and speech-to-text technology transcribes audio, further expanding the data it can reach.

Getting Started

Installation

Knowledgegpt can be installed from PyPI, or the latest version can be installed from the repository. The spaCy English model en_core_web_sm must also be downloaded; it is used for parsing English text.

# PyPI installation
pip install knowledgegpt

# Latest version from the repository (run from the root of a local clone)
pip install -r requirements.txt && pip install .

# Language model download
python3 -m spacy download en_core_web_sm

Using knowledgegpt

RESTful API and API Key Configuration

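To start the RESTful API, run uvicorn server:app --reload from the directory that contains the server module. The --reload flag restarts the server when the code changes and is intended for development. Users must also set up their OpenAI API key by generating a secret key in their OpenAI account and entering it into the configuration file.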
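
The layout of the configuration file is not specified here; the usage example further down only shows that a name SECRET_KEY is imported from a module called example_config. As a minimal sketch under that assumption, the key could be read from an environment variable instead of being hard-coded. The variable name OPENAI_API_KEY and the helper read_secret_key are hypothetical, not documented knowledgegpt conventions.

```python
import os

ENV_VAR = "OPENAI_API_KEY"  # assumed variable name, not a knowledgegpt convention


def read_secret_key(environ=None) -> str:
    """Return the API key from the given mapping (default: os.environ), or fail loudly."""
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable to your OpenAI secret key")
    return key


# Demonstration with a placeholder mapping instead of the real environment.
demo_key = read_secret_key({ENV_VAR: " sk-placeholder "})
print(len(demo_key))
```

With this helper in the configuration module, a line such as SECRET_KEY = read_secret_key() would expose the name that the usage example imports, and the key itself would stay out of version control.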

Examples of Usage

Knowledgegpt provides various extractors for different file types:

WebScrapeExtractor for extracting data from web content.
PDFExtractor for extracting data from PDF files.
PowerpointExtractor and DocsExtractor for extracting data from PPTX and DOCX files respectively.
YoutubeAudioExtractor and YTSubsExtractor for extracting audio and subtitle data from YouTube.
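
To show how the extractors above map to sources, here is a small sketch that picks an extractor name from a URL or file path. The helper pick_extractor, its use_audio flag, and the host list are hypothetical and not part of knowledgegpt; the sketch returns class names as strings and does not import the library.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extractor names as listed above, keyed by lower-case file suffix.
SUFFIX_TO_EXTRACTOR = {
    ".pdf": "PDFExtractor",
    ".pptx": "PowerpointExtractor",
    ".docx": "DocsExtractor",
}
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}  # assumed host list


def pick_extractor(source: str, use_audio: bool = False) -> str:
    """Return the name of the extractor matching a URL or file path (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.hostname in YOUTUBE_HOSTS:
            return "YoutubeAudioExtractor" if use_audio else "YTSubsExtractor"
        # Any other web address is treated as a page to scrape.
        return "WebScrapeExtractor"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in SUFFIX_TO_EXTRACTOR:
        raise ValueError(f"No extractor listed for {source!r}")
    return SUFFIX_TO_EXTRACTOR[suffix]


print(pick_extractor("https://en.wikipedia.org/wiki/Bombard_(weapon)"))
print(pick_extractor("slides/Deck.PPTX"))
```

Dispatching on the URL scheme first means a PDF served over HTTP is treated as web content in this sketch; a real tool might download such files and hand them to the file-based extractor instead.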

Here’s a simple example using the library to get started:

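import openai

from example_config import SECRET_KEY  # your OpenAI secret key
from knowledgegpt.extractors.web_scrape_extractor import WebScrapeExtractor

# Authenticate the OpenAI client before any request is made.
openai.api_key = SECRET_KEY

# Point the extractor at a page; embedding_extractor selects the embedding model.
url = "https://en.wikipedia.org/wiki/Bombard_(weapon)"
scrape_website = WebScrapeExtractor(url=url, embedding_extractor="hf", model_lang="en")

# Ask a question about the page; the call returns the answer, the prompt, and the messages.
answer, prompt, messages = scrape_website.extract(query="What is a bombard?", max_tokens=300)
print(answer)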

Docker Usage

Knowledgegpt can also be deployed using Docker for a more streamlined setup:

docker build -t knowledgegptimage .
docker run -p 8888:8888 knowledgegptimage

Contribution to the Project

Knowledgegpt welcomes contributions. To contribute, open an issue, or fork the repository, create a new branch, make your changes, and open a pull request.

Features and Roadmap

The project currently extracts knowledge from the internet and from local files, including audio and video sources. Planned features include vector database integration, a better interface, support for more languages, and improved documentation and logging.

Knowledgegpt aims to become more robust over time, with ongoing work to extend its functionality and to support more complex information retrieval and prompt generation tasks.