chatWeb - AI-Powered Tool for Text Extraction and Summarization

Introduction to ChatWeb

ChatWeb crawls a webpage or extracts text from PDF, DOCX, and TXT files, then generates an embedding-based summary of the content. Users can then ask questions, and ChatWeb answers them from the extracted text. It is built on GPT-3.5's chat API and embedding API together with a vector database.

Basic Principle

ChatWeb works on the same principle as existing tools such as chatPDF and automated customer service AI. Its pipeline has the following steps:

Crawling and Extraction: ChatWeb crawls webpages and extracts text from various file formats.
Embedding and Vectorization: It uses GPT-3.5's embedding API to convert each paragraph of text into a vector.
Summarization: It calculates the similarity between each paragraph's vector and the vector of the whole text, and uses those scores to generate a summary.
Vector Database Storage: The vector-text mappings are then stored in a vector database for easy access.
Query Processing: Keywords are extracted from the user's question and converted into a vector, which is compared against the vectors in the database to find the most similar text.
Response Generation: The retrieved text is passed to GPT-3.5's chat API as context for answering the user's question. Because only the relevant passages are included in the prompt, the source document can be longer than the model's token limit would otherwise allow.
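
The summarization and query steps above both come down to ranking paragraph vectors by similarity. The following is a minimal sketch of that idea, using cosine similarity over small hand-written vectors. In ChatWeb the vectors would come from the embedding API and be stored in the vector database. The function names (cosine, centroid, top_k), the toy data, and the use of the mean vector as the "overall text vector" are assumptions made for illustration, not the project's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean of the paragraph vectors, used here as the overall text vector (assumption)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "vector database": (paragraph text, embedding) pairs with made-up 3-d vectors.
store = [
    ("Paragraph about installation.", [0.9, 0.1, 0.0]),
    ("Paragraph about Docker.", [0.1, 0.9, 0.0]),
    ("Paragraph about configuration.", [0.0, 0.1, 0.9]),
]

# Summarization step: paragraphs closest to the overall text vector.
summary = top_k(centroid([vec for _, vec in store]), store, k=2)

# Query step: a made-up vector standing in for the embedded keywords of a question.
context = top_k([0.2, 0.8, 0.1], store, k=1)
```

The texts in context would then be placed in the chat prompt alongside the user's question.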

One refinement in this project is that the search vector is generated from extracted keywords rather than from the user's raw question, which improves the accuracy of retrieving relevant text.

Getting Started

ChatWeb can be installed manually or run in a Docker container. The steps for each method are:

Manual Installation

Install Python3: Ensure it's available on your system.
Download Repository: Use the command git clone https://github.com/SkywalkerDarren/chatWeb.git.
Configuration: Change into the directory with cd chatWeb, copy config.example.json to config.json, then enter your OpenAI API key in config.json.
Dependencies: Install necessary packages using pip3 install -r requirements.txt.
Launch: Start the application by executing python3 main.py.
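
The configuration step can also be scripted. The sketch below copies config.example.json to config.json and fills in the API key. The helper name create_config and the key field name open_ai_key are assumptions; check config.example.json for the field name the project actually uses.

```python
import json
from pathlib import Path

def create_config(project_dir, api_key, key_field="open_ai_key"):
    """Copy config.example.json to config.json and fill in the API key.

    key_field is an assumed name; use whatever key config.example.json defines.
    """
    project = Path(project_dir)
    example = project / "config.example.json"
    config = json.loads(example.read_text(encoding="utf-8"))
    config[key_field] = api_key
    target = project / "config.json"
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target
```

For example, create_config("chatWeb", "YOUR_API_KEY") would write chatWeb/config.json while leaving every other setting from the example file unchanged.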

Utilizing Docker

Build Container: Use docker-compose build.
Set Configuration: As in the manual setup, copy config.example.json to config.json and set your OpenAI API key in it.
Run the Container: Execute docker-compose up.
Access Application: Open it via browser at http://localhost:7860.

Additional Settings

ChatWeb offers several configuration options:

Language: Specify language in config.json.
Mode: Choose between console, api, or webui.
Stream Mode: Activate by setting use_stream to true.
Response Temperature: Set the temperature between 0 and 1. Lower values give more deterministic answers; higher values give more varied ones.
OpenAI Proxy: Include proxy settings within config.json.
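
To catch typos in these settings before launching, a small check like the following may help. It is a sketch under assumptions: use_stream is named in the list above, but the key names language, mode, and temperature are guessed from the setting labels, and check_settings is a hypothetical helper, not part of ChatWeb.

```python
ALLOWED_MODES = {"console", "api", "webui"}

def check_settings(config):
    """Return a list of problems found in a config dict (empty list means OK)."""
    problems = []
    if config.get("mode") not in ALLOWED_MODES:
        problems.append("mode must be one of: " + ", ".join(sorted(ALLOWED_MODES)))
    if not isinstance(config.get("use_stream", False), bool):
        problems.append("use_stream must be true or false")
    temperature = config.get("temperature", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(temperature, bool)
            or not isinstance(temperature, (int, float))
            or not 0 <= temperature <= 1):
        problems.append("temperature must be a number from 0 to 1")
    return problems

# Example settings; key names other than use_stream are assumed.
example = {"language": "English", "mode": "webui", "use_stream": True, "temperature": 0.5}
```

Calling check_settings(example) returns an empty list, while a config with an unknown mode or an out-of-range temperature returns one message per problem.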

PostgreSQL Support

PostgreSQL integration is optional. To enable it, install PostgreSQL and the pgvector extension, install the database access dependencies with pip3, and set use_postgres to true in config.json.
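
For readers unfamiliar with pgvector, the sketch below shows roughly what storing and searching embeddings in PostgreSQL looks like. The table name, column names, and the 1536-dimension vector size are hypothetical and may not match ChatWeb's actual schema. The CREATE EXTENSION statement and the <=> cosine-distance operator are standard pgvector features. The code only builds SQL and parameters; executing them requires a running database and a driver such as psycopg2.

```python
def to_pgvector(vec):
    """Format a Python list as a pgvector literal such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

# Hypothetical table and column names; ChatWeb's actual schema may differ.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS paragraphs (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)
);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = "SELECT content FROM paragraphs ORDER BY embedding <=> %s::vector LIMIT %s;"

def search_params(query_vec, limit=3):
    """Parameters to pass together with SEARCH_SQL to a database driver."""
    return (to_pgvector(query_vec), limit)
```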

Example Usage

Enter a URL or a document path. ChatWeb retrieves and processes the content, produces a summary, and then answers questions about it.
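
Since the input can be either a URL or a file path, the first step is deciding which one was given. The sketch below shows one way to do that for the supported formats (webpage, PDF, DOCX, TXT). The function name classify_source and the suffix-based detection are assumptions for illustration; ChatWeb's own dispatch logic may differ.

```python
from pathlib import Path
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = {".pdf": "pdf", ".docx": "docx", ".txt": "txt"}

def classify_source(source):
    """Return 'url', 'pdf', 'docx', or 'txt' for an input string.

    Raises ValueError for anything else.
    """
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    kind = SUPPORTED_SUFFIXES.get(Path(source).suffix.lower())
    if kind is None:
        raise ValueError(f"unsupported source: {source!r}")
    return kind
```

A caller would then route "url" inputs to the crawler and the other kinds to the matching file extractor.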

Future Plans

The project is under active development, with additional features and broader support planned.

Although ChatWeb is a new project, its star history on GitHub shows growing interest. With development ongoing, it aims to be a practical tool for users who need efficient text processing and analysis.