Introduction to ChatWeb
ChatWeb is a project for querying digital text content. It crawls a webpage or extracts text from PDF, DOCX, and TXT files, then generates an embedding-based summary of the extracted text. Users can then ask questions about the content, and ChatWeb answers them from the extracted information. It does this by combining the chat API and embedding API with a vector database, all built around GPT-3.5.
Basic Principle
ChatWeb operates on principles akin to existing technologies, such as chatPDF and automated customer service AI systems. Here’s an overview of its primary functions:
- Crawling and Extraction: ChatWeb crawls webpages and extracts text from various file formats.
- Embedding and Vectorization: It uses OpenAI's embedding API to convert each paragraph of text into a vector.
- Summarization: By calculating similarity scores between paragraph vectors and the overall text vector, it produces a cohesive summary.
- Vector Database Storage: The vector-text mappings are then stored in a vector database for easy access.
- Query Processing: Keywords are extracted from the user's question and converted into a vector, which is compared against the vectors in the database to find the most relevant text.
- Response Generation: GPT-3.5's chat API answers the user's question from the relevant text that was retrieved. Because only the relevant passages are sent to the model rather than the whole document, the source text can be longer than the model's token limit.
One refinement in this project is that retrieval uses keywords extracted from the question, rather than the raw question itself, to improve the accuracy of relevant-text retrieval.
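The summarization and retrieval steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not ChatWeb's actual code: the function names, the toy 3-dimensional vectors, and the choice to use the mean of the paragraph vectors as the "overall text vector" are all assumptions made for the example. In the real project the vectors would come from the embedding API and be stored in the vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean; stands in for the overall text vector (an assumption)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def summarize(paragraphs, vectors, k=2):
    """Keep the k paragraphs closest to the overall vector, in document order."""
    overall = mean_vector(vectors)
    ranked = sorted(range(len(paragraphs)),
                    key=lambda i: cosine(vectors[i], overall), reverse=True)
    return [paragraphs[i] for i in sorted(ranked[:k])]

def retrieve(query_vector, paragraphs, vectors, k=2):
    """Return the k paragraphs most similar to the query vector, best match first."""
    ranked = sorted(range(len(paragraphs)),
                    key=lambda i: cosine(vectors[i], query_vector), reverse=True)
    return [paragraphs[i] for i in ranked[:k]]

# Toy data, made up for illustration; real embeddings have far more dimensions.
paragraphs = ["Install with pip.", "Run main.py to start.", "The weather was nice."]
vectors = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.1, 0.9]]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of the extracted keywords

summary = summarize(paragraphs, vectors, k=2)
context = retrieve(query_vector, paragraphs, vectors, k=2)
print(summary)
print(context)
```

Only the paragraphs returned by `retrieve` would be passed to the chat model alongside the question, which is why the full document never has to fit in a single prompt.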
Getting Started
To start using ChatWeb, users can choose between manual installation or running the project in a Docker container. Here’s how each method works:
Manual Installation
- Install Python3: Ensure it's available on your system.
- Download Repository: Clone the repository with
    git clone https://github.com/SkywalkerDarren/chatWeb.git
- Configuration: Navigate to the directory with
    cd chatWeb
  then copy config.example.json to config.json. Input your OpenAI API key in config.json.
- Dependencies: Install necessary packages using
    pip3 install -r requirements.txt
- Launch: Start the application by executing
    python3 main.py
Utilizing Docker
- Build Container: Use
    docker-compose build
- Set Configuration: Prepare config.json as in the manual setup, making sure your OpenAI API key is set.
- Run the Container: Execute
    docker-compose up
- Access Application: Open it in a browser at
    http://localhost:7860
Additional Settings
ChatWeb offers various configurations for a personalized setup:
- Language: Specify language in config.json.
- Mode: Choose between console, api, or webui.
- Stream Mode: Activate by setting use_stream to true.
- Response Temperature: Set temperature between 0 and 1 to control response creativity.
- OpenAI Proxy: Include proxy settings within config.json.
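As a rough illustration of how these settings fit together, the sketch below checks a config dictionary against the constraints listed above. It is not part of ChatWeb: the `validate_config` helper is hypothetical, the key name `mode` and the sample value `"English"` are assumptions, and treating all four keys as required is a simplification made for the example.

```python
import json

ALLOWED_MODES = {"console", "api", "webui"}

def validate_config(config):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    language = config.get("language")
    if not isinstance(language, str) or not language:
        problems.append("language must be a non-empty string")

    # The key name "mode" is an assumption; the three values come from the text above.
    mode = config.get("mode")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")

    if not isinstance(config.get("use_stream"), bool):
        problems.append("use_stream must be true or false")

    temperature = config.get("temperature")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if not is_number or not 0 <= temperature <= 1:
        problems.append("temperature must be a number between 0 and 1")

    return problems

# Sample settings, made up for illustration.
example = json.loads(
    '{"language": "English", "mode": "webui", "use_stream": true, "temperature": 0.2}'
)
problems = validate_config(example)
print(problems or "config looks valid")
```

Running a check like this before launch surfaces a mistyped mode or an out-of-range temperature early, instead of partway through a session.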
PostgreSQL Support
Optionally, users can enable integration with PostgreSQL by setting use_postgres to true in config.json, installing PostgreSQL and the pgvector plugin, then ensuring database access dependencies are installed with pip3.
Example Usage
Users can enter a URL or document path to retrieve content, which ChatWeb processes, summarizes, and then responds to user queries.
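The first decision in that flow is whether the input is a URL to crawl or a file to parse. A minimal sketch of that check is shown below, assuming a hypothetical `classify_source` helper; ChatWeb's own detection logic may differ. The supported file types are the ones named earlier (PDF, DOCX, TXT).

```python
from pathlib import Path
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}

def classify_source(source):
    """Return 'url', 'pdf', 'docx', or 'txt'; raise ValueError for anything else."""
    parsed = urlparse(source)
    # Treat only http(s) addresses with a host as crawlable URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix in SUPPORTED_SUFFIXES:
        return suffix.lstrip(".")
    raise ValueError(f"unsupported source: {source!r}")

for source in ["https://example.com/page", "docs/report.PDF", "notes.txt"]:
    print(source, "->", classify_source(source))
```

Each result would then select the matching crawler or text extractor before the embedding step runs.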
Future Plans
The project is under active development, with additional features and broader support planned.
Although ChatWeb is new, its growing GitHub star history suggests increasing interest. With these features and continued development, it is a promising tool for users who need efficient text processing and analysis.