chatdocs - Improve Document Handling with AI and Extensible Features for Offline Use

ChatDocs Project Introduction

Overview

ChatDocs is a tool for chatting with your documents offline using AI. All processing happens locally, so no data leaves your machine. An internet connection is needed only to install the tool and download the AI models.

ChatDocs is inspired by PrivateGPT and extends it with additional features.

Key Features

Model Support: ChatDocs supports various AI models, including GGML/GGUF models via CTransformers, 🤗 Transformers models, and GPTQ models.
User Interfaces: The tool provides a web-based user interface (UI) as well as a command-line interface for users who prefer text-based commands.
Configuration: Settings are customized through a chatdocs.yml configuration file, allowing users to tailor the tool to their specific needs.
Document Compatibility: ChatDocs can handle a wide range of document types, including but not limited to CSV, Word, Evernote, Email, EPub, HTML, Markdown, PDF, and PowerPoint files.
GPU Support: The tool can use a GPU to speed up embedding and model inference.

Installation

Getting started with ChatDocs is straightforward. First, install the tool with pip, the Python package manager:

```
pip install chatdocs
```

Once installed, users need to download the required AI models:

```
chatdocs download
```

After these steps, ChatDocs can be used completely offline.
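The two setup steps above can also be scripted. The following is a minimal sketch, not part of ChatDocs itself: the helper names (chatdocs_cli_path, setup_commands, run_setup) are hypothetical, and it only wraps the pip install chatdocs and chatdocs download commands shown above. It defaults to a dry run that prints the commands without executing them.

```python
import shutil
import subprocess
import sys


def chatdocs_cli_path(name="chatdocs"):
    """Return the full path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


def setup_commands():
    """The installation steps from the text, as argument lists."""
    return [
        # Use the current interpreter's pip so the package lands in
        # the same environment this script runs in.
        [sys.executable, "-m", "pip", "install", "chatdocs"],
        ["chatdocs", "download"],
    ]


def run_setup(dry_run=True):
    """Print each setup command; execute it only when dry_run is False."""
    for cmd in setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
    print("chatdocs on PATH:", chatdocs_cli_path() is not None)
```

Calling run_setup(dry_run=False) would run both steps in order and stop at the first failure. Both steps need an internet connection.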

How to Use

To begin using ChatDocs, users add their document directory:

```
chatdocs add /path/to/documents
```

Processed documents are stored by default in a local directory named db. Users can then interact with their documents through the web UI, started with chatdocs ui and opened at http://localhost:5000 in a browser, or use the command line with:

```
chatdocs chat
```
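Before adding a directory, it can help to see which of its files have extensions that match the document types listed under Key Features. The sketch below is an illustration only: count_documents is a hypothetical helper, and ASSUMED_EXTENSIONS is a guess at the extensions for those document types, not a list taken from ChatDocs.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions for the document types named in this introduction;
# the set that ChatDocs actually accepts may differ.
ASSUMED_EXTENSIONS = {
    ".csv", ".docx", ".enex", ".eml", ".epub",
    ".html", ".md", ".pdf", ".pptx",
}


def count_documents(root):
    """Count files under root by lowercased extension, assumed types only."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in ASSUMED_EXTENSIONS:
            counts[suffix] += 1
    return counts
```

For example, count_documents("/path/to/documents") returns a Counter keyed by extension, which gives a quick view of how much material the add step is likely to pick up.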

Configuration Options

All ChatDocs configuration lives in the chatdocs.yml file. For example, users can change the embeddings model by specifying:

```
embeddings:
  model: hkunlp/instructor-large
```

Similar configurations are available for CTransformers and 🤗 Transformers models. Users can specify a model type and location and adjust settings like GPU usage for performance enhancement.
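A configuration fragment like the embeddings example above can also be generated from a script. This is a minimal sketch using only the standard library: to_yaml is a hypothetical helper that handles nested dictionaries of plain strings and integers only, and the one setting shown is the embeddings model from the text. For anything more complex, a real YAML library would be a better choice.

```python
def to_yaml(data, indent=0):
    """Render nested dicts of simple scalars as block-style YAML lines."""
    lines = []
    pad = " " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            # A nested mapping: emit the key, then its indented children.
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


config = {
    "embeddings": {
        "model": "hkunlp/instructor-large",
    },
}

text = "\n".join(to_yaml(config)) + "\n"
print(text, end="")
```

Writing text to a file named chatdocs.yml would produce the same fragment as the hand-written example above.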

GPU Utilization

For tasks requiring higher computational power, ChatDocs offers GPU support:

Embeddings: Enable GPU by specifying the device type in the configuration.
```
embeddings:
  model_kwargs:
    device: cuda
```
CTransformers: For CTransformers models, GPU layers can be configured.
```
ctransformers:
  config:
    gpu_layers: 50
```
Transformers: Specify the device index to use GPU with 🤗 Transformers models.
```
huggingface:
  device: 0
```

Using a GPU may require additional components, such as a PyTorch build with CUDA support.
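Whether the installed PyTorch can see a CUDA device can be checked before setting device: cuda in the configuration. The sketch below assumes PyTorch is the relevant backend for embeddings. pick_device and embeddings_gpu_config are hypothetical helper names, and the fallback to "cpu" is an illustrative choice rather than documented ChatDocs behavior.

```python
def pick_device():
    """Return "cuda" when a CUDA-enabled PyTorch is usable, else "cpu"."""
    try:
        import torch  # optional dependency; may not be installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def embeddings_gpu_config(device):
    """Build the embeddings override shown above for the given device."""
    return {"embeddings": {"model_kwargs": {"device": device}}}


device = pick_device()
print("embeddings device:", device)
print(embeddings_gpu_config(device))
```

If this reports cpu on a machine with an NVIDIA GPU, the installed PyTorch build is most likely CPU-only.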

Conclusion

ChatDocs is a flexible tool for offline, AI-driven document analysis and interaction. It keeps all processing local and supports a wide range of document and model types.