Project Introduction: Law-CN-AI
Overview
Law-CN-AI is an AI-powered legal assistant. It sources legal documents from LawRefBook/Laws and is built on the supabase-community/nextjs-openai-doc-search project template, which turns Markdown files into custom context for prompts sent to OpenAI's text completion API.
Fun Features
Beyond legal assistance, the project also links to several other interesting and useful tools:
- MagickPen: An intelligent writing assistant available at magickpen.com.
- TeachAnything: An AI encyclopedia accessible via teach-anything.com.
- BetterPrompt: A prompt generator found at better.avatarprompt.net.
- OpenL: An AI translation expert at openl.io.
Deployment
Deploying Law-CN-AI is streamlined with Vercel. The Supabase integration automatically sets up the required environment variables and configures your database schema; the only remaining step is to set your OPENAI_KEY, and you're ready to go.
A helpful tutorial on deployment and setup by GoJun can be accessed here.
Technical Details
Creating a customized ChatGPT using Law-CN-AI involves these steps:
- Preprocess the knowledge base (the Markdown files in your pages folder); a sketch of this step follows the list.
- Store the embedding vectors in PostgreSQL using the pgvector extension.
- Perform a vector similarity search to find relevant content.
- Inject the content into OpenAI GPT-3 for text completion, streaming the response back to the user.
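As a rough sketch of the first step, preprocessing could split each Markdown file into sections at its headings. The splitIntoSections helper below is hypothetical and only illustrates the idea, not the project's actual chunking logic.

```ts
// Hypothetical section splitter: chunk a Markdown document at heading levels 1-3.
export function splitIntoSections(markdown: string): { heading: string; content: string }[] {
  const sections: { heading: string; content: string }[] = []
  let current = { heading: '', content: '' }

  for (const line of markdown.split('\n')) {
    if (/^#{1,3}\s/.test(line)) {
      // A new heading starts a new section; keep the previous one if it has anything in it.
      if (current.heading || current.content.trim()) sections.push(current)
      current = { heading: line.replace(/^#{1,3}\s*/, ''), content: '' }
    } else {
      current.content += line + '\n'
    }
  }
  if (current.heading || current.content.trim()) sections.push(current)
  return sections
}
```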
Build Time
At build time, the system preprocesses pages and creates embeddings. These embeddings are stored in a database powered by pgvector. Here is the process visualized:
```mermaid
sequenceDiagram
    participant Vercel
    participant DB (pgvector)
    participant OpenAI (API)
    loop 1. Preprocess Knowledge Base
        Vercel->>Vercel: Split .mdx pages into sections
        loop 2. Create and Store Embeddings
            Vercel->>OpenAI (API): Create embeddings for page sections
            OpenAI (API)->>Vercel: Embedding vectors (1536)
            Vercel->>DB (pgvector): Store page section embeddings
        end
    end
```
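As a rough illustration of the embedding loop (step 2 above), the build script could embed each page section with the OpenAI API and store the vector through Supabase. This is a minimal sketch assuming the openai and @supabase/supabase-js packages; the page_section table and the environment variable names are assumptions, not necessarily the project's exact schema.

```ts
// Minimal sketch: embed one page section and store it (table/column names are assumptions).
import OpenAI from 'openai'
import { createClient } from '@supabase/supabase-js'

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY })
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function storeSectionEmbedding(pagePath: string, content: string) {
  // Request an embedding vector (1536 dimensions for text-embedding-ada-002).
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: content.replace(/\n/g, ' '),
  })
  const embedding = response.data[0].embedding

  // Store the section text together with its vector; pgvector handles the vector column.
  const { error } = await supabase
    .from('page_section') // hypothetical table name
    .insert({ page_path: pagePath, content, embedding })
  if (error) throw error
}
```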
Additionally, a checksum is generated to ensure embeddings are updated only when files have changed.
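A minimal sketch of that change detection, assuming Node's built-in crypto module and a checksum stored alongside each page:

```ts
import { createHash } from 'crypto'

// Regenerate embeddings only when a page's content has actually changed.
export function hasPageChanged(content: string, storedChecksum: string | null): boolean {
  const checksum = createHash('sha256').update(content).digest('hex')
  return checksum !== storedChecksum
}
```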
Runtime
During runtime, when a user makes a query, the following sequence occurs:
```mermaid
sequenceDiagram
    participant Client
    participant Edge Function
    participant DB (pgvector)
    participant OpenAI (API)
    Client->>Edge Function: { query: lorem ipsum }
    critical 3. Perform Vector Similarity Search
        Edge Function->>OpenAI (API): Create embedding for query
        OpenAI (API)->>Edge Function: Embedding vector (1536)
        Edge Function->>DB (pgvector): Vector similarity search
        DB (pgvector)->>Edge Function: Relevant document content
    end
    critical 4. Inject Content into Prompt
        Edge Function->>OpenAI (API): Completion request with query + docs
        OpenAI (API)-->>Client: text/event-stream: Completion response
    end
```
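A condensed sketch of what the edge function might do for steps 3 and 4, assuming an openai client, a Supabase client, and a Postgres function named match_page_sections exposed through supabase.rpc. The function name, thresholds, and model choice here are assumptions for illustration, not the project's exact code.

```ts
import OpenAI from 'openai'
import { createClient } from '@supabase/supabase-js'

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY })
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function answerQuery(query: string) {
  // 3. Embed the user query, then run a pgvector similarity search in Postgres.
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: query.replace(/\n/g, ' '),
  })
  const embedding = embeddingResponse.data[0].embedding

  // "match_page_sections" is an assumed Postgres function wrapping the pgvector search.
  const { data: sections, error } = await supabase.rpc('match_page_sections', {
    embedding,
    match_threshold: 0.78,
    match_count: 10,
  })
  if (error) throw error

  // 4. Inject the matched content into the prompt and stream the completion back.
  const context = sections.map((s: { content: string }) => s.content).join('\n---\n')
  const prompt = `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}\nAnswer:`

  return openai.completions.create({
    model: 'gpt-3.5-turbo-instruct', // model choice is an assumption; the docs only say "GPT-3 text completion"
    prompt,
    max_tokens: 512,
    stream: true,
  })
}
```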
These flows are handled by SearchDialog on the client side and by the vector-search edge function.
Local Development
To set up locally:
- Copy the environment file: cp .env.example .env
- Set OPENAI_KEY in the new .env file.
- Launch Supabase with Docker: npx supabase start
- Start the Next.js application: pnpm dev
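For reference, the resulting .env usually ends up looking roughly like the sketch below. The variable names are assumptions based on the upstream template, and the Supabase values are printed by npx supabase start.

```
# Sketch of a local .env; variable names are assumptions from the upstream template.
OPENAI_KEY=...                     # your OpenAI API key
NEXT_PUBLIC_SUPABASE_URL=http://localhost:54321
NEXT_PUBLIC_SUPABASE_ANON_KEY=...  # printed by `npx supabase start`
SUPABASE_SERVICE_ROLE_KEY=...      # printed by `npx supabase start`
```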
With this setup, developers can run Law-CN-AI locally and explore its capabilities as an AI-powered legal assistant.