Introduction to react-llm
react-llm is a set of headless React Hooks for running large language models (LLMs) directly in the browser, accelerated by WebGPU. Its key advantage is simplicity: developers can integrate an LLM through a single hook, `useLLM()`.
Features and Advantages
- Support for Vicuna 7B: runs the Vicuna 7B model in the browser, bringing sophisticated natural language processing to the client.
- Customized Interaction: custom system prompts and role names (such as "user:" and "assistant:") let developers tailor the conversation format; see the sketch after this list.
- Configuration Options: text completion can be configured with options such as a maximum token count and stop sequences.
- Data Privacy: the model runs entirely in the browser, accelerated by WebGPU, so no data leaves the user's device.
- UI Flexibility: the hooks are UI-agnostic, so developers can bring their own interface.
- Persistent Conversation Storage: conversations are persisted in the browser's storage, with hooks for loading and saving them.
- Model Caching: the model is cached locally to reduce load times on subsequent uses.
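For illustration, here is a minimal sketch of configuring role names and completion options from a component. The setter names (`setUserRoleName`, `setAssistantRoleName`) and the `send(text, maxTokens, stopSequences)` parameter shape are assumptions inferred from the feature descriptions above, not confirmed signatures.

```tsx
// Sketch only: setter names and send()'s parameter shape are assumptions
// based on the feature list above, not confirmed API signatures.
import { useEffect } from "react";
import { useLLM } from "@react-llm/headless";

export function AskButton() {
  const { send, setUserRoleName, setAssistantRoleName } = useLLM();

  useEffect(() => {
    // Custom role names used when formatting the prompt.
    setUserRoleName("user");
    setAssistantRoleName("assistant");
  }, [setUserRoleName, setAssistantRoleName]);

  // Completion options: cap the response at 64 tokens, stop at a blank line.
  return <button onClick={() => send("Hello!", 64, ["\n\n"])}>Ask</button>;
}
```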
Installation and Packages
To get started, install the headless package:
npm install @react-llm/headless
The repository includes several packages:
- `@react-llm/model`: the LLM model and tokenizer compiled for browser use.
- `@react-llm/retro-ui`: a retro-themed UI that complements the hooks.
- `@react-llm/extension`: a Chrome Extension leveraging these hooks.
- `@react-llm/headless`: the core headless React Hooks enabling browser-based LLM operations.
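As a sketch of how the packages fit together, an app typically wraps its tree in a provider from `@react-llm/headless` so any descendant component can call `useLLM()`. The `ModelProvider` name below is an assumption; check the package's actual exports.

```tsx
// Wiring sketch: a provider hosts the model and worker state so that any
// child component can call useLLM(). "ModelProvider" is an assumed export.
import { ModelProvider } from "@react-llm/headless";
import { Chat } from "./Chat"; // your own UI component (hypothetical)

export default function App() {
  return (
    <ModelProvider>
      <Chat />
    </ModelProvider>
  );
}
```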
Using the useLLM API
The `useLLM()` API provides the following functionality (a usage sketch follows the list):
- Model Operations: initializing the model (`init`), generating text (`send`), receiving generated messages (`onMessage`), and configuring the response handler (`setOnMessage`).
- State Monitoring: model initialization progress (`loadingStatus`), generation status (`isGenerating`), and GPU device information (`gpuDevice`).
- Customization and Management: configuring role names, managing conversations, and using persistent storage for streamlined conversation handling.
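Putting those pieces together, a minimal chat component might look like the sketch below. Only the names listed above (`init`, `send`, `setOnMessage`, `loadingStatus`, `isGenerating`, `gpuDevice`) come from the API description; the message and status object shapes are assumptions for illustration.

```tsx
// Lifecycle sketch for the API above. Field shapes (the message object,
// loadingStatus contents) are illustrative assumptions.
import { useEffect, useState } from "react";
import { useLLM } from "@react-llm/headless";

export function Chat() {
  const { init, send, setOnMessage, loadingStatus, isGenerating, gpuDevice } =
    useLLM();
  const [reply, setReply] = useState("");

  useEffect(() => {
    init(); // start downloading and initializing the model on the GPU
    // Append streamed text to local state as responses arrive.
    setOnMessage((msg: { text?: string }) =>
      setReply((r) => r + (msg.text ?? ""))
    );
  }, []); // run once on mount

  return (
    <div>
      <p>Load status: {JSON.stringify(loadingStatus)}</p>
      <p>GPU: {gpuDevice ? "detected" : "not detected"}</p>
      <button
        disabled={isGenerating}
        onClick={() => send("Tell me a joke.", 128, [])}
      >
        {isGenerating ? "Generating…" : "Send"}
      </button>
      <pre>{reply}</pre>
    </div>
  );
}
```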
Implementation Example
For a detailed demo of these hooks, see the `retro-ui` package. To install and run the demo:
cd packages/retro-ui
pnpm install
pnpm dev
How It Works
This library encapsulates React Hooks that manage interactions with an LLM in the browser. Under the hood, it relies on:
- SentencePiece Tokenizer: compiled to run in the browser via Emscripten.
- Vicuna 7B: converted into Apache TVM format for in-browser execution.
- Apache TVM and MLC Relax: compiled for the browser via Emscripten.
- WebWorker Integration: a WebWorker, packaged within the library, runs the model off the main thread so the UI stays responsive; see the sketch below.
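The message protocol below is purely illustrative (the library ships its own worker); it just shows the general pattern of keeping inference off the main thread.

```ts
// worker.ts — illustrative only; not the library's actual protocol.
// All heavy inference happens here, so the UI thread never blocks.
self.onmessage = async (e: MessageEvent<{ prompt: string }>) => {
  const text = await runModel(e.data.prompt); // stand-in for TVM inference
  (self as unknown as Worker).postMessage({ text });
};
declare function runModel(prompt: string): Promise<string>; // hypothetical

// main.ts — hand prompts to the worker; render results as they arrive.
const worker = new Worker(new URL("./worker.ts", import.meta.url), {
  type: "module",
});
worker.onmessage = (e: MessageEvent<{ text: string }>) =>
  console.log("completion:", e.data.text);
worker.postMessage({ prompt: "Hello" });
```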
Resources such as the model, tokenizer, and TVM runtime are fetched from a CDN (Hugging Face) and cached locally to speed up subsequent loads.
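As a generic illustration of this kind of local caching (not the library's actual implementation), the standard browser Cache API can serve an artifact from disk after the first download; the URL below is a placeholder.

```ts
// Generic Cache API sketch: download once, then serve locally.
async function fetchWithCache(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open("model-cache-v1");
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer(); // cache hit: no network round-trip

  const res = await fetch(url);
  await cache.put(url, res.clone()); // store a copy for future loads
  return res.arrayBuffer();
}

// e.g. fetchWithCache("https://huggingface.co/<model>/resolve/main/weights.bin")
```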
Licensing
The library is made available under the MIT license, with specific components, such as `packages/headless/worker/lib/tvm`, licensed under Apache 2.0.
react-llm gives developers a way to bring advanced LLM capabilities directly into the browser, combining flexibility, efficiency, and privacy.