llama-cpp-wasm
Overview
llama-cpp-wasm brings the capabilities of the llama.cpp library to the web through WebAssembly (Wasm). The project is supported by Tangled Group, Inc, and it allows developers to use llama.cpp directly in web environments, leveraging the portability and performance of Wasm. This is particularly exciting for anyone interested in deploying language models in browser applications.
Online Demos
For those eager to see llama-cpp-wasm in action, online demos are available. They provide a hands-on look at how llama-cpp-wasm operates within a browser.
Building the Project
To get started with llama-cpp-wasm, users need to build the project. The process begins with cloning the repository using Git:
git clone https://github.com/tangledgroup/llama-cpp-wasm.git
cd llama-cpp-wasm
Once inside the project directory, there are scripts available to build two versions of the library: single-threaded and multi-threaded. Run
./build-single-thread.sh
for the single-threaded build, or
./build-multi-thread.sh
for the multi-threaded build.
After running these scripts, the built versions of llama.cpp will be located in the dist/llama-st (single-threaded) and dist/llama-mt (multi-threaded) directories.
Deployment
Deployment of the llama-cpp-wasm library is straightforward. Once built, the contents of either dist/llama-st or dist/llama-mt can be integrated into your project as a standard JavaScript library or module.
Here’s a simple example of how this might look in a basic HTML page:
index.html
<!DOCTYPE html>
<html lang="en">
<body>
    <label for="prompt">Prompt:</label>
    <br/>
    <textarea id="prompt" name="prompt" rows="25" cols="80">...</textarea>
    <br/>
    <label for="result">Result:</label>
    <br/>
    <textarea id="result" name="result" rows="25" cols="80"></textarea>
    <br/>
    <script type="module" src="example.js"></script>
</body>
</html>
example.js
import { LlamaCpp } from "./llama-mt/llama.js";

// Called once the model has been downloaded and initialized.
const onModelLoaded = () => {
    console.debug('model: loaded');
    const prompt = document.querySelector("#prompt").value;
    document.querySelector("#result").value = prompt;

    app.run({
        prompt: prompt,
        ctx_size: 4096,
        temp: 0.1,
        no_display_prompt: true,
    });
};

// Called repeatedly with each chunk of generated text.
const onMessageChunk = (text) => {
    console.log(text);
    document.querySelector('#result').value += text;
};

// Called when generation is finished.
const onComplete = () => {
    console.debug('model: completed');
};

const models = [
    // List of model URLs (elided here)
];

const model = models[2]; // Example model selection (third entry of the list)

const app = new LlamaCpp(
    model,
    onModelLoaded,
    onMessageChunk,
    onComplete,
);
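To make the control flow clearer, here is a stripped-down sketch of the same callback-based usage without any DOM wiring; it simply accumulates the generated text in a string. The model URL below is a placeholder, not a real model file, and the prompt is arbitrary:

import { LlamaCpp } from "./llama-mt/llama.js";

// Placeholder URL: substitute the URL of a real GGUF model file.
const modelUrl = "https://example.com/model.gguf";

let output = "";

const app = new LlamaCpp(
    modelUrl,
    // onModelLoaded: the model is ready, so start a generation run.
    () => app.run({
        prompt: "Tell me a joke.",
        ctx_size: 4096,
        temp: 0.1,
        no_display_prompt: true,
    }),
    // onMessageChunk: receives each piece of generated text as it arrives.
    (text) => { output += text; },
    // onComplete: generation has finished, print the accumulated result.
    () => console.log(output),
);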
Running the Examples
Before running any example, it's necessary to create a self-signed certificate to enable HTTPS on your local server:
openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout key.pem -out cert.pem
For running a single-threaded example, use:
npx http-server -S -C cert.pem
To run a multi-threaded example, you first need to prepare the environment. The multi-threaded build relies on SharedArrayBuffer, which browsers only enable on cross-origin isolated pages, so the example needs to be served with the appropriate cross-origin isolation headers; this is what the Express server is for (a minimal sketch of such a server follows the steps below):
- Copy docs/server.js to your working directory.
- Install Express.js:
npm i express
- Run the server:
node server.js
You can then view the application in your browser by visiting https://127.0.0.1:8080/.
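As a reference, here is a minimal sketch of what such an HTTPS Express server could look like. This is an illustrative approximation rather than the repository's actual docs/server.js: it serves the current directory over HTTPS using the key.pem and cert.pem generated above, and adds the COOP/COEP headers that make SharedArrayBuffer available to the multi-threaded build.

const fs = require("fs");
const https = require("https");
const express = require("express");

const app = express();

// Cross-origin isolation headers: without these, SharedArrayBuffer is
// unavailable and the multi-threaded Wasm build cannot start its workers.
app.use((req, res, next) => {
    res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
    res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
    next();
});

// Serve the working directory (index.html, example.js, the dist/llama-mt files, ...).
app.use(express.static("."));

// HTTPS server using the self-signed certificate created earlier.
https.createServer(
    {
        key: fs.readFileSync("key.pem"),
        cert: fs.readFileSync("cert.pem"),
    },
    app,
).listen(8080, () => {
    console.log("Listening on https://127.0.0.1:8080/");
});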
This setup provides an intuitive way to explore the capabilities of llama-cpp-wasm, in both its single-threaded and multi-threaded forms, and to experiment with building and deploying models directly in the browser with Wasm.