Introduction to LLaVA C++ Server
The LLaVA C++ Server is a straightforward API server designed to work with the LLaVA implementation from the llama.cpp project. It was developed by Bart Trzynadlowski in 2023, and its main goal is to provide a streamlined interface for interacting with LLaVA models.
Getting Started
To use the LLaVA C++ Server, one must first obtain specific model files: ggml-model-*.gguf and mmproj-model-f16.gguf, both of which can be downloaded from a designated repository on Hugging Face. Once the files are available, launching the server is simple:
Run the following command:
bin/llava-server -m ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf
This command initializes the server, making it available at localhost:8080. Users can customize the server's address with the --host and --port options, and HTTP logging can be enabled with the --log-http flag. After the server is up and running, interactions can occur through a web browser at the specified local address.
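For example, to bind the server to a different address and port and turn on HTTP request logging, a launch command might look like the following (the address and port values here are purely illustrative):

# Bind to all interfaces on port 9090 and log HTTP traffic
bin/llava-server -m ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf --host 0.0.0.0 --port 9090 --log-http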
Interacting with the API
The core endpoint for LLaVA functionality resides at /llava. When making a request to this endpoint, the following parameters are supplied in the request body:
- user_prompt (string): A required field specifying the query or prompt, such as "what is this?".
- image_file (file): A required field containing the image data in binary form.
- system_prompt (string): An optional field for a system-level prompt.
These parameters allow the server to process the request and generate a response based on the supplied prompt and image.
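As a rough sketch, a request to the endpoint might look like the following, assuming the fields are sent as a multipart form (the encoding and the image filename are assumptions for illustration, not something stated by the project):

# POST a prompt and an image to the /llava endpoint
curl http://localhost:8080/llava \
  -F "user_prompt=what is this?" \
  -F "image_file=@photo.jpg"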
Building the Server
The project depends on two other repositories, llama.cpp and cpp-httplib, which are included as git submodules. To set up the build environment:
First, ensure that the submodules are initialized and updated:
git submodule init
git submodule update
Then, to build the server, execute:
make
While testing has primarily been conducted on macOS, the server is expected to build and run on any platform where llama.cpp can successfully build.
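Equivalently, the two submodule commands can be combined into one, so a minimal build sequence looks like this:

# Initialize and fetch llama.cpp and cpp-httplib in a single step, then build
git submodule update --init
make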
By following these steps, users can effectively deploy and utilize the LLaVA C++ Server for various applications requiring LLaVA model interactions.