Introduction to LLaVA C++ Server
The LLaVA C++ Server is a straightforward API server designed to work with the LLaVA implementation from the llama.cpp project. It was developed by Bart Trzynadlowski in 2023, and its main goal is to provide a streamlined interface for interacting with LLaVA models.
Getting Started
To use the LLaVA C++ Server, one must first obtain specific model files: ggml-model-*.gguf and mmproj-model-f16.gguf, both of which can be downloaded from a designated repository on Hugging Face. Once the files are available, launching the server is simple:
Run the following command:
bin/llava-server -m ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf
This command initializes the server, making it available at localhost:8080. Users can customize the server's address with the --host and --port options, and HTTP logging can be enabled with the --log-http flag. After the server is up and running, interactions can occur through a web browser at the specified local address.
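For example, to bind the server to a different address and port and turn on HTTP request logging, a launch command might look like the following (the address and port values here are purely illustrative):

# Bind to all interfaces on port 9090 and log HTTP traffic
bin/llava-server -m ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf --host 0.0.0.0 --port 9090 --log-http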
Interacting with the API
The core endpoint for LLaVA functionality resides at /llava. When making a request to this endpoint, the following parameters are supplied in the request body:
- user_prompt (string): A required field specifying the query or prompt, such as "what is this?".
- image_file (file): A required field containing the image data in binary form.
- system_prompt (string): An optional field for a system-level prompt.
These parameters allow the server to process the request and generate a response based on the supplied prompt and image.
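As a rough sketch, a request to the endpoint might look like the following, assuming the fields are sent as a multipart form (the encoding and the image filename are assumptions for illustration, not something stated by the project):

# POST a prompt and an image to the /llava endpoint
curl http://localhost:8080/llava \
  -F "user_prompt=what is this?" \
  -F "image_file=@photo.jpg"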
Building the Server
The project depends on two other repositories, llama.cpp and cpp-httplib, which are included as git submodules. To set up the build environment:
First, ensure that the submodules are initialized and updated:
git submodule init
git submodule update
Then, to build the server, execute:
make
While testing has primarily been conducted on macOS, the server is expected to build and run on any platform where llama.cpp can successfully build.
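Equivalently, the two submodule commands can be combined into one, so a minimal build sequence looks like this:

# Initialize and fetch llama.cpp and cpp-httplib in a single step, then build
git submodule update --init
make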
By following these steps, users can effectively deploy and utilize the LLaVA C++ Server for various applications requiring LLaVA model interactions.