LlamaGPT: An Overview
Introduction
LlamaGPT is a self-hosted, offline chatbot that offers users a private ChatGPT-like experience powered by Llama 2. It's designed to ensure that no data leaves your device, making it a completely private solution. An exciting new feature is the support for Code Llama models and Nvidia GPUs, expanding the tool's capabilities and performance.
Supported Models
LlamaGPT supports several models, tailored for different needs:
- Nous Hermes Llama 2 7B Chat: This model has a size of 7 billion parameters, requiring 3.79GB for download and 6.29GB of memory.
- Nous Hermes Llama 2 13B Chat: With a model size of 13 billion parameters, it necessitates a 7.32GB download and 9.82GB of memory.
- Nous Hermes Llama 2 70B Chat: The largest with 70 billion parameters, it takes 38.87GB to download and needs 41.37GB of memory.
- Code Llama 7B Chat: With 7 billion parameters, it requires a 4.24GB download and 6.74GB of memory.
- Code Llama 13B Chat: Featuring 13 billion parameters, it's downloadable in 8.06GB and needs 10.56GB of memory.
- Phind Code Llama 34B Chat: With 34 billion parameters, it requires a 20.22GB download and 22.72GB of memory.
Support for even more custom models is planned for future updates.
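One pattern worth noting in the figures above: every model's memory requirement is its download size plus roughly 2.5GB of runtime overhead, a useful rule of thumb when sizing hardware for future custom models. A quick check, with the figures copied from the list above:

```shell
# Memory required minus download size, for each model listed above.
OVERHEADS=""
for pair in "3.79 6.29" "7.32 9.82" "38.87 41.37" "4.24 6.74" "8.06 10.56" "20.22 22.72"; do
  set -- $pair
  OVERHEADS="$OVERHEADS $(awk -v d="$1" -v m="$2" 'BEGIN { printf "%.2f", m - d }')"
done
echo "$OVERHEADS"   # every entry comes out to 2.50 (GB)
```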
Installation Options
LlamaGPT can be installed in several environments, making it versatile and adaptable to various system configurations:
On an umbrelOS Home Server
Installing LlamaGPT on an umbrelOS home server is straightforward. It involves a single click through the Umbrel App Store, making it accessible even to those with minimal technical expertise.
On M1/M2 Mac
For Mac users, particularly those using M1 or M2 models, installation involves:
- Ensuring Docker and Xcode are installed.
- Cloning the LlamaGPT repository and navigating to it.
- Running `./run-mac.sh --model 7b` to start with the default model, accessible via http://localhost:3000.

Options exist to switch models based on your requirements by altering the command slightly (e.g., changing `7b` to `13b` or `70b`).
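The model-switching idea can be captured in a small launcher sketch; this is illustrative only, with the list of valid sizes taken from the options above:

```shell
# Build the run-mac.sh invocation for a chosen model size (7b, 13b, or 70b).
MODEL="${MODEL:-7b}"
case "$MODEL" in
  7b|13b|70b) CMD="./run-mac.sh --model $MODEL" ;;
  *) echo "unsupported model size: $MODEL" >&2; exit 1 ;;
esac
echo "$CMD"   # e.g. ./run-mac.sh --model 7b
```

Once the script is running, the chat UI is reachable at http://localhost:3000.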
Using Docker
If you're operating on any x86 or arm64 system, you can install LlamaGPT using Docker:
- Clone the repository and navigate to its directory.
- Use `./run.sh --model 7b` to initiate the process.
- For systems equipped with Nvidia GPUs, the `--with-cuda` flag can be added for enhanced performance.
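A sketch of how the CUDA flag might be appended conditionally; detecting the GPU via `nvidia-smi` is an assumption here, not something `./run.sh` requires:

```shell
# Compose the run.sh invocation, adding --with-cuda only when an Nvidia GPU
# appears to be present (nvidia-smi on PATH is used as a simple heuristic).
MODEL="7b"
CMD="./run.sh --model $MODEL"
if command -v nvidia-smi >/dev/null 2>&1; then
  CMD="$CMD --with-cuda"
fi
echo "$CMD"
```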
Kubernetes Deployment
For those with Kubernetes clusters, LlamaGPT deployment is streamlined:
- Establish a namespace.
- Apply the Kubernetes manifests available in the `/deploy/kubernetes` directory.
- Expose the service through standard practices.
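Those steps might look like the following; the namespace name is an illustrative choice, while the manifest path comes from the step above:

```shell
# Deployment sketch: create a namespace, then apply the bundled manifests to it.
NAMESPACE="llama-gpt"
CREATE_NS="kubectl create namespace $NAMESPACE"
APPLY="kubectl apply -n $NAMESPACE -f deploy/kubernetes"
printf '%s\n%s\n' "$CREATE_NS" "$APPLY"
```

Exposure can then follow whatever convention the cluster already uses, such as an Ingress, a LoadBalancer service, or `kubectl port-forward` for quick testing.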
OpenAI-Compatible API
LlamaGPT comes with an OpenAI-compatible API accessible at http://localhost:3001. Documentation is available at this endpoint, providing users with a seamless integration experience.
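Because the API mirrors OpenAI's, a standard chat-completions request should work against it. A minimal sketch; the `/v1/chat/completions` path follows the OpenAI convention and is an assumption here, so confirm it against the documentation served at the endpoint:

```shell
# Build a chat request against the local OpenAI-compatible API.
URL="http://localhost:3001/v1/chat/completions"
BODY='{"messages": [{"role": "user", "content": "Hello!"}]}'
# With a LlamaGPT instance running, send it with:
#   curl -s "$URL" -H "Content-Type: application/json" -d "$BODY"
echo "$URL"
```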
Performance Benchmarks
Performance testing of LlamaGPT models reveals various generation speeds across different hardware configurations. An M1 Max MacBook Pro, for instance, reaches speeds of 54 tokens per second, while other setups like Umbrel Home clock in at 2.7 tokens per second.
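Those throughput figures translate directly into response latency. For example, the time to generate a 500-token reply at each measured speed:

```shell
# Approximate seconds to generate 500 tokens at the measured rates above.
T_M1=$(awk 'BEGIN { printf "%.0f", 500 / 54 }')
T_UMBREL=$(awk 'BEGIN { printf "%.0f", 500 / 2.7 }')
echo "M1 Max: ${T_M1}s, Umbrel Home: ${T_UMBREL}s"   # about 9s vs 185s
```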
Conclusion
LlamaGPT offers a powerful and private AI chatbot experience. With its range of supported models, varied installation options, and ongoing developments, it stands out as a flexible solution for users interested in maintaining privacy and control over data while leveraging advanced AI technologies. Whether on a personal device or within a larger network infrastructure, LlamaGPT provides an accessible platform for AI-driven interactions.