Introduction to Llama2-WebUI
The llama2-webui project is a versatile tool for running Llama 2 models through a Gradio web interface on Linux, Windows, and macOS. It supports the full range of Llama 2 model sizes (7B, 13B, and 70B parameters) as well as quantized variants in GPTQ, GGML, and GGUF formats, and can load models in 8-bit or 4-bit mode to accommodate varied computational resources.
Key Features
Supported Models
The project provides support for several versions of Llama 2, including:
- Llama-2-7b, Llama-2-13b, and Llama-2-70b, suitable for various tasks depending on the model size and application requirements.
- Quantized variants such as Llama-2-GPTQ, suited to memory-constrained GPUs, and Llama-2-GGML, optimized for CPU inference.
- CodeLlama and its variations for specialized tasks like code completion and code generation.
Backend Support
Llama2-webui utilizes different backends to ensure optimal performance:
- Transformers for standard GPU inference, with optional 8-bit and 4-bit model loading.
- Bitsandbytes for efficient, memory-conserving inference through 8-bit operations.
- AutoGPTQ offering an effective 4-bit inference solution.
- llama.cpp, a backend designed for environments lacking GPU support, optimizing CPU usage.
Demonstrations and Usability
The project includes examples of running Llama 2 models on minimal hardware, such as a MacBook Air or a free Colab T4 GPU. It also exposes llama2-wrapper as a local Llama 2 backend for generative agents and applications, which matters for developers integrating Llama 2 models into diverse computational environments.
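A minimal sketch of that local-backend usage, based on the llama2-wrapper interface (the model path is a placeholder for a model you have downloaded, and the parameter values are illustrative):

from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

# Load a quantized chat model through the llama.cpp backend.
llama2_wrapper = LLAMA2_WRAPPER(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    backend_type="llama.cpp",
)

# get_prompt wraps the user message in the Llama 2 chat template.
prompt = get_prompt("Hi, do you know PyTorch?")
print(llama2_wrapper(prompt))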
OpenAI Compatible API
For developers familiar with the OpenAI API, Llama2-webui presents a compatible API layer, simplifying integration with existing applications and increasing the versatility of the tool.
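As an illustration, assuming the compatible server has been started locally (for example via python -m llama2_wrapper.server) and listens on localhost:8000, a standard OpenAI client can simply be pointed at it; the base URL, port, and model name below are assumptions for a default local setup:

from openai import OpenAI

# Point the stock OpenAI client at the local llama2-wrapper server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local server ignores the key

response = client.chat.completions.create(
    model="llama-2-7b-chat",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what Llama 2 is."}],
)
print(response.choices[0].message.content)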
Installation Guide
Installation is straightforward, accessible through PyPI or directly from the source code on GitHub. This ensures ease of setup and flexibility for users with different installation preferences or technical requirements.
PyPI Method
Simply run the following command:
pip install llama2-wrapper
From Source
Clone the repository and install dependencies:
git clone https://github.com/liltom-eth/llama2-webui.git
cd llama2-webui
pip install -r requirements.txt
Troubleshooting
For users with older NVIDIA GPUs, downgrading bitsandbytes might be necessary. Likewise, Windows users may need a special bitsandbytes installation.
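For example, a version pin along these lines has been reported to help on older NVIDIA cards (the exact version depends on your GPU and driver, so treat this as illustrative):

pip install bitsandbytes==0.38.1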
Usage
Running the Chat UI
To start using the chat interface, execute:
python app.py
This will initialize the default environment configuration, using llama.cpp as the backend to run models such as llama-2-7b-chat.ggmlv3.q4_0.bin.
Starting the Code Llama UI
For tasks oriented toward code assistance, the Code Llama UI offers an interactive environment suited to developers, with models dedicated to code completion as well as instruction-tuned variants that follow natural-language prompts.
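A hypothetical invocation, assuming a CodeLlama GGML model has already been downloaded into ./models (the script and flag names follow the repository's conventions and should be verified against the repo):

python code_completion.py --model_path ./models/codellama-7b.ggmlv3.Q4_0.bin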
Advanced Configurations
The app allows model and backend configuration through an .env file, offering users the flexibility to adjust settings to their specific requirements and hardware capabilities.
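A sketch of such an .env file, assuming the llama.cpp backend and a locally downloaded GGML model (variable names follow the project's example configuration; consult the bundled .env example for the authoritative list):

MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
BACKEND_TYPE = "llama.cpp"
LOAD_IN_8BIT = False
MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024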
Benchmarking and Performance
A benchmark script, benchmark.py, is available to measure model performance on different hardware setups. The script provides insights into speed and memory use, aiding users in optimizing model performance based on their system specifications.
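A typical run is simply:

python benchmark.py

Assuming the script reads the same .env configuration as app.py, it benchmarks whichever model and backend are configured there; any command-line overrides should be checked against the script itself.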
Model Download
The Llama 2 models are available for download through several methods, including direct links from Hugging Face or repository cloning with Git Large File Storage (LFS). For the official Meta models, users must first request access from Meta AI.
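For example, a chat model can be cloned from Hugging Face with Git LFS (the repository shown is illustrative, and gated models require an approved access request first):

git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./models/Llama-2-7b-chat-hf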
Tips for Optimized Performance
Running high-parameter models like Llama 2 efficiently requires significant computational resources, particularly GPU memory. The project provides guidelines on using multiple GPUs, memory-efficient 8-bit and 4-bit modes, and backend optimizations to improve throughput.
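As one hedged example, GPU visibility can be restricted with the standard CUDA environment variable (generic CUDA behavior, not project-specific), while memory-efficient modes are enabled through the .env settings shown earlier:

CUDA_VISIBLE_DEVICES=0,1 python app.py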
In summary, Llama2-webui is a comprehensive tool bridging advanced AI models with user-friendly interfaces and versatile computational support, making sophisticated model deployment accessible from various platforms. This project empowers developers and researchers to harness the capabilities of Llama 2 models in diverse environments tailored to their specific needs.