Swiss Army Llama
Introduction
Swiss Army Llama is a versatile tool designed to simplify and enhance working with local large language models (LLMs). Built on FastAPI, it exposes convenient REST endpoints for tasks such as obtaining text embeddings and completions from different LLMs. The project focuses on automating the process of obtaining embeddings for common document types, including PDFs, Word files, and more. It even accepts audio files, automatically transcribing them with the Whisper model before computing embeddings from the transcribed text.
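As an illustration, a client can call one of these REST endpoints with plain HTTP. The endpoint path and payload fields below are assumptions, so check the Swagger UI of a running instance for the exact schema:

```python
import json
import urllib.request

def build_embedding_request(text: str, base_url: str = "http://localhost:8089") -> urllib.request.Request:
    """Build a POST request for a hypothetical embedding endpoint.

    The path and the JSON payload shape are illustrative assumptions.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/get_embedding_vector_for_string/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def get_embedding(text: str, base_url: str = "http://localhost:8089") -> dict:
    """Send the request and decode the JSON body returned by the service."""
    with urllib.request.urlopen(build_embedding_request(text, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the service speaks ordinary JSON over HTTP, any language with an HTTP client can integrate with it the same way.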
Caching these embeddings in SQLite avoids redundant calculations and improves performance. Optional RAM Disk support accelerates the loading of multiple LLMs, with the RAM storage managed automatically. Setup is straightforward, presenting users with a comprehensive suite of LLM-related tools accessible through an intuitive Swagger UI, so integration into applications is seamless with minimal configuration.
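A minimal sketch of this cache-or-compute pattern follows; the table layout and column names are illustrative assumptions, not the service's actual schema:

```python
import hashlib
import json
import sqlite3

def cached_embedding(db: sqlite3.Connection, text: str, model: str, compute) -> list:
    """Return the cached embedding for (text, model), computing and storing it on a miss.

    `compute` stands in for the actual model call; the schema here is an assumption.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "  text_hash TEXT, model TEXT, vector TEXT,"
        "  PRIMARY KEY (text_hash, model))"
    )
    key = hashlib.sha3_256(text.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT vector FROM embeddings WHERE text_hash = ? AND model = ?", (key, model)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])   # cache hit: skip recomputation
    vector = compute(text)          # cache miss: run the model once
    db.execute(
        "INSERT INTO embeddings (text_hash, model, vector) VALUES (?, ?, ?)",
        (key, model, json.dumps(vector)),
    )
    db.commit()
    return vector
```

Keying on a hash of the text plus the model name ensures the same text embedded under different models is cached separately.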
Additional features include computing semantic similarity between text strings using a high-performance Rust-based library. The service provides multiple similarity measures and supports semantic search across cached embeddings using FAISS vector searching. A fast cosine-similarity search can be combined with the more advanced similarity measures to refine results.
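The ranking step behind such a search can be sketched as follows. The real service delegates nearest-neighbour lookup to a FAISS index over the cached embeddings; this linear scan just illustrates the idea:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, corpus, top_k=3):
    """Brute-force semantic search: score every (text, vector) pair against the
    query and return the top_k best matches, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A FAISS index replaces the linear scan with an approximate-nearest-neighbour structure, which is what keeps search fast as the cache grows.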
Multiple embedding pooling methods are available to combine token-level embedding vectors into a single, fixed-length vector adaptable to any text length. These methods include options like mean pooling, SVD, ICA, and more.
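Mean pooling, the simplest of these methods, can be sketched in a few lines: it averages the token-level vectors element-wise, so texts of any length yield a vector of the model's fixed embedding dimension:

```python
def mean_pool(token_embeddings):
    """Mean pooling: average a list of token-level vectors element-wise into a
    single fixed-length vector, regardless of how many tokens the text produced."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n_tokens for i in range(dim)]
```

The SVD- and ICA-based methods follow the same contract (many token vectors in, one fixed-length vector out) but use matrix decompositions rather than a plain average.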
Features
- Text Embedding Computation: Generates embeddings efficiently with llama_cpp using various pre-trained models.
- Embedding Caching: Caches computed embeddings in SQLite for efficient retrieval, minimizing unnecessary recalculations.
- Advanced Similarity Measures: Computes multiple similarity measures through a high-performance Rust-based library for more accurate results.
- Document File Processing: Accepts a wide range of file types, including text documents and images, automatically handling OCR for scanned texts.
- Audio Transcription and Embedding: Transcribes audio files with the Whisper model and computes sentence embeddings from the transcript, returning detailed embedding data.
- RAM Disk Usage: Optional RAM Disk support accelerates access to models by managing RAM storage automatically.
- Scalable and Concurrent Design: Built with FastAPI to support concurrent requests and parallel inference for efficient resource usage.
- Flexible Response Configurations: Allows configuration of response formats and other parameters to suit individual requirements.
- Real-Time Log Monitoring: A browser-based log viewer lets users monitor application logs in real time without direct server access.
- Multiple Language Model Support: Supports various models and similarity measures to tailor the service to specific user needs.
Getting Started
To get started with Swiss Army Llama on a fresh Ubuntu 22+ machine, users can clone the repository and follow the provided setup commands to install necessary dependencies and the application itself, either using Docker or a native Python virtual environment setup.
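A native setup might look like the following sketch; the repository URL, script name, and exact commands are assumptions, so defer to the project's own setup instructions:

```shell
# Install prerequisites on a fresh Ubuntu 22+ machine (illustrative).
sudo apt-get update && sudo apt-get install -y python3-venv git

# Clone the repository (URL assumed) and enter it.
git clone https://github.com/Dicklesworthstone/swiss_army_llama.git
cd swiss_army_llama

# Create and activate a Python virtual environment, then install dependencies.
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

# Launch the FastAPI service (entry-point name assumed).
python swiss_army_llama.py
```

The Docker route replaces the virtual-environment steps with building and running the provided image.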
Configuration
The application offers various configuration options that can be adjusted through an environment file to customize service behavior, such as parallel processing settings, embedding models, and network configuration.
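An illustrative environment-file fragment follows; the variable names are assumptions patterned on the options described above, so see the project's sample configuration for the real keys:

```ini
# Hypothetical .env fragment - key names are illustrative assumptions.
USE_RAMDISK=True
RAMDISK_SIZE_IN_GB=10
MAX_CONCURRENT_PARALLEL_INFERENCE_TASKS=10
DEFAULT_MODEL_NAME=llama2_7b_chat_uncensored
SWISS_ARMY_LLAMA_SERVER_LISTEN_PORT=8089
```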
Contribution
Swiss Army Llama is open to community contributions, welcoming pull requests to help expand its capabilities and establish it as a standard library for LLM tasks.
License
The project is licensed under the MIT License, encouraging open collaboration and sharing within the community.
In summary, Swiss Army Llama aims to be a multi-functional toolset offering comprehensive solutions for working efficiently with LLMs, enabling integration across a wide array of applications with minimal setup and configuration effort.