Introduction to LARS: The LLM & Advanced Referencing Solution
LARS stands for the LLM & Advanced Referencing Solution. This open-source application lets users run large language models (LLMs) locally on their own devices. Its defining feature is the ability to upload personal documents and hold conversations in which the LLM grounds its answers in that uploaded content, improving response accuracy and reducing AI-generated errors, or "hallucinations". This technique is known as Retrieval Augmented Generation (RAG).
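At its core, RAG retrieves relevant passages first and then asks the model to answer from them. The toy sketch below uses keyword overlap in place of the vector search a real system like LARS performs; every name in it is illustrative only:

```python
# Toy RAG pipeline: retrieve relevant chunks, then build a grounded prompt.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank document chunks by word overlap with the query (toy scorer)."""
    q_words = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the answer in retrieved text rather than the model's weights."""
    sources = "\n".join(f"- {c}" for c in context)
    return ("Answer using only the sources below.\n"
            f"Sources:\n{sources}\n\n"
            f"Question: {query}\nAnswer:")

chunks = [
    "LARS cites document names and page numbers with every answer.",
    "llama.cpp runs GGUF models locally on CPU or GPU.",
]
query = "How does LARS cite its answers?"
print(build_prompt(query, retrieve(query, chunks)))
```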
Key Features of LARS
Advanced Citation Capabilities
One of the standout features of LARS is the detailed citations it attaches to every LLM response, including document names, page numbers, text highlights, and related images. A document reader built into the response window lets users scroll through the cited documents and download highlighted excerpts as PDFs.
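A citation like this is naturally modeled as a small record attached to each response. The shape below is a hypothetical illustration; the field names are assumptions, not LARS's actual schema:

```python
# Hypothetical citation record; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Citation:
    document: str          # source file name
    page: int              # page number in the source document
    highlight: str         # the exact text span the answer relies on
    images: list[str] = field(default_factory=list)  # related image refs

cite = Citation(document="annual_report.pdf", page=42,
                highlight="Revenue grew 12% year over year.")
```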
Supports Multiple File Formats
LARS supports a wide range of file formats, making it highly versatile. Users can upload any of the following (a sketch of how multi-format ingestion can be dispatched appears after the list):
- PDFs
- Word and other text documents: DOC, DOCX, ODT, RTF, and TXT
- Spreadsheets: XLS, XLSX, ODS, and CSV
- Presentations: PPT, PPTX, and ODP
- Images: BMP, GIF, JPG, PNG, SVG, and TIFF
- HTML files
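Supporting this many formats typically comes down to routing each file to a format-specific loader by extension. Here is a minimal sketch of that dispatch; the loader names are hypothetical placeholders, not LARS's actual internals:

```python
# Sketch of extension-based loader dispatch; loader names are hypothetical.
from pathlib import Path

LOADERS = {
    ".pdf":  "pdf_loader",
    ".doc":  "text_loader", ".docx": "text_loader", ".odt": "text_loader",
    ".rtf":  "text_loader", ".txt":  "text_loader",
    ".xls":  "sheet_loader", ".xlsx": "sheet_loader",
    ".ods":  "sheet_loader", ".csv":  "sheet_loader",
    ".ppt":  "slide_loader", ".pptx": "slide_loader", ".odp": "slide_loader",
    ".bmp":  "image_loader", ".gif":  "image_loader", ".jpg": "image_loader",
    ".png":  "image_loader", ".svg":  "image_loader", ".tiff": "image_loader",
    ".html": "html_loader",
}

def pick_loader(path: str) -> str:
    """Return the loader responsible for this file's format."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported format: {ext}")
    return LOADERS[ext]

print(pick_loader("report.pdf"))  # pdf_loader
```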
Enhanced User Interactions
LARS maintains conversational memory, so users can ask follow-up questions, and it saves full chat histories so past conversations can be resumed at any time. RAG can be enabled or disabled, and system prompts modified, from the settings menu.
Easy Integration and Adaptation
LARS lets users add new LLMs simply by dragging and dropping them into the system, with built-in prompt templates for popular formats such as Llama3, Llama2, and ChatML. This flexibility ensures users can tailor the experience to their preferred models.
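For reference, these are the commonly published template strings for the formats named above; the exact templates bundled with LARS may differ slightly:

```python
# Widely published chat prompt templates for common model families.
TEMPLATES = {
    "llama3": ("<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
               "{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
               "{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"),
    "llama2": "<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]",
    "chatml": ("<|im_start|>system\n{system}<|im_end|>\n"
               "<|im_start|>user\n{user}<|im_end|>\n"
               "<|im_start|>assistant\n"),
}

prompt = TEMPLATES["chatml"].format(system="You are helpful.", user="Hi!")
print(prompt)
```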
Optimal Performance with Llama.cpp Backend
LARS runs on a pure llama.cpp backend and requires no additional frameworks, Python bindings, or abstraction layers. Users can upgrade to newer llama.cpp builds independently of LARS to pick up new functionality. For users with NVIDIA GPUs, CUDA-accelerated inference delivers faster processing times.
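llama.cpp ships an HTTP server (`llama-server`) that a local client can query with nothing beyond the standard library. The sketch below assumes a server listening on its default port 8080; how LARS actually wires up its backend is not shown here:

```python
# Minimal client for a locally running llama.cpp server (default port 8080).
import json
import urllib.request

payload = {
    "prompt": "What is Retrieval Augmented Generation?",
    "n_predict": 128,      # max tokens to generate
    "temperature": 0.7,    # sampling settings like those in LARS's UI
    "top_k": 40,
    "top_p": 0.95,
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```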
Advanced LLM Settings and Embedding Models
LARS exposes fine-grained settings for LLM temperature, top-k, top-p, and other sampling parameters, giving users substantial control over generation. It also supports four embedding models, including sentence-transformers and OpenAI models, so users can choose how their text is represented as vectors.
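As one illustration of the embedding side, here is how a sentence-transformers model turns text into vectors; `all-mpnet-base-v2` is a common choice and assumed here for illustration, not confirmed as LARS's default:

```python
# Embedding text with a sentence-transformers model (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
vectors = model.encode(["LARS cites its sources.",
                        "Embeddings map text to vectors."])
print(vectors.shape)  # (2, 768) for this model
```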
Comprehensive Document Management
The application features a Sources UI where users can see details of uploaded documents and their vectorization information. A reset button allows users to clear and reset their vectorDB.
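To make "vectorization information" and "reset" concrete, here is a hedged sketch against a Chroma-style vector store; the collection name and storage path are assumptions, not LARS's actual configuration:

```python
# Sketch of inspecting and resetting a Chroma-style vector store.
import chromadb

client = chromadb.PersistentClient(path="./vectordb")       # assumed path
docs = client.get_or_create_collection("documents")         # assumed name
docs.add(ids=["doc1-p1"], documents=["Page 1 text of doc1."])
print(docs.count())                    # inspect what has been vectorized

client.delete_collection("documents")  # the "reset" operation
```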
Versatile Text Extraction Methods
LARS offers three text extraction methods: a purely local option and two Azure-based OCR options, which handle scanned documents with higher accuracy. A custom parser improves table-data extraction and prevents text duplication by accounting for the spatial coordinates of the extracted text.
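The coordinate-based deduplication can be sketched as follows, assuming each extracted span carries a page number and bounding box; the data shape is illustrative, not the parser's real output:

```python
# Deduplicate extracted text spans by page number + rounded bounding box.

def dedupe_spans(spans: list[dict]) -> list[str]:
    """Keep each region's text once, keyed by (page, rounded bbox)."""
    seen: set[tuple] = set()
    kept = []
    for s in spans:
        key = (s["page"], tuple(round(v) for v in s["bbox"]))
        if key not in seen:        # same region reported twice -> skip
            seen.add(key)
            kept.append(s["text"])
    return kept

spans = [
    {"page": 1, "bbox": (72.0, 100.0, 200.0, 115.0), "text": "Q1 Revenue"},
    {"page": 1, "bbox": (72.2, 100.1, 200.1, 114.9), "text": "Q1 Revenue"},  # duplicate
]
print(dedupe_spans(spans))  # ['Q1 Revenue']
```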
Demonstration and Additional Resources
For those interested in seeing these features in action, a demonstration video showcasing LARS's capabilities is available.
In summary, LARS stands out as a dynamic open-source solution for those needing a robust system for running local LLMs with high-level referencing support. Its user-centric design and comprehensive feature set make it a valuable tool for anyone working with extensive document collections and advanced language models.