Introducing the LLM_Web_Search Project
The LLM_Web_Search project is a fascinating development that gives local large language models (LLMs) the ability to search the web, grounding model outputs in current, relevant information. At its core, the project allows a local LLM to trigger an internet search by emitting a specific command in its output. When a regular expression detects such a command, a web search is performed using the DuckDuckGo Search library, which retrieves a set of search results. The project then filters these results using a dense embedding model combined with a keyword retrieval method, either Okapi BM25 or SPLADE, and appends the most relevant extracts to the LLM's output.
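The detect-search-append loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_search` is a hypothetical stand-in for the DuckDuckGo backend, while the regex matches the extension's default search command.

```python
import re

# Default command pattern used by the extension: Search_web("query")
SEARCH_COMMAND = re.compile(r'Search_web\("(.*)"\)')

def run_search(query):
    # Hypothetical stand-in for the real DuckDuckGo Search backend.
    return [f"Result snippet for: {query}"]

def augment_output(llm_output):
    """Detect a search command in the model's output and append results."""
    match = SEARCH_COMMAND.search(llm_output)
    if match is None:
        return llm_output  # no search was requested
    query = match.group(1)  # the capture group holds the query text
    results = run_search(query)
    return llm_output + "\n" + "\n".join(results)
```

In the real extension, the appended results are first filtered for relevance before being handed back to the model.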
Installation Process
For developers eager to integrate this extension, the installation is straightforward:
- Access the "Session" tab on the web UI and use the "Install or update an extension" function to download the latest version of the code.
- Install the dependencies in one of two ways:
  - The Easy Way: Run the `update_wizard` script from within the text-generation-webui folder and choose "Install/update extensions requirements." This uses `pip` for installation, relying on the `faiss-cpu` package, though compatibility is not universally guaranteed.
  - The Safe Way: Manually update your conda environment with the necessary dependencies from the oobabooga text-generation-webui project.
- Start the web UI and activate the extension from the "Session" tab, or pass it on the command line: `python server.py --extension LLM_Web_search`.
After a successful setup, a "LLM Web Search" tab should be visible in the web UI.
Usage Guidelines
Integrating and using the web search functionality in your LLM involves several steps:
- Model Setup: Load your model and select an appropriate instruction template.
- Search Command: Ensure the query syntax in the system matches the regex pattern in use.
- Parameter Configuration: Select a parameter preset that complements your model's requirements.
- Engagement: Start using either "chat-instruct" or "instruct" modes for interaction.
Custom Regular Expressions
Users can define custom regex patterns to control how queries are extracted from the model's output. The default regex is `Search_web\("(.*)"\)`, which matches the search command; the capture group between the quotes holds the query text to be searched.
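For example, a custom pattern still needs a single capture group so the extension can pull out the query. The `web_search:` prefix below is a hypothetical alternative command, not a pattern the project ships with:

```python
import re

# Hypothetical custom pattern; the capture group is what gets searched.
custom = re.compile(r'web_search: "(.*)"')

m = custom.search('I will look that up. web_search: "okapi bm25"')
query = m.group(1) if m else None
```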
Experimental Web Page Reading
There is experimental functionality available for extracting full text from web pages. Users can employ a regex such as `Open_url\("(.*)"\)` for this purpose, though it may require adjustments because full page texts can exceed the LLM's context limits.
Search Backends
- DuckDuckGo: The default search backend employed by the project.
- SearXNG: A more advanced option that can process queries with detailed parameters, though the instance must be configured to return results in JSON format.
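Enabling JSON output on a SearXNG instance is done in its `settings.yml`. A minimal fragment, assuming an otherwise default configuration, looks like:

```yaml
# settings.yml — allow the JSON output format alongside HTML
search:
  formats:
    - html
    - json
```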
Keyword Retrieval Techniques
The project supports two keyword retrieval methods:
- Okapi BM25: The standard choice for document retrieval using CPU processing.
- SPLADE: An advanced retrieval method utilizing query expansion, suited for setups with available VRAM.
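As a reference point, Okapi BM25 can be sketched in a few lines of pure Python with the common default parameters k1 = 1.5 and b = 0.75. This is a minimal illustration of the scoring formula, not the project's actual implementation:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [["web", "search", "for", "llms"], ["cooking", "pasta", "recipes"]]
scores = bm25_scores(["web", "search"], docs)
```

A document containing the query terms scores higher than one that does not, which is how the extension ranks search results against the model's query.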
Chunking Methods
To manage text processing, the project supports two chunking methods:
- Character-based Chunking: This splits text into fixed-size segments rapidly without needing GPU power.
- Semantic Chunking: A more sophisticated approach that segments text at semantic boundaries. It produces better results for natural-language content, at the cost of speed, and is GPU-intensive.
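Character-based chunking is simple to implement. The sketch below uses a fixed chunk size with a small overlap so that text at a boundary appears in two adjacent chunks; the parameter names are illustrative, not the extension's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks with overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample, chunk_size=500, overlap=50)
```

Because no model inference is involved, this runs in linear time on the CPU, which is why it is the fast option.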
Recommended Models
For users with GPU memory constraints (≤ 12 GB VRAM), models like Llama-3.1-8B-instruct or gemma-2-9b-it are recommended due to their compatibility and efficiency.
This project is a step forward in enabling LLMs to not only understand but actively retrieve and integrate web information, making AI interactions more informed and contextually rich.