ollama-grid-search: Automated Grid Search and A/B Testing for LLM Evaluation

Ollama Grid Search and A/B Testing Desktop App

The Ollama Grid Search project is a desktop application built with Rust that provides a streamlined interface for evaluating large language models (LLMs), prompts, and model parameters. Its goal is to make it easier to choose the models and configurations best suited to a specific use case by letting users explore different combinations and visualize the results.

Purpose

This project automates the evaluation of LLMs, prompts, and inference parameters. Users can test different combinations and visually inspect the results to identify the best setup for their needs. The tool works with Ollama, which must be installed and serving its API either locally or on a remote server.

Quick Example

In a typical scenario, a user might test a simple prompt on two different models, each at two temperature settings (for example, 0.7 and 1.0), for four combinations in total. The experiment interface then lets the user compare the results quickly.
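
Each of those four combinations becomes a separate request to the Ollama server. Ollama's REST API accepts generation requests at `POST /api/generate`, with sampling parameters such as `temperature` passed inside an `options` object. The Rust sketch below only builds the JSON bodies for such a 2 × 2 experiment; it assumes that endpoint, the model names `llama3` and `mistral` are hypothetical examples, and the helper functions are illustrative rather than the app's own request code.

```rust
/// Escapes a string so it can be placed inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character must use a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a non-streaming `POST /api/generate` body for one model/temperature pair.
/// `stream: false` asks Ollama for a single, complete response.
fn generate_body(model: &str, prompt: &str, temperature: f64) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false,"options":{{"temperature":{}}}}}"#,
        escape_json(model),
        escape_json(prompt),
        temperature
    )
}

/// Hypothetical 2 x 2 experiment: two example models, two temperatures.
fn quick_example_bodies(prompt: &str) -> Vec<String> {
    let models = ["llama3", "mistral"]; // example names; substitute models you have pulled
    let temperatures = [0.7, 1.0];
    let mut bodies = Vec::new();
    for model in models {
        for temperature in temperatures {
            bodies.push(generate_body(model, prompt, temperature));
        }
    }
    bodies
}
```

Calling `quick_example_bodies("Why is the sky blue?")` yields four bodies, one per model/temperature pair.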

Installation

To install Ollama Grid Search, download the build for your operating system from the project's releases page (https://github.com/dezoito/ollama-grid-search/releases).

Features

Model Access: Automatically fetches the available models from local or remote Ollama servers.
Iteration and Testing: Tests combinations of models, prompts, and parameters, generating the inferences concurrently.
Visual Comparison: Provides A/B testing for comparing different prompts or models.
Repeatability: Supports multiple iterations of each parameter combination, as well as re-running past experiments.
Concurrency Management: Lets users cap the number of concurrent inference calls, or run them one at a time, to avoid overloading the server.
Detailed Insights: Shows the inference parameters used for each response, along with metadata such as response time and token statistics.
Experiment Management: Lists past experiments, with detailed views, downloadable JSON, and the option to re-run them.
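
Limiting concurrency matters because a grid grows multiplicatively, and sending every call at once can overwhelm a single Ollama server. A common way to cap in-flight requests is a small worker pool; the Rust sketch below shows that pattern using scoped threads, and is not the app's implementation. `run_limited` and its parameters are hypothetical, and the `infer` closure stands in for whatever actually sends a request. With `max_concurrency` set to 1, it behaves like synchronous, one-at-a-time calls.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `infer` on every job with at most `max_concurrency` calls in flight at once.
/// A value of 1 (or 0, treated as 1) makes the calls strictly sequential.
/// Results are returned in the same order as `jobs`, regardless of completion order.
fn run_limited<J, R, F>(jobs: &[J], max_concurrency: usize, infer: F) -> Vec<R>
where
    J: Sync,
    R: Send,
    F: Fn(&J) -> R + Sync,
{
    // Never spawn more workers than there are jobs.
    let workers = max_concurrency.max(1).min(jobs.len().max(1));
    // Index of the next job that no worker has claimed yet.
    let next = AtomicUsize::new(0);
    // One result slot per job, filled in by whichever worker ran it.
    let slots: Mutex<Vec<Option<R>>> = Mutex::new((0..jobs.len()).map(|_| None).collect());

    // Scoped threads may borrow `jobs`, `infer`, `next`, and `slots`;
    // the scope joins every worker before returning.
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= jobs.len() {
                    break;
                }
                let result = infer(&jobs[i]);
                slots.lock().unwrap()[i] = Some(result);
            });
        }
    });

    slots
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every job is claimed exactly once"))
        .collect()
}
```

Storing each result by its job index keeps the output aligned with the grid, even though calls finish out of order.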

Grid Search Concept

Grid search traditionally refers to trying every combination of training hyperparameters to find the best one. Ollama Grid Search applies the same idea to evaluation: users select models, prompts, and parameter values, and the app runs each combination so the results can be compared.
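
Concretely, the grid is the Cartesian product of the selected values, and each combination can be repeated for several iterations, so the number of inference calls is the product of the list lengths times the iteration count. The Rust sketch below illustrates that expansion; `Trial`, `expand_grid`, and the choice of `temperature` and `top_k` as the varied parameters are assumptions for illustration (both are Ollama sampling options), not the app's data model.

```rust
/// One inference call in the grid (illustrative, not the app's data model).
#[derive(Debug, Clone, PartialEq)]
struct Trial {
    model: String,
    prompt: String,
    temperature: f64,
    top_k: u32,
    iteration: u32,
}

/// Expands the selections into every combination, each repeated `iterations` times.
/// The result length is models * prompts * temperatures * top_ks * iterations.
fn expand_grid(
    models: &[&str],
    prompts: &[&str],
    temperatures: &[f64],
    top_ks: &[u32],
    iterations: u32,
) -> Vec<Trial> {
    let mut trials = Vec::new();
    for &model in models {
        for &prompt in prompts {
            for &temperature in temperatures {
                for &top_k in top_ks {
                    // Repeat the same combination to observe run-to-run variation.
                    for iteration in 1..=iterations {
                        trials.push(Trial {
                            model: model.to_string(),
                            prompt: prompt.to_string(),
                            temperature,
                            top_k,
                            iteration,
                        });
                    }
                }
            }
        }
    }
    trials
}
```

For example, 2 models × 2 prompts × 2 temperatures × 3 `top_k` values × 2 iterations already yields 48 calls, which is why the ability to limit concurrency is useful.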

A/B Testing

The app also supports A/B testing: users can run the same prompt against different models and compare their outputs, or run different prompts under the same configuration to see how each performs.
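
An A/B comparison is most informative when the two runs differ in exactly one dimension. The hypothetical Rust sketch below derives both runs from a shared base configuration and changes only the model or only the prompt; the `RunConfig` and `AbDimension` types and the `ab_pair` function are illustrative, not the app's own.

```rust
/// Hypothetical settings for a single run.
#[derive(Debug, Clone, PartialEq)]
struct RunConfig {
    model: String,
    prompt: String,
    temperature: f64,
}

/// The one dimension an A/B test varies; everything else is held fixed.
enum AbDimension {
    Models(String, String),
    Prompts(String, String),
}

/// Returns the (A, B) pair: both cloned from `base`, differing only in `dimension`.
fn ab_pair(base: &RunConfig, dimension: AbDimension) -> (RunConfig, RunConfig) {
    let mut a = base.clone();
    let mut b = base.clone();
    match dimension {
        AbDimension::Models(model_a, model_b) => {
            a.model = model_a;
            b.model = model_b;
        }
        AbDimension::Prompts(prompt_a, prompt_b) => {
            a.prompt = prompt_a;
            b.prompt = prompt_b;
        }
    }
    (a, b)
}
```

Building both sides from one base makes it hard to accidentally change a second setting, which would confound the comparison.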

Experiment Logs

Experiment logs can be inspected in the app or downloaded for further analysis.
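
When reviewing logs, generation speed is a common point of comparison. Ollama's completed `/api/generate` responses include `eval_count` (tokens generated), `eval_duration`, and `total_duration` (both durations in nanoseconds), and Ollama's API documentation computes tokens per second as `eval_count / eval_duration * 10^9`. The sketch below assumes those fields have already been parsed into a hypothetical `ResponseMetadata` struct; it does not describe the app's log format.

```rust
/// Timing fields reported by Ollama for a completed, non-streamed generation.
/// All durations are in nanoseconds.
#[derive(Debug, Clone, Copy)]
struct ResponseMetadata {
    total_duration: u64,
    eval_count: u64,
    eval_duration: u64,
}

impl ResponseMetadata {
    /// Generation throughput: tokens produced per second of evaluation time.
    /// Returns `None` when no evaluation time was recorded.
    fn tokens_per_second(&self) -> Option<f64> {
        if self.eval_duration == 0 {
            return None;
        }
        // Multiply before dividing to avoid an extra rounding step.
        Some(self.eval_count as f64 * 1e9 / self.eval_duration as f64)
    }

    /// Wall-clock time for the whole request, in seconds.
    fn total_seconds(&self) -> f64 {
        self.total_duration as f64 / 1e9
    }
}
```

For instance, 100 tokens generated over 2,000,000,000 ns of evaluation time works out to 50 tokens per second.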

Future Features

The development roadmap for the project includes:

Grading and filtering results by quality.
Storing experiments in a local database.
Importing, exporting, and sharing experiment parameters.

Contributing

Contributions are welcome. For simple changes, such as bug fixes, submit a pull request directly. For larger changes or feature suggestions, please open an issue to discuss them before starting development.

Development Setup

Ensure Rust is installed.

Clone the repository:

```
git clone https://github.com/dezoito/ollama-grid-search.git
cd ollama-grid-search
```

Install frontend dependencies using your preferred package manager:
```
bun install
```
Configure rust-analyzer to run Clippy for its checks (for example, in your VS Code settings).

Start the app in development mode:
```
bun tauri dev
```

Citations

This repository has been cited in academic work, including theses from Santa Clara University on auto-tuning methods for machine learning hyperparameters.

Thank You!

The project thanks contributors including @FabianLars, @pepperoni21, and @TomReidNZ for their support and contributions.