STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
STaRK is an innovative project from Stanford University designed to benchmark how well large language models (LLMs) retrieve information from textual and relational knowledge bases. Aimed at applications such as product search, academic paper retrieval, and biomedical question answering, STaRK sets a new standard in this domain with diverse, context-specific queries that mimic real-world scenarios.
Why STaRK?
Novel Task
STaRK tackles a unique challenge: how effectively LLMs can retrieve answers when queries depend on both free-form text and relational structure. This is especially important given the increasing complexity and variety of information retrieval needs in digital spaces.
Large-scale and Diverse Knowledge Bases
To support this evaluation, STaRK includes three expansive knowledge bases sourced from publicly available data. These comprehensive datasets enable extensive testing across different domains and applications.
Natural and Practical Queries
The hallmark of the STaRK benchmark is its set of queries. These are crafted to reflect realistic questions users might ask, combining complex relational and textual elements. This ensures that retrieval systems are tested under practical conditions.
Accessing STaRK
Getting started with STaRK is straightforward. It is distributed as a pip package (stark-qa) and supports Python 3.8 through 3.11. The data is also available on the Hugging Face platform, which simplifies integration into existing setups.
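As a minimal setup sketch, assuming a standard Python environment (consult the project page for the authoritative commands):

```python
# Install the published package first (Python 3.8 through 3.11 are supported):
#   pip install stark-qa

# The pip package exposes the stark_qa module used in the examples below.
import stark_qa
```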
How to Use STaRK
- Environment Setup: Install the STaRK package via pip, as sketched above, or set it up from source.
- Data Loading and Integration: Use the stark_qa module to load datasets for a specific domain, such as Amazon product data. This covers both the retrieval datasets and the semi-structured knowledge bases (see the loading sketch after this list).
- Benchmark Evaluation: Install additional packages such as llm2vec, gritlm, and bm25 to evaluate retrieval models on the datasets. STaRK provides scripts for downloading and generating embeddings, enabling immediate experimentation and evaluation (a metric sketch also follows this list).
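To make the loading step concrete, here is a minimal sketch using the load_qa and load_skb helpers exposed by the stark_qa module; treat the exact signatures and the field order of dataset items as assumptions to verify against the project's documentation:

```python
from stark_qa import load_qa, load_skb

dataset_name = "amazon"  # one of the three knowledge bases; see the project docs for the others

# Retrieval dataset: natural-language queries paired with ground-truth answer ids.
qa_dataset = load_qa(dataset_name)

# Semi-structured knowledge base: the entity graph plus its textual attributes.
skb = load_skb(dataset_name, download_processed=True)

# Each item bundles the query text, its id, the gold answer ids, and metadata
# (field order assumed from the project's examples).
query, query_id, answer_ids, meta_info = qa_dataset[0]
print(query, answer_ids)
```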
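And to illustrate what the evaluation step measures, below is a library-free sketch of Hit@k, one of the standard retrieval metrics; the repository's own scripts compute such metrics with retrievers like llm2vec, gritlm, or bm25, so this is only a reference implementation of the metric, not the project's harness:

```python
from typing import Dict, List, Set

def hit_at_k(ranked: Dict[int, List[int]], gold: Dict[int, Set[int]], k: int = 1) -> float:
    """Fraction of queries whose top-k retrieved ids contain a gold answer.

    ranked: query id -> candidate ids sorted by decreasing retrieval score.
    gold:   query id -> set of ground-truth answer ids (e.g. from a STaRK QA split).
    """
    hits = sum(1 for qid, cands in ranked.items() if gold[qid] & set(cands[:k]))
    return hits / max(len(ranked), 1)

# Toy usage with made-up ids:
ranked = {0: [42, 7, 13], 1: [99, 3]}
gold = {0: {7}, 1: {5}}
print(hit_at_k(ranked, gold, k=2))  # 0.5: query 0 hits at rank 2, query 1 misses
```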
Contributions to Research
STaRK is not only a tool for current applications but also a pathway to future advances in information retrieval, providing researchers with robust datasets against which to develop and assess new retrieval models. Its acceptance to the NeurIPS 2024 Datasets and Benchmarks Track highlights its significance.
For Researchers
Researchers looking to cite STaRK can reference the publication in the NeurIPS 2024 Datasets and Benchmarks Track, which outlines the framework and findings behind STaRK's development and application.
```bibtex
@inproceedings{wu24stark,
  title     = {STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
  author    = {Shirley Wu and Shiyu Zhao and Michihiro Yasunaga and Kexin Huang and Kaidi Cao and Qian Huang and Vassilis N. Ioannidis and Karthik Subbian and James Zou and Jure Leskovec},
  booktitle = {NeurIPS Datasets and Benchmarks Track},
  year      = {2024}
}
```
For more details, readers can explore STaRK's website, which offers a comprehensive overview and access to the project's tools and resources.