loft
LOFT is a benchmark for assessing long-context language models on retrieval, reasoning, and related tasks. Its 35 datasets span multiple modalities and cover retrieval, RAG, SQL, and multi-hop reasoning. The central repository provides the datasets, installation instructions, and evaluation scripts. It also describes each dataset and its task type, and shows how to run inference and evaluation with the gemini-1.5-flash-002 model on VertexAI. Together, these resources show how long-context models can advance retrieval and reasoning approaches.