Introduction to Openlogprobs
Overview
Openlogprobs is a Python library for extracting log-probabilities from language model APIs. A log-probability measures how likely the model considers a particular token to be as the next word in a sequence. By making these values accessible, openlogprobs opens up new possibilities for researchers and developers working with language models.
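For intuition, a log-probability is simply the natural logarithm of the probability the model assigns to a token. A minimal, self-contained sketch of the conversion (plain Python, no openlogprobs calls):

```python
import math

# If a model assigns probability 0.90 to " pie" being the next token,
# the corresponding log-probability is ln(0.90) ≈ -0.105.
prob = 0.90
logprob = math.log(prob)

print(round(logprob, 3))            # -0.105
print(round(math.exp(logprob), 2))  # 0.9 (converting back)
```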
Why Openlogprobs?
Most language model APIs do not expose full log-probabilities, for reasons including security and data management concerns. Sharing the complete distribution can leak unintended information about the model, and transmitting a value for every token in the vocabulary (often tens of thousands of entries) is inefficient. However, many APIs do accept a 'logit bias' parameter, which shifts the likelihood of specific tokens before sampling. Openlogprobs leverages this feature to reconstruct entire probability vectors, effectively recovering information the API does not expose directly.
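As a rough illustration of the underlying trick (not openlogprobs' actual implementation; the `call_api` helper below is hypothetical), a large logit bias forces a chosen token to the top of the distribution, and the bias can then be algebraically backed out of the log-probability the API reports for it:

```python
import math

def logprob_via_bias(call_api, prompt, token_id, bias=20.0):
    """Sketch: recover one token's unbiased log-probability using logit bias.

    Assumes `call_api(prompt, logit_bias)` returns the log-probability of the
    most likely next token after the bias is applied.
    """
    # With a large enough bias, `token_id` becomes the top token. Its reported
    # probability is p' = p * e^b / (1 - p + p * e^b), where p is the original
    # probability and b is the bias.
    biased_logprob = call_api(prompt, logit_bias={token_id: bias})
    p_biased = math.exp(biased_logprob)

    # Invert the softmax shift to recover the original probability p.
    p = p_biased / (math.exp(bias) * (1 - p_biased) + p_biased)
    return math.log(p)
```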
Usage
Openlogprobs offers several methods for extracting log-probabilities, depending on the capabilities of the API being used:
Top-k Search
For APIs that expose the top-k log-probabilities, openlogprobs can extract the full next-token distribution with the top-k search method:
from openlogprobs import extract_logprobs
extract_logprobs("gpt-3.5-turbo-instruct", "i like pie", method="topk")
Exact Solution
The exact solution also requires top-k log-probabilities but needs fewer API calls, since it solves for k tokens at a time:
from openlogprobs import extract_logprobs
extract_logprobs("gpt-3.5-turbo-instruct", "i like pie", method="exact", parallel=True)
Binary Search
Even when an API exposes no top-k log-probabilities, openlogprobs can still extract the full distribution via binary search, at the cost of more language model calls:
from openlogprobs import extract_logprobs
extract_logprobs("gpt-3.5-turbo-instruct", "i like pie", method="bisection")
Future Enhancements
The developers are seeking contributions to expand openlogprobs' functionality, including:
- Supporting multiple logprobs through concurrent binary searches
- Estimating costs for different APIs
- Implementing checkpointing to save progress
Algorithms
Openlogprobs implements three extraction strategies: top-k search, an exact solution, and binary search. The binary search applies varying logit biases to find, for each token, the threshold at which it becomes the most likely next token, which gives its log-probability relative to the top token. When the API returns top-k log-probabilities, the exact solution recovers log-probabilities precisely with far fewer calls by algebraically inverting the shift that a known logit bias induces in the returned values.
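A simplified sketch of the bisection idea follows. The `is_argmax` helper is hypothetical (it stands in for an API call that applies a logit bias to one token and reports whether that token became the most likely next token); this illustrates the search, not openlogprobs' actual interface:

```python
def bisect_logprob(is_argmax, token_id, lo=0.0, hi=40.0, tol=1e-3):
    """Sketch: estimate a token's log-probability relative to the top token.

    The smallest bias b* that makes `token_id` the most likely next token
    satisfies logprob(token_id) + b* ≈ logprob(top token), so the token's
    log-probability relative to the top token is approximately -b*.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_argmax(token_id, mid):
            hi = mid  # bias large enough; try a smaller one
        else:
            lo = mid  # bias too small; token not yet on top
    return -hi        # relative log-probability (top token sits at 0)
```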
Context and Development
Openlogprobs was developed primarily by Justin Chiu to support the research paper Language Model Inversion. Those using openlogprobs for academic purposes are encouraged to cite that work.
Contributions to the exact solution algorithm were made by Matthew Finlayson. If you're interested in the mathematical underpinnings, further information is provided here.
By opening up access to log-probabilities, openlogprobs empowers users to perform in-depth analyses and enhances the transparency and usability of language models in various research and development scenarios.