Introduction to the LM-PPL Project
LM-PPL (Language Model Perplexity) is a Python library for computing the perplexity of text under a variety of pre-trained language models (LMs). Perplexity measures how predictable a text is to a language model: the lower the perplexity, the more fluent or prototypical the text. This makes the library particularly useful for evaluating the quality of text generated by language models.
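For reference, the ordinary perplexity of a text with tokens $w_1, \dots, w_N$ under an autoregressive LM is the exponentiated average negative log-likelihood:

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\right)$$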
What LM-PPL Does
The LM-PPL library allows users to compute perplexity for different types of language models:
- Recurrent (autoregressive) LMs: For decoder-only models such as GPT-2 and GPT-3, LM-PPL computes ordinary perplexity.
- Encoder-Decoder LMs: For models like BART and T5, it computes the perplexity of the decoder output, conditioned on the encoder input.
- Masked LMs: For models like BERT, LM-PPL computes pseudo-perplexity, since masked LMs do not define a single left-to-right likelihood (a conceptual sketch follows this list).
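To make pseudo-perplexity concrete, here is a minimal conceptual sketch using HuggingFace transformers (this illustrates the idea, not lmppl's internal implementation): each token is masked in turn, the original token is scored under the model, and the average negative log-likelihood is exponentiated.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

def pseudo_perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors='pt')['input_ids'][0]
    nlls = []
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id  # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        # Negative log-likelihood of the original token at the masked position.
        nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]].item())
    # Pseudo-perplexity: exponentiated mean negative log-likelihood.
    return float(torch.exp(torch.tensor(nlls).mean()))
```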
Getting Started
To start using the LM-PPL library, you can easily install it via pip:
```
pip install lmppl
```
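Once installed, a quick import serves as a smoke test:

```python
# Smoke test: should run without errors after installation.
import lmppl

# The three scorer classes used in the examples below.
print(lmppl.LM, lmppl.MaskedLM, lmppl.EncoderDecoderLM)
```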
Examples of Usage
Solving Sentiment Analysis
Here's how LM-PPL can be used for sentiment analysis via perplexity: score each candidate completion and select the text with the lower perplexity as the model's prediction.
- Recurrent LM Example (using GPT-2):

```python
import lmppl

scorer = lmppl.LM('gpt2')
text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
ppl = scorer.get_perplexity(text)
print(list(zip(text, ppl)))
print(f"prediction: {text[ppl.index(min(ppl))]}")
```
- Masked LM Example (using DeBERTa):

```python
import lmppl

scorer = lmppl.MaskedLM('microsoft/deberta-v3-small')
text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
ppl = scorer.get_perplexity(text)
print(list(zip(text, ppl)))
print(f"prediction: {text[ppl.index(min(ppl))]}")
```
- Encoder-Decoder LM Example (using T5):

```python
import lmppl

scorer = lmppl.EncoderDecoderLM('google/flan-t5-small')
inputs = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
]
outputs = [
    'I am happy.',
    'I am sad.'
]
ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
print(list(zip(outputs, ppl)))
print(f"prediction: {outputs[ppl.index(min(ppl))]}")
```
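Note the split in the encoder-decoder case: the shared prompt goes in `input_texts` while the candidate completions go in `output_texts`, so perplexity is computed only over the decoder output, as described above.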
Supported Models
The LM-PPL library supports a variety of popular models. Here are some examples:
| Model | HuggingFace ID | Model Type |
|---|---|---|
| BERT | google-bert/bert-base-uncased | MaskedLM |
| RoBERTa | roberta-large | MaskedLM |
| GPT-2 | gpt2-xl | LM |
| Flan-UL2 | google/flan-ul2 | EncoderDecoderLM |
| GPT-NeoX | EleutherAI/gpt-neox-20b | LM |
| OPT | facebook/opt-30b | LM |
| Mixtral | mistralai/Mixtral-8x22B-v0.1 | LM |
| Llama 3 | meta-llama/Meta-Llama-3-8B | LM |
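The Model Type column indicates which lmppl class loads a given checkpoint. A minimal sketch using smaller checkpoints (flan-t5-small, from the earlier example, stands in for the much larger flan-ul2):

```python
import lmppl

# Match the scorer class to the "Model Type" column above.
masked_scorer = lmppl.MaskedLM('roberta-large')                   # MaskedLM
causal_scorer = lmppl.LM('gpt2-xl')                               # LM
seq2seq_scorer = lmppl.EncoderDecoderLM('google/flan-t5-small')   # EncoderDecoderLM
```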
Tips for Using LM-PPL
- Max Token Length: Each language model has its own maximum token length, and specifying a shorter length can speed up processing but may affect accuracy. Testing different lengths can help determine the optimal setting for your texts (see the sketch after these tips).
- Batch Size: You can manage memory usage by passing a batch size to `get_perplexity` (e.g., `get_perplexity(text, batch_size=32)`). Longer texts or larger batches may cause out-of-memory errors, so reducing the batch size can help.
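A sketch combining both tips: `batch_size` is documented above, while the `max_length` constructor argument is an assumption about your lmppl version's API, so check the library's documentation before relying on it.

```python
import lmppl

# Assumption: the constructor accepts a max_length cap (per the Max Token
# Length tip); the values below are placeholders to tune for your hardware.
scorer = lmppl.LM('gpt2', max_length=256)

texts = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
# Smaller batch_size lowers peak memory; larger values run faster.
ppl = scorer.get_perplexity(texts, batch_size=32)
print(f"prediction: {texts[ppl.index(min(ppl))]}")
```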
With LM-PPL, users have a powerful tool for evaluating and comparing the fluency of text generated by different language models, making it a useful addition to natural language processing workflows.