Introduction to the LM-PPL Project
LM-PPL (Language Model Perplexity) is a Python library for computing the perplexity of text under a variety of pre-trained language models (LMs). Perplexity measures how predictable a text is to a language model: the lower the perplexity, the more fluent or prototypical the text. This makes the library particularly useful for evaluating the quality of text generated by language models.
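For reference, the ordinary perplexity of a text with tokens $w_1, \dots, w_N$ under an autoregressive LM is the exponentiated average negative log-likelihood:

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\right)$$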
What LM-PPL Does
The LM-PPL library allows users to compute perplexity for different types of language models:
- Recurrent (autoregressive) LMs: For decoder-only models such as GPT-2 and GPT-3, LM-PPL computes ordinary perplexity.
- Encoder-Decoder LMs: For models like BART and T5, it computes the perplexity of the decoder output, conditioned on the encoder input.
- Masked LMs: For models like BERT, LM-PPL computes pseudo-perplexity, since masked LMs do not define a single left-to-right likelihood (a conceptual sketch follows this list).
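To make pseudo-perplexity concrete, here is a minimal conceptual sketch using HuggingFace transformers (this illustrates the idea, not lmppl's internal implementation): each token is masked in turn, the original token is scored under the model, and the average negative log-likelihood is exponentiated.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

def pseudo_perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors='pt')['input_ids'][0]
    nlls = []
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id  # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        # Negative log-likelihood of the original token at the masked position.
        nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]].item())
    # Pseudo-perplexity: exponentiated mean negative log-likelihood.
    return float(torch.exp(torch.tensor(nlls).mean()))
```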
Getting Started
To start using the LM-PPL library, you can easily install it via pip:
```
pip install lmppl
```
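Once installed, a quick import serves as a smoke test:

```python
# Smoke test: should run without errors after installation.
import lmppl

# The three scorer classes used in the examples below.
print(lmppl.LM, lmppl.MaskedLM, lmppl.EncoderDecoderLM)
```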
Examples of Usage
Solving Sentiment Analysis
Here's how LM-PPL can be used for sentiment analysis via perplexity: score each candidate completion and select the text with the lower perplexity as the model's prediction.
- Recurrent LM Example (using GPT-2):

```python
import lmppl

scorer = lmppl.LM('gpt2')
text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
ppl = scorer.get_perplexity(text)
print(list(zip(text, ppl)))
print(f"prediction: {text[ppl.index(min(ppl))]}")
```
- Masked LM Example (using DeBERTa):

```python
import lmppl

scorer = lmppl.MaskedLM('microsoft/deberta-v3-small')
text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
ppl = scorer.get_perplexity(text)
print(list(zip(text, ppl)))
print(f"prediction: {text[ppl.index(min(ppl))]}")
```
- Encoder-Decoder LM Example (using T5):

```python
import lmppl

scorer = lmppl.EncoderDecoderLM('google/flan-t5-small')
inputs = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
]
outputs = [
    'I am happy.',
    'I am sad.'
]
ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
print(list(zip(outputs, ppl)))
print(f"prediction: {outputs[ppl.index(min(ppl))]}")
```
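Note the split in the encoder-decoder case: the shared prompt goes in `input_texts` while the candidate completions go in `output_texts`, so perplexity is computed only over the decoder output, as described above.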
Supported Models
The LM-PPL library supports a variety of popular models. Here are some examples:
| Model | HuggingFace ID | Model Type |
|---|---|---|
| BERT | google-bert/bert-base-uncased | MaskedLM |
| RoBERTa | roberta-large | MaskedLM |
| GPT-2 | gpt2-xl | LM |
| Flan-UL2 | google/flan-ul2 | EncoderDecoderLM |
| GPT-NeoX | EleutherAI/gpt-neox-20b | LM |
| OPT | facebook/opt-30b | LM |
| Mixtral | mistralai/Mixtral-8x22B-v0.1 | LM |
| Llama 3 | meta-llama/Meta-Llama-3-8B | LM |
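The Model Type column indicates which lmppl class loads a given checkpoint. A minimal sketch using smaller checkpoints (flan-t5-small, from the earlier example, stands in for the much larger flan-ul2):

```python
import lmppl

# Match the scorer class to the "Model Type" column above.
masked_scorer = lmppl.MaskedLM('roberta-large')                   # MaskedLM
causal_scorer = lmppl.LM('gpt2-xl')                               # LM
seq2seq_scorer = lmppl.EncoderDecoderLM('google/flan-t5-small')   # EncoderDecoderLM
```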
Tips for Using LM-PPL
- Max Token Length: Each language model has its own maximum token length, and specifying a shorter length can speed up processing but may affect accuracy. Testing different lengths can help determine the optimal setting for your texts (see the sketch after these tips).
- Batch Size: You can manage memory usage by passing a batch size to `get_perplexity` (e.g., `get_perplexity(text, batch_size=32)`). Longer texts or larger batches may cause out-of-memory errors, so reducing the batch size can help.
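A sketch combining both tips: `batch_size` is documented above, while the `max_length` constructor argument is an assumption about your lmppl version's API, so check the library's documentation before relying on it.

```python
import lmppl

# Assumption: the constructor accepts a max_length cap (per the Max Token
# Length tip); the values below are placeholders to tune for your hardware.
scorer = lmppl.LM('gpt2', max_length=256)

texts = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
# Smaller batch_size lowers peak memory; larger values run faster.
ppl = scorer.get_perplexity(texts, batch_size=32)
print(f"prediction: {texts[ppl.index(min(ppl))]}")
```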
With LM-PPL, users have a powerful tool for evaluating and comparing the fluency of text generated by different language models, making it a useful addition to natural language processing workflows.