PromptSource: A Comprehensive Introduction
Overview
PromptSource is a toolkit for creating, sharing, and using natural language prompts. Large language models such as GPT-3, FLAN, and T0 have recently shown an impressive capacity for zero-shot generalization to new tasks, and well-designed prompts have helped them perform even better. PromptSource addresses the resulting demand among natural language processing (NLP) researchers and engineers for tools to create and apply prompts.
What Are Prompts?
A prompt is a function that maps an example from a dataset to a natural language input and a target output. PromptSource hosts a growing collection of such prompts, known as P3 (the Public Pool of Prompts). As of early 2022, P3 includes approximately 2,000 English prompts covering more than 170 datasets.
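To make the "prompt as a function" idea concrete, here is a minimal sketch in plain Python. It does not use PromptSource itself: the function name topic_prompt, the question wording, and the sample example are hypothetical, chosen only to illustrate the mapping from a dataset example to an (input, target) pair.

```python
from typing import Dict, Tuple

# Label names for a four-class news topic dataset (illustrative stand-in).
LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]


def topic_prompt(example: Dict) -> Tuple[str, str]:
    """Map one dataset example to a natural language input and a target."""
    input_text = (
        f"{example['text']}\n"
        "Is this news about world politics, sports, business, "
        "or science and technology?"
    )
    # The target is the human-readable name of the integer label.
    target_text = LABEL_NAMES[example["label"]]
    return input_text, target_text


# Hypothetical example in the shape of a text classification record.
sample = {"text": "Stocks rallied on Monday.", "label": 2}
input_text, target_text = topic_prompt(sample)
print("INPUT: ", input_text)
print("TARGET: ", target_text)
```

Writing several such functions for the same dataset yields several prompts for the same task, which is the kind of variety a prompt collection is meant to capture.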
Using PromptSource
Installation
If you only want to use existing prompts, install PromptSource with pip:
pip install promptsource
To create new prompts, you need to download the repository and install it locally; the repository provides the setup steps.
Applying Prompts
The toolkit can apply prompts to examples from datasets in the Hugging Face Datasets library. Here is a basic example using the ag_news dataset:
# Import necessary libraries
from datasets import load_dataset
from promptsource.templates import DatasetTemplates
# Load a dataset example
dataset = load_dataset("ag_news", split="train")
example = dataset[1]
# Access prompts for the dataset
ag_news_prompts = DatasetTemplates('ag_news')
# Choose and apply a prompt
prompt = ag_news_prompts["classify_question_first"]
result = prompt.apply(example)
print("INPUT: ", result[0])
print("TARGET: ", result[1])
This script demonstrates how to load a dataset, retrieve available prompts, select a desired prompt, and apply it to a dataset instance.
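The two-element result above can be pictured as a single template whose rendered text is split into an input part and a target part. The sketch below is a rough stand-in under stated assumptions, not PromptSource's implementation: it uses Python's string.Template instead of a real templating engine, and the "|||" separator, the template text, the helper name apply_template, and the sample fields are all chosen here for illustration.

```python
from string import Template
from typing import Dict, List

# Assumed marker separating the input part from the target part.
SEPARATOR = "|||"


def apply_template(template_text: str, example: Dict[str, str]) -> List[str]:
    """Fill the template with the example's fields, then split it in parts."""
    rendered = Template(template_text).substitute(example)
    return [part.strip() for part in rendered.split(SEPARATOR)]


# Hypothetical template and example, for illustration only.
template_text = "What label best describes this news article?\n$text ||| $label_name"
sample = {"text": "Stocks rallied on Monday.", "label_name": "Business"}

result = apply_template(template_text, sample)
print("INPUT: ", result[0])
print("TARGET: ", result[1])
```

Keeping the input and target in one template keeps the two halves of a prompt together, so they are edited and reviewed as a unit.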
Accessing Prompts for Subsets
To access prompts for a specific subset of a dataset, pass the subset name both when loading the dataset and when retrieving its prompts:
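# Dataset and subset (configuration) names
dataset_name, subset_name = "super_glue", "rte"
# load_dataset takes the subset name as a separate second argument
dataset = load_dataset(dataset_name, subset_name, split="train")
example = dataset[0]
# Prompts for a subset are looked up under "<dataset>/<subset>"
prompts = DatasetTemplates(f"{dataset_name}/{subset_name}")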
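When dataset and subset names travel together as one "dataset/subset" string, it can help to split them before loading, since Hugging Face's load_dataset accepts the subset (configuration) name as its own argument. The helper below is a small sketch; the name split_prompt_key is hypothetical, and it assumes the dataset name itself contains no "/" (which would not hold for namespaced Hub datasets).

```python
from typing import Optional, Tuple


def split_prompt_key(key: str) -> Tuple[str, Optional[str]]:
    """Split 'dataset/subset' into (dataset, subset); subset may be absent."""
    dataset_name, _, subset_name = key.partition("/")
    # An empty subset (no "/" in the key) is reported as None.
    return dataset_name, (subset_name or None)


print(split_prompt_key("super_glue/rte"))
print(split_prompt_key("ag_news"))
```

The resulting pair can then be passed on as separate dataset and subset arguments.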
Creating New Prompts
PromptSource enables prompt creation through a web-based graphical user interface (GUI), offering three modes:
- Sourcing: Craft and develop new prompts.
- Prompted Dataset Viewer: Review all prompts, both newly created and existing ones.
- Helicopter View: Analyze high-level metrics and the status of P3.
Once the local setup is complete, you launch the interface on your machine by running the command given in the repository.
Writing Prompts
Developers interested in contributing new prompts can refer to the contribution guidelines detailed within the repository. These guidelines provide explicit instructions to ensure new contributions are consistent with the existing collection.
Project Origin
PromptSource and P3 originated in the BigScience project, an open collaborative effort to study large models and datasets outside of major technology companies. BigScience involves hundreds of researchers worldwide. PromptSource and P3 were foundational to the paper "Multitask Prompted Training Enables Zero-Shot Task Generalization."
Known Issues and Citation
Users may run into a few known issues, such as compatibility problems on specific operating systems or error messages when running the application. The project documentation lists recommended fixes.
If you use P3 or PromptSource in your research or applications, please cite the accompanying reference to acknowledge the developers and the research community behind it.
Overall, PromptSource supports the development and use of natural language prompts across a broad range of NLP applications.