Introduction to Prodigy Recipes
The Prodigy Recipes project is a GitHub repository maintained by the team at Explosion.ai. It collects customizable scripts (recipes) for use with Prodigy, a tool for annotating text, images, and other types of data. Using the recipes requires a valid Prodigy license, which can be purchased on the official Prodigy website.
Purpose and Usage
Prodigy is a scriptable tool, which means users can fine-tune it to their needs. The Prodigy Recipes repository builds on this flexibility by providing a base set of scripts that users can run directly or modify to develop custom annotation workflows.
To run Prodigy, users generally execute the command from their terminal, and custom scripts can be passed using specific arguments. For instance, using the -F argument allows the user to refer to a custom recipe file. A typical command might look like this:
python -m prodigy ner.teach your_dataset en_core_web_sm ./data.jsonl --label PERSON -F prodigy-recipes/ner/ner_teach.py
Users are encouraged to experiment with the code, such as swapping out functions, applying different sorting methods, or introducing new filters to tailor the annotation process.
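As a rough illustration of what swapping in a different filter or sorting method can mean, the sketch below works on a stream of (score, example) pairs in plain Python. It does not call Prodigy's own API: the function names `drop_short_texts` and `most_uncertain_first`, the `min_chars` threshold, and the sample data are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, str]
Scored = Tuple[float, Example]


def drop_short_texts(stream: Iterable[Scored], min_chars: int = 10) -> Iterator[Scored]:
    """Filter: skip examples whose text is shorter than min_chars."""
    for score, example in stream:
        if len(example["text"]) >= min_chars:
            yield score, example


def most_uncertain_first(stream: Iterable[Scored]) -> List[Example]:
    """Sort: put examples whose score is closest to 0.5 at the front."""
    ranked = sorted(stream, key=lambda pair: abs(pair[0] - 0.5))
    return [example for _, example in ranked]


# Hypothetical (score, example) pairs, standing in for model predictions.
scored_stream = [
    (0.95, {"text": "Ada Lovelace wrote the first program."}),
    (0.52, {"text": "Paris Hilton visited Paris."}),
    (0.48, {"text": "Hi."}),
    (0.10, {"text": "The weather was mild all week."}),
]

# Chain the filter and the sorter, then show the resulting annotation order.
queue = most_uncertain_first(drop_short_texts(scored_stream))
for example in queue:
    print(example["text"])
```

Sorting the whole list at once is a simplification for readability; a real annotation stream is usually consumed in batches, so a production sorter would work incrementally.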
Key Features—Recipes Overview
The repository includes a variety of recipe scripts categorized primarily into Named Entity Recognition, Text Classification, Terminology, Image Annotation, and other unique tasks.
Named Entity Recognition (NER)
NER recipes are designed to help in training models to identify entities within text. Some highlighted recipes include:
ner.teach: Encourages interactive model training with user feedback.
ner.match: Uses pattern files to suggest phrases and allows marking relevant entities.
ner.manual: Allows manual annotation without needing a machine learning model in the loop.
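Entity annotations of this kind are typically stored as character offsets into the text. The following sketch builds one such record in plain Python. The `text` and `spans` keys with `start`, `end`, and `label` fields are an assumption about the task layout and should be checked against the Prodigy documentation; `find_span` and `build_task` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List, Optional


def find_span(text: str, phrase: str, label: str) -> Optional[Dict[str, object]]:
    """Return a character-offset span for the first occurrence of phrase, or None."""
    start = text.find(phrase)
    if start == -1:
        return None
    return {"start": start, "end": start + len(phrase), "label": label}


def build_task(text: str, phrases: List[str], label: str) -> Dict[str, object]:
    """Build one annotation record with a span for every phrase found in the text."""
    spans = []
    for phrase in phrases:
        span = find_span(text, phrase, label)
        if span is not None:
            spans.append(span)
    return {"text": text, "spans": spans}


# Hypothetical example sentence and phrases to mark as PERSON.
text = "Ada Lovelace worked with Charles Babbage."
task = build_task(text, ["Ada Lovelace", "Charles Babbage", "Alan Turing"], "PERSON")
print(task)
```

Phrases that do not occur in the text (here "Alan Turing") are simply skipped, so the record only contains spans that can be mapped back to the original string.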
Text Classification
These recipes assist with classifying text into categories:
textcat.manual: Direct manual annotation of text categories.
textcat.teach: Focused on collecting high-quality training data dynamically with user feedback.
Image Annotation
These focus on drawing annotations on images:
image.manual: Facilitates manual drawing of bounding boxes on images for model training.
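A bounding box can be recorded as a labelled list of corner points. The sketch below converts a box given as top-left corner plus width and height into that form. The `label` and `points` field names are an assumption about how such annotations may be stored, and `box_to_span` is a hypothetical helper written for this example.

```python
from typing import Dict, List


def box_to_span(x: float, y: float, width: float, height: float, label: str) -> Dict[str, object]:
    """Describe a rectangle as corner points, clockwise from the top-left corner.

    Image coordinates are assumed to start at the top-left, with y growing downward.
    """
    points: List[List[float]] = [
        [x, y],                    # top-left
        [x + width, y],            # top-right
        [x + width, y + height],   # bottom-right
        [x, y + height],           # bottom-left
    ]
    return {"label": label, "points": points}


# Hypothetical box: 100 px wide, 50 px tall, starting at (10, 20).
span = box_to_span(10, 20, 100, 50, "CAR")
print(span)
```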
Other Categories
The repository includes other versatile scripts like choice-based data annotation and question-answering pairs. They provide various preprocessing and post-processing capabilities.
Community Contributions and Tutorials
The Prodigy Recipes repository also hosts community-contributed scripts and tutorials, expanding its range of functionalities. Community recipes provide additional tools, and tutorials often demonstrate practical applications with these scripts.
Examples and Patterns
To support users further, the repository contains example datasets and pattern files. These resources offer a practical starting point for beginners to understand and utilize Prodigy Recipes effectively.
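For readers who want to create small files of their own, the sketch below writes a tiny dataset and a pattern file in JSONL (one JSON object per line) and reads them back. The file names, the sample sentences, and the pattern entries are hypothetical. The `label` and `pattern` keys follow the match-pattern layout used by spaCy, which is assumed here to be what the pattern-based recipes expect.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def write_jsonl(path: Path, records: Iterable[Dict[str, object]]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> List[Dict[str, object]]:
    """Read a JSONL file back into a list of dictionaries, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical input texts, one record per example.
examples = [
    {"text": "Ada Lovelace worked with Charles Babbage."},
    {"text": "The Analytical Engine was never completed."},
]

# Hypothetical patterns: a token-based pattern and an exact-string pattern.
patterns = [
    {"label": "PERSON", "pattern": [{"lower": "ada"}, {"lower": "lovelace"}]},
    {"label": "PERSON", "pattern": "Charles Babbage"},
]

# Write both files into a temporary directory to avoid touching the project.
out_dir = Path(tempfile.mkdtemp())
data_path = out_dir / "data.jsonl"
patterns_path = out_dir / "patterns.jsonl"
write_jsonl(data_path, examples)
write_jsonl(patterns_path, patterns)

print(data_path.read_text(encoding="utf-8"))
```

Keeping one object per line means a file can be streamed record by record rather than loaded in full, which suits long annotation sessions.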
Conclusion
Prodigy Recipes is a useful toolkit for anyone looking to get more out of Prodigy, offering ready-to-use recipes and the freedom to develop custom annotation workflows. It is a community-driven project supported by Explosion.ai that aims to help users with data annotation tasks across a wide range of applications.