Project Templates: Simplifying Your Workflow
Weasel, formerly known as spaCy projects, is a powerful tool designed to manage and share complete workflows for various use cases and domains. Whether you're looking to train custom NLP pipelines, prepare them for production, or collaboratively share them with your team, Weasel makes the process streamlined and efficient. Here's a comprehensive overview of what it offers.
Key Features
- Project Templates: Start with predefined templates catering to different needs like NLP pipeline training, specific use case tutorials, third-party integrations, benchmarks, and experimental workflows.
- End-to-End Workflow Management: Design, train, package, and deploy your NLP models seamlessly.
- Sharing and Collaboration: Easily export your trained models as Python packages and upload outputs for team access.
Categories of Templates
- Pipelines: These templates focus on building NLP pipelines with various components, suitable for different datasets and applications.
- Tutorials: Walkthrough templates guide users through specific NLP use cases, providing a comprehensive learning experience from start to finish.
- Integrations: Templates demonstrate how to integrate spaCy with third-party tools, enhancing data management, model iteration, and deployment capabilities.
- Benchmarks: These templates are designed to replicate benchmarks, making it easy to compare different systems or spaCy versions.
- Experimental: These templates feature cutting-edge workflows and experimental setups for those ready to explore the latest developments, albeit at their own risk.
Getting Started
Starting with Weasel is simple. You can access the functionality either via the Weasel command-line interface (CLI) or the spacy project command. Each command accepts a --help flag that describes its usage.
- Clone a project template that fits your task:
python -m weasel clone tutorials/ner_fashion_brands
- Install any necessary project requirements:
cd ner_fashion_brands
python -m pip install -r requirements.txt
- Fetch assets like data and weights as specified:
python -m weasel assets
- Execute a command from the project.yml file:
python -m weasel run preprocess
- Run a full workflow in sequence:
python -m weasel run all
- Customize the template to suit your use case: import your own data, tweak the settings and models, then share the result with your team.
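The steps above can also be scripted. The following is a minimal sketch, assuming Python is used as the driver; the helper names build_steps and run_steps are hypothetical and not part of Weasel. It only assembles the same commands shown in the list and, like the cd step, assumes the cloned directory is named after the last segment of the template path.

```python
import subprocess
import sys
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]


def build_steps(template: str, workflow: str = "all") -> List[Step]:
    """Return (command, working_directory) pairs mirroring the steps above.

    Hypothetical helper for illustration; not part of Weasel.
    """
    # Assumption: cloning "tutorials/ner_fashion_brands" creates a local
    # directory called "ner_fashion_brands", as the `cd` step implies.
    project_dir = PurePosixPath(template).name
    py = [sys.executable, "-m"]
    return [
        (py + ["weasel", "clone", template], None),
        (py + ["pip", "install", "-r", "requirements.txt"], project_dir),
        (py + ["weasel", "assets"], project_dir),
        (py + ["weasel", "run", workflow], project_dir),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first failure."""
    for command, cwd in steps:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_steps(build_steps("tutorials/ner_fashion_brands")) would clone the template, install its requirements, fetch the assets, and run the all workflow. Passing workflow="preprocess" runs a single command from project.yml instead.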
Repository Maintenance
To keep the project templates up to date, the repository includes a set of maintenance scripts:
- update_docs.py: Refreshes all auto-generated documentation, ensuring that only the auto-generated sections are updated.
- update_category_docs.py: Updates the README.md files in category directories to reflect available project templates.
- update_configs.py: Keeps the configuration files current with spaCy changes by auto-filling necessary fields.
- update_projects_jsonl.py: Updates the projects.jsonl file, which contains vital project information.
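To illustrate how a script such as update_docs.py can refresh only the auto-generated sections, here is a minimal sketch of marker-based replacement. It is not the actual script: the marker strings and the function name refresh_generated_section are assumptions made for this example.

```python
# Hypothetical markers; the real scripts may use different strings.
START = "<!-- AUTO-GENERATED DOCS START -->"
END = "<!-- AUTO-GENERATED DOCS END -->"


def refresh_generated_section(readme: str, generated: str) -> str:
    """Replace only the text between the markers; keep everything else."""
    start = readme.find(START)
    end = readme.find(END)
    if start == -1 or end == -1 or end < start:
        # No well-formed marked section: return the document unchanged.
        return readme
    before = readme[: start + len(START)]  # hand-written text plus start marker
    after = readme[end:]                   # end marker plus hand-written text
    return f"{before}\n{generated.strip()}\n{after}"
```

Text outside the markers is copied through untouched, so hand-written notes in a README.md survive a refresh under this scheme.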
Weasel is included by default with spaCy version 3.7 and later, and can be easily installed via pip or conda for previous versions. Whether you are a beginner or an experienced developer, Weasel provides the tools to effectively manage NLP projects from creation to deployment.