Introduction to Sicarator
Overview
Sicarator is a command-line interface (CLI) generator designed specifically for data science projects. Built with Yeoman, it generates a ready-to-use project skeleton with a focus on code quality and best practices. It is maintained by Sicara and aims to simplify the initialization of data science projects by providing essential tools and services from the start.
Main Features
Sicarator automatically sets up a data science project with a set of widely used tools for development, testing, and delivery. A project generated by Sicarator includes the following core features:
- Python Development Environment
  - Poetry: Manages project dependencies efficiently.
  - Pytest: Facilitates testing to ensure code quality.
  - Ruff: Provides static analysis and code formatting to maintain clean code.
  - Mypy: Offers type checking for Python.
  - Pre-commit: Manages Git hooks for streamlined Git workflows.
- Continuous Integration Options: users can select from a range of continuous integration tools, including:
  - CircleCI
  - GitHub Actions
  - GitLab CI/CD
  - Azure Pipelines
- Optional API Setup: for projects requiring an API, Sicarator can help set up:
  - FastAPI: A fast, modern web framework for Python.
  - Docker: For containerization.
  - Deployment options on AWS, GCP, and more.
  - Provisioning with Terraform and testing with Postman.
- Additional Features (Optional)
  - Data Versioning and Pipelines: Using DVC and Typer.
  - Data Visualization: Through Streamlit.
  - Experiment Tracking: Combining DVC with Streamlit.
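To illustrate what the Python tooling listed above checks, here is a minimal sketch of a module as it might appear in a generated project. The function `normalize` and its test are hypothetical and are not taken from Sicarator's template: the type annotations are what Mypy would verify, and the `test_` function is what Pytest would collect.

```python
# Hypothetical example module; names are illustrative, not from Sicarator's template.


def normalize(values: list[float]) -> list[float]:
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        # All values are equal: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_normalize_scales_to_unit_interval() -> None:
    # Pytest discovers functions whose names start with "test_".
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([]) == []
    assert normalize([3.0, 3.0]) == [0.0, 0.0]
```

In a Poetry-managed project, checks like these could be run with `poetry run pytest` and `poetry run mypy .`, although a generated project may wrap them in its own commands.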
Prerequisites
To get started with Sicarator, a few tools need to be installed first:
- Pyenv: For managing different Python versions and virtual environments.
- Poetry: To handle dependencies.
- Node.js: Necessary because Sicarator is built as a Node.js module using Yeoman.
- Yeoman: The framework upon which Sicarator is constructed.
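Before installing Sicarator, it can help to confirm that the prerequisites above are reachable from the shell. The following is a minimal sketch, assuming the usual executable names (`pyenv`, `poetry`, `node`, `yo`); the helper `missing_tools` is hypothetical and not part of Sicarator.

```python
import shutil
from typing import Callable, Optional

# Executable names assumed for each prerequisite; adjust if yours differ.
PREREQUISITES = {
    "Pyenv": "pyenv",
    "Poetry": "poetry",
    "Node.js": "node",
    "Yeoman": "yo",
}


def missing_tools(
    tools: dict[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools(PREREQUISITES)
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the function uses `shutil.which`, which searches the current PATH.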
Getting Started
Setting up a new project with Sicarator takes only a few commands. Once the prerequisites are installed, install the Sicarator package globally, then generate a project by running the yo sicarator command.
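The two steps above can be scripted. The sketch below assumes the npm package is named `generator-sicarator`, following Yeoman's `generator-<name>` convention for a generator invoked as `yo <name>`; check the project's own installation instructions for the exact package name. The helpers `setup_commands` and `run_setup` are hypothetical.

```python
import subprocess

# Assumption: package name inferred from Yeoman's "generator-<name>" convention.
GENERATOR_PACKAGE = "generator-sicarator"


def setup_commands(package: str = GENERATOR_PACKAGE) -> list[list[str]]:
    """Return the commands to run, in order, as argument lists."""
    return [
        ["npm", "install", "-g", package],  # install the generator globally
        ["yo", "sicarator"],                # start the interactive generator
    ]


def run_setup(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in setup_commands():
        print("$ " + " ".join(command))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

The dry run only prints the commands. Because the generator asks its questions interactively, running the same two commands directly in a terminal is usually simpler.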
Troubleshooting
Common issues include permission errors during installation and Git credential problems. Fixes typically involve correcting the ownership of the npm and node_modules directories, or running the installation as the current user so that this user's Git credentials are the ones in effect.
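To check whether ownership is the cause of a permission error, one option is to compare the owner of the relevant directories with the current user. This is a minimal, POSIX-only sketch. The directory locations are assumptions (npm's default cache in the home directory and a node_modules folder in the current directory), and the helper names are hypothetical.

```python
import os
from pathlib import Path


def owned_by_current_user(path: Path) -> bool:
    """Return True if path exists and is owned by the user running this script."""
    return path.exists() and path.stat().st_uid == os.getuid()


def report_ownership(paths: list[Path]) -> list[Path]:
    """Return the existing paths that are NOT owned by the current user."""
    return [p for p in paths if p.exists() and not owned_by_current_user(p)]


if __name__ == "__main__":
    # Assumed locations; adjust them to match your npm installation.
    candidates = [Path.home() / ".npm", Path.cwd() / "node_modules"]
    for path in report_ownership(candidates):
        print(f"{path} is not owned by the current user")
```

Any path the script reports is a candidate for an ownership fix. Paths that do not exist are skipped.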
Contributing
To contribute to Sicarator, clone the repository from GitHub, install its dependencies, and link the project so that local changes can be tested before they are committed to the shared codebase. Contributions of this kind improve the project and bring in new features for automating data project setup.
Conclusion
Sicarator simplifies and speeds up the creation of data science projects by generating a preconfigured toolchain focused on code quality. Its range of optional components lets data scientists tailor the generated project to their own development workflow.