Introduction to the Redun Workflow Engine
Redun is a powerful and efficient workflow framework built on the popular Python programming language. It takes a unique approach to workflow management, allowing users to define workflows as lazy expressions, which are dynamically evaluated to form directed acyclic graphs (DAGs). This methodology overcomes the limitations of traditional workflow tools by supporting control flow, composability, and recursion, among other high-level language features.
Key Features of Redun
- Lazy Expressions: Workflows in Redun are defined as lazy expressions that evaluate into dynamic DAGs. This allows complex dataflows to be expressed without statically predefining the graph.
- Incremental Computation: Redun is reactive to changes in both data and code. It detects modifications in data values and external data sources, as well as alterations in function code, ensuring workflows are recalculated only when necessary.
- Diverse Compute Backends: Workflow tasks in Redun can be executed on various compute backends, including threads, processes, AWS batch jobs, and Spark jobs.
- Data and Code Change Detection: Redun uses file hashing and function hashing to efficiently detect changes in data and code, ensuring accurate and up-to-date workflow execution.
- Centralized Caching: Intermediate results from past executions are cached centrally, optimizing resource usage by reusing these results across multiple workflows.
- Data Lineage and Provenance: Previous call graphs are stored as data lineage records, allowing users to search and debug past workflows.
Installation and Use Cases
To install Redun, you can simply run the following command:
pip install redun
For additional features such as using a Postgres backend or generating visualization files, you can extend the installation with:
pip install redun[postgres]
pip install redun[viz]
Redun is versatile and suitable for a wide range of use cases, including bioinformatics, cheminformatics, web or API data extraction, general data science, and more.
A Glimpse into Redun in Action
Consider a simple example of using Redun for compiling a C program. The workflow involves tasks that compile each C file into an object file, followed by linking these object files into a final executable program. This example demonstrates how Redun's lazy evaluations form a dataflow DAG and execute tasks in parallel threads.
Here's a snippet of Redun code defining such a workflow:
# make.py
import os
from typing import List

from redun import task, File

redun_namespace = "redun.examples.compile"

@task()
def compile(c_file: File) -> File:
    # Compile one C file into an object file.
    os.system(f"gcc -c {c_file.path}")
    return File(c_file.path.replace(".c", ".o"))

@task()
def link(prog_path: str, o_files: List[File]) -> File:
    # Link the object files into the final executable.
    o_paths = " ".join(o_file.path for o_file in o_files)
    os.system(f"gcc -o {prog_path} {o_paths}")
    return File(prog_path)

@task()
def make_prog(prog_path: str, c_files: List[File]) -> File:
    # Each compile() call returns a lazy expression, so compiles can run in parallel.
    o_files = [compile(c_file) for c_file in c_files]
    return link(prog_path, o_files)
In this script, the tasks are defined using the @task decorator, which is Redun's way of marking a function as a task within a workflow.
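The incremental behavior this enables can be sketched in plain Python (an illustrative toy, not Redun's implementation; Redun's real cache is centralized and keyed on both code and data hashes): hash each input file, and skip recompilation when the hash matches a cached result.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Tuple

def file_hash(path: Path) -> str:
    """Hash a file's contents, much as Redun's File does for change detection."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Cache keyed on (task name, input hash) -> result.
cache: Dict[Tuple[str, str], str] = {}

def compile_cached(c_file: Path) -> Tuple[str, bool]:
    """Return (result, was_cached); recompute only when the input hash changed."""
    key = ("compile", file_hash(c_file))
    if key in cache:
        return cache[key], True
    result = c_file.with_suffix(".o").name  # stand-in for actually running gcc
    cache[key] = result
    return result, False

# Demo with a temporary C source file.
tmp = Path(tempfile.mkdtemp())
src = tmp / "main.c"
src.write_text("int main() { return 0; }\n")
print(compile_cached(src))  # ('main.o', False) - first run computes
print(compile_cached(src))  # ('main.o', True)  - unchanged input hits the cache
src.write_text("int main() { return 1; }\n")
print(compile_cached(src))  # ('main.o', False) - changed input recomputes
```

Redun applies the same idea to function code as well: editing the body of `compile` changes its hash, invalidating only the results that depended on it.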
Exploring Provenance and Mixed Computing Backends
Redun records every workflow execution into a database, providing a detailed history that users can explore for debugging and analysis. This feature is especially useful for understanding complex workflows and reproducing or extending past work.
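A toy version of such a record might look like the following (illustrative only; Redun actually stores hashed call graphs in a backend database and exposes them through its CLI): each task execution is appended as a node linking the task, its inputs, and its result, which can then be queried.

```python
from typing import Any, Dict, List

# Each entry is one call-graph node: which task ran, on what, producing what.
call_log: List[Dict[str, Any]] = []

def record_call(task_name: str, args: tuple, result: Any) -> None:
    """Append one execution record to the provenance log."""
    call_log.append({"task": task_name, "args": args, "result": result})

def calls_for(task_name: str) -> List[Dict[str, Any]]:
    """Query past executions of a given task."""
    return [c for c in call_log if c["task"] == task_name]

record_call("compile", ("main.c",), "main.o")
record_call("compile", ("util.c",), "util.o")
record_call("link", (("main.o", "util.o"),), "prog")
print([c["result"] for c in calls_for("compile")])  # ['main.o', 'util.o']
```

Because every node is recorded, a past result can be traced back through the chain of calls and inputs that produced it.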
Additionally, Redun supports mixed computing backends. Users can configure tasks to run on different platforms like separate processes, AWS Batch jobs, or Spark jobs with minimal configuration. Redun is designed to handle the movement of data and code as well as the scheduling on various backends automatically.
Redun's design and functionality make it a robust choice for managing complex workflows in diverse fields. Its ability to leverage Python's expressive capabilities while efficiently executing distributed workflows sets it apart as an advanced workflow engine.