Skip to content

What is Dewret and Why?

Executive Summary

Dewret (DEclarative Workflow REndering Tool) is a Python framework that bridges the gap between dynamic, code-based workflows and static, portable workflow specifications. It enables organizations to write complex computational workflows in familiar Python syntax while automatically generating standard workflow formats (CWL) for execution across diverse computing environments.

The Problem

Organizations running computational workloads face a fundamental trade-off:

  • Dynamic workflows (Python/Dask) are flexible and developer-friendly but lack portability, reproducibility, and optimization opportunities
  • Static workflows (CWL/Snakemake/etc.) are portable and optimizable but require verbose YAML/DSL syntax with steep learning curves

This forces teams to choose between developer productivity and operational requirements, often resulting in:

  • Vendor lock-in to specific execution platforms
  • Difficulty sharing workflows across teams and institutions
  • Inability to leverage workflow-specific optimizations
  • Challenges with reproducibility and version control

The Dewret Solution

Dewret eliminates this trade-off by providing a Python-first authoring experience that automatically renders to portable, standards-compliant workflows.

Core Value Proposition

# Write natural Python code
@task()
def process_data(file: str) -> pd.DataFrame:
    return pd.read_csv(file)

@workflow()
def analysis_pipeline(data_file: str):
    df = process_data(data_file)
    return analyze(df)

Renders automatically to

# Industry-standard CWL workflow
class: Workflow
cwlVersion: v1.2
steps:
  process_data:
    run: process_data.cwl
    in: {file: data_file}
# ... complete executable workflow

Key Benefits

For Development Teams

  • Familiar Syntax: Write workflows in Python using existing skills and tools
  • Rich Type System: Leverage Python's type hints for automatic workflow validation
  • Debugging Support: Test and debug workflows locally before deployment
  • Code Reuse: Integrate with existing Python libraries and frameworks
  • IDE Support: Type-hinting and eager execution simplify step debugging in IDEs

For Operations Teams

  • Multi-Platform Execution: Deploy to multiple environment types
  • Performance Optimization: Enable workflow engines to optimize execution plans
  • Container Support: Automatic integration with containerized environments
  • Resource Management: Leverage workflow engine capabilities for scheduling and scaling

For Organizations

  • Vendor Independence: Avoid lock-in to specific workflow platforms
  • Standards Compliance: Generate workflows compatible with chosen industry standards
  • Cost Optimization: Test locally, optimize workflows before expensive HPC runs
  • Audit Trail: Git-versionable static workflows enhance reproducibility and issue bisection

Technical Architecture

Lazy Evaluation Engine

Built on Dask's proven lazy evaluation system, Dewret constructs workflow graphs without executing code, enabling analysis and optimization before rendering.

Multi-Renderer Support

  • CWL: Common Workflow Language for maximum portability
  • Extensible: Plugin architecture for custom renderers

Enterprise Features

  • Early Error Detection: Catch workflow issues before resource-intensive execution
  • Static Analysis: Enable workflow linting, security scanning, and optimization
  • CI/CD Integration: Automated workflow testing and validation
  • Documentation Generation: Automatic workflow documentation from code

Use Cases

Scientific Computing

  • Multi-step data analysis pipelines
  • High-performance computing workloads
  • Cross-institutional collaboration

Data Engineering

  • ETL/ELT pipeline development
  • Multi-cloud data processing
  • Batch processing workflows

Machine Learning

  • Model training pipelines
  • Feature engineering workflows
  • MLOps automation

Advanced Usage

Proof-of-concept work that can be extended to full features:

  • Graph-based Checkpointing: By knowing output goals and any memoization achieved in previous execution, with a defined "audit log" schema, dewret will eliminate redundant computation consistently, before reaching the executor, even for complex DAGs
  • Multi-engine Workflows: From a single workflow in dewret, coarse- and fine-grained execution can be achieved by making the "split-points" between workflows on (e.g.) Argo and workflows on (e.g.) dask workers choosable at render-time.

Why Dewret Now?

  • Growing adoption of workflow standards (CWL adoption up 300% in life sciences)
  • Increased focus on reproducible research and FAIR data principles
  • Multi-cloud strategies requiring portable workloads

Competitive Advantage

  • Unique Position: Only tool providing Python-native authoring with multi-format rendering
  • Proven Foundation: Built on mature technologies (Dask, established workflow standards)
  • Active Ecosystem: Integration with existing Python scientific computing stack

Return on Investment Drivers

  • Reduced Development Time: Far faster workflow development vs. native YAML
  • Improved Reliability: Early error detection prevents costly HPC failures
  • Enhanced Portability: Single codebase runs across multiple execution environments
  • Team Efficiency: Leverage existing Python expertise instead of learning new DSLs

Getting Started

Dewret is available as an open-source Python package with comprehensive documentation, examples, and community support. Organizations can evaluate Dewret with existing workflows and see immediate benefits in developer productivity and workflow portability.

Installation: pip install dewret
Documentation: https://flaxandteal.github.io/dewret
Repository: https://github.com/flaxandteal/dewret