

Towards an Extensible Workflow Engine by Modifying Its Internal State Transition Model




In large-scale data analysis such as genome analysis, researchers construct workflows (also known as pipelines) that consist of multiple data analysis applications. In recent years, workflows have increasingly been described in DSLs called workflow description languages rather than in build systems such as Make or programming languages such as Python. Workflow description languages let us focus on the data analysis application for each step and on the dependencies between steps. A workflow engine is a system that executes workflows; it can encapsulate where a given workflow is executed (e.g., cloud computing resources or computing nodes managed by job schedulers) and how it is executed (e.g., re-execution of failed steps). However, researchers may not be able to execute their workflows efficiently enough on their own platforms when a given workflow engine does not support those platforms or execution strategies. Although it is technically possible to modify the workflow engine to support them, it is hard to estimate where to make the changes and how long they would take. To address this problem, this presentation describes our ongoing work on ep3, a workflow engine based on a state transition model. It represents a given workflow as Petri nets and executes them by running shell commands on state transitions. Users can modify the network structure and the shell commands to be executed through external settings, so we expect that its functionality can be extended without modifying the workflow engine itself.
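The idea of running shell commands on Petri net transitions can be illustrated with a minimal Python sketch. This is not ep3's implementation or its settings format: the names `Transition`, `PetriNet`, `build_net`, and `config`, as well as the dictionary layout, are hypothetical and chosen only for illustration. The sketch assumes that every transition has at least one input place and that a failed command stops the run.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: list      # places that must each hold a token before firing
    outputs: list     # places that receive a token when the command succeeds
    command: str      # shell command executed when the transition fires


@dataclass
class PetriNet:
    transitions: list
    marking: dict = field(default_factory=dict)  # place name -> token count

    def enabled(self):
        """Return the transitions whose input places all hold a token."""
        return [t for t in self.transitions
                if all(self.marking.get(p, 0) > 0 for p in t.inputs)]

    def fire(self, t):
        """Consume input tokens, run the command, produce outputs on success."""
        for p in t.inputs:
            self.marking[p] -= 1
        code = subprocess.run(t.command, shell=True).returncode
        if code == 0:
            for p in t.outputs:
                self.marking[p] = self.marking.get(p, 0) + 1
        return code

    def run(self):
        """Fire enabled transitions one at a time until none remain or one fails."""
        fired = []
        while True:
            ready = self.enabled()
            if not ready:
                break
            t = ready[0]
            code = self.fire(t)
            fired.append(t.name)
            if code != 0:
                break
        return fired


def build_net(config):
    """Build a net from plain data, standing in for an external settings file."""
    transitions = [Transition(**t) for t in config["transitions"]]
    return PetriNet(transitions, dict(config["marking"]))


# Hypothetical two-step workflow; the commands are placeholders.
config = {
    "marking": {"start": 1},
    "transitions": [
        {"name": "align", "inputs": ["start"], "outputs": ["aligned"],
         "command": "echo align"},
        {"name": "sort", "inputs": ["aligned"], "outputs": ["done"],
         "command": "echo sort"},
    ],
}

if __name__ == "__main__":
    net = build_net(config)
    print(net.run(), net.marking)
```

Because the network structure and the commands live entirely in `config`, changing a `command` string or adding a transition alters the behavior without touching the `PetriNet` class, which mirrors the kind of extensibility the abstract aims for.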



