
Workflow persistence


Workflow execution state management

Persisting state of workflow execution

If you run a workflow with the --persist flag, its execution state will be saved in a journal file located in the workflow's home directory and called <workflow_name>.<date>.log, where <workflow_name> is taken from the workflow.json file.
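
For example, assuming the workflow resides in a directory called ./mywf (a hypothetical path), a persisted run could be started as follows:

hflow run ./mywf --persist

After this run, a journal file matching the <workflow_name>.<date>.log pattern will be present in the workflow's home directory.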

The log contains:

  • Full workflow directory path
  • hflow command line options used to run the workflow
  • Input signals passed to the workflow
  • Execution trace (firings of processes and output signals they produced)

Note that if a workflow function fails (never calls its callback), its entry will not be logged. You can use this behavior to recover workflow execution from the point at which it was interrupted, e.g. due to a failure.

Recovery

The following command runs a workflow recovering its state from the journal:

hflow recover <journal_file>

The workflow will be "replayed" from the recovery log and will then continue running from the point where the log ends.

Such a recovered run is also persisted in a new journal file!

Invoking the function during recovery

By default, when a firing of a workflow activity is recovered, its workflow function is not invoked; the recovered outputs are simply reused. This behavior can be changed by setting the executeWhenRecovering flag in the process's metadata:

{
  "name": "Sum",
  "type": "dataflow",
  "function": "sum",
  "executeWhenRecovering": "true",   
  "ins": [ "square" ],
  "outs": [ "sum" ]
} 

If this flag is set, the function will always be invoked, even for recovered firings. Such an invocation differs from a normal one in that the function receives, in the outs array, full instances of the output signals produced in the previous run. The function can tell whether the invocation is "recovered" by examining the context.recovered flag.

The programmer can take advantage of this capability as follows:

function foo(ins, outs, context, cb) {
  if (context.recovered) {
    // recovery mode: 'outs' already contains the output signals
    // produced in the previous run
  } else {
    // normal execution: compute the output signals
  }
  cb(null, outs);  // always report outputs via the callback
}

For example, in the recovery mode the function may:

  • examine the outs array and decide whether it is necessary to re-execute the job, or simply invoke cb(null, outs) without doing anything (a sketch follows this list).
  • re-compute internal state accumulated across multiple firings (stateful workflow activity)
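
As an illustration of the first option, here is a minimal sketch of a recovery-aware sum function (assuming, as in typical HyperFlow functions, that signal values are carried in each signal's data array):

function sum(ins, outs, context, cb) {
  // Recovered firing with usable previous outputs: reuse them without recomputing.
  if (context.recovered && outs[0].data && outs[0].data.length > 0) {
    return cb(null, outs);
  }
  // Normal execution (or recovered firing without usable outputs): recompute the sum.
  var total = ins[0].data.reduce(function (acc, x) { return acc + Number(x); }, 0);
  outs[0].data = [ total ];
  cb(null, outs);
}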

Advanced: smart re-run

Sometimes you may want to partially replay and partially re-run a previous workflow run using a recovery log, e.g. because some software used in the workflow has been updated. HyperFlow provides a tool called hflow-recovery-analysis that annotates an existing recovery log with information on which tasks need re-execution, based on a configuration file consisting of entries of the following form:

changed: {
  selector: <spec>,
  value: <value>
}

Each entry specifies one or more workflow objects (processes or data) that have been changed. For example, { "selector": "process.name", "value": "foo" } selects all processes whose name is foo. Consequently, all tasks invoked by such processes, as well as their successors, will be re-executed.
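
For illustration only, a configuration file with two such entries might look as follows (whether entries are wrapped in a JSON array, and the overall file layout, are assumptions here; only the process.name selector is taken from the example above):

[
  {
    "changed": {
      "selector": "process.name",
      "value": "foo"
    }
  },
  {
    "changed": {
      "selector": "process.name",
      "value": "Sum"
    }
  }
]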
