Skip to content

Comcast/python-batch-runner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
Nathan Lee
Dec 12, 2020
ec918f2 · Dec 12, 2020

History

66 Commits
Jun 26, 2019
Feb 23, 2020
Dec 12, 2020
Mar 31, 2020
Dec 5, 2020
Nov 29, 2019
May 16, 2019
May 16, 2019
May 10, 2019
May 16, 2019
Mar 28, 2020
Feb 23, 2020
Jun 26, 2019
Jan 22, 2020

python-batch-runner

Documentation Status

For more complete documentation, please see: https://python-batch-runner.readthedocs.io/

python-batch-runner is a microframework to assist with building small to medium scale batch applications without needing to build the scaffolding from scratch.

It provides the following with zero or minimal setup required:

  • Basic project directory structure
  • Logging and automatic purging of old log files
  • Email notification upon job completion, with attached log files in case of failure
  • Configurability of task dependencies and parallel execution
  • Self-managed restartability from point of failure and state management
  • Data sharing across tasks

Installation

pip install python-batch-runner

Usage

New Project Setup

A simple setup function is included and can be run by executing:

pyrunner --setup

This will prompt you for three inputs:

  1. Project Name
    • Provide a name for your project/application without any spaces (i.e. MySampleProject). If spaces are included, they will be removed.
  2. Project Path
    • Provide the path to create the project directory in. The project name (lowercased) will be appended to this path.
    • If left blank, the path will default to the current working directory.
  3. Execution Mode
    • PyRunner will operate in either SHELL or PYTHON mode.
    • NOTE: SHELL-only mode will be deprecated in future versions.

Upon completion, a new directory at the provided (or default/current) path will be created, along with minimum necessary subdirectory and files.

Core Files

Upon above setup, the following three files will be generated:

  • .../config/app_profile
    • Contains a list of exports to setup the basic environment prior to execution. This file is sourced before anything else is executed by PyRunner.
  • .../config/<project_name>.lst
    • The process list file.
    • Contains header lines: Line 1 for execution mode (SHELL or PYTHON) and line 2 for column names (line 2 is only for user reference and may be deleted).
    • Any subsequent lines must be entered by the user. Each line must be a single task (pipe-separated) that describes the following: a. Task ID (must be unique and 1 or higher) b. Parent ID's (comma separated list of numbers that describe which tasks must successfully execute before this task will trigger) c. Maximum # of Retries (0 will mean the task will never run; 3 will mean the task will retry upon failure until executed a maximum of 3 times) d. Task Name e. Task Command (only in SHELL Mode - do not include for PYTHON mode) f. Task Module (only in PYTHON Mode - do not include for SHELL mode) g. Task Worker (only in PYTHON Mode - do not include for SHELL mode) h. Task Arguments (only in PYTHON Mode - do not include for SHELL mode) e. Absolute Path to Log File
  • .../main.sh
    • For convenience. Simply executes pyrunner with the minimally required options. Any arguments to this script will be passed through to the python script.
    • Not Required

Execution Modes

SHELL Mode

  • Will allow any shell command to be executed as a task/process.
  • This provides a free-form mode which can allow you to use scripts and executables from any language.
  • Executes each task as a new subprocess which inherits from the parent environment, but independent of other task environments.

PYTHON Mode

  • Restricts execution to only a single class from a user-defined module per task/process.
  • Executes each task as a new thread.
  • This has the benefit of allowing tasks to communicate via a common set of key/value pairs using a Context object. This effectively allows for the storing of state information during execution, and in case of a failure, this state will be preserved for job restarts.

Execution Options

Option Argument Description
--env [variable_name]=[variable_value] Set environment variable - equivalent to export [variable_name]=[variable_value]
--cvar [variable_name]=[variable_value] Set context variable to be available at the start of job.
-r Restart flag. Causes PyRunner to check the APP_TEMP_DIR for existing *.ctllog files to restart a job from failure. Fresh run if no *.ctllog file found.
-n or --max-procs integer Sets the absolute maximum number of parallel processes allowed to run concurrently.
-x or --exec-only comma separated list of process ID's Executes only the given process ID(s) from the .lst file.
--exec-proc-name single process name Similar to --exec-only - Executes only the process ID identified by the given process name.
-A or --to or --ancestors single process ID Executes given process ID and all preceding/ancestor processes.
-D or --from or --descendents single process ID Executes given process ID and all subsequent/descendent processes.
-N or --norun comma separated list of process ID's Prevents the given process ID(s) from executing.
-e or --email email address Sets email address to send job notification email after run completion. Overrides all other APP_EMAIL settings.
--es or --email-on-success true/false or 1/0 Enables or disables email notifications when job exits with success. Default is True.
--ef or --email-on-fail true/false or 1/0 Enables or disables email notifications when job exits with failure. Default is True.
-i or --interactive Primarily for use with -x option. Launches in interactive mode which will request input from user if a Context variable is not found.
-d or --debug Debug option that only serves to provide a more detailed output during execution to show names of pending, running, failed, etc. tasks.
--dump-logs Enables job to dump to STDOUT logs for all failed tasks after job exits.
--nozip Disables zipping of log files after job exits.
-t or --tickrate Sets the number of checks per second that the execution engine performs to poll running processes.
-h or --help Prints out options and other details.
-v or --version Prints out the installed PyRunner version.

Contribute

Please read the CONTRIBUTING file for more details.

License

python-batch-runner is released under the Apache 2.0 License. Please read the LICENSE file for more details.

About

A tiny framework for building batch applications as a collection of tasks in a workflow.

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Packages

No packages published