Remote Job Submission #190
Replies: 1 comment
-
Service-focused implementation
We discussed an approach where the PSIJ implementation is wrapped into a service instance and remote submission is achieved by accessing that service over some protocol. The PSIJ implementation would in this case not need any changes (the actual submission remains local from its perspective). We would need to tackle the following issues:
choice of protocol
REST has been mentioned as a potential and obvious choice. It is easy to document, tooling exists in all languages, and data staging is easy to integrate. But REST makes it cumbersome to provide notifications on state changes, and we need to be careful about performance with large numbers of jobs.
ZMQ is another option: it is already used in several parts of our stack, is rather resilient against network drops, and is fast. Notifications are easy to support. But a ZMQ protocol would be more complex to document and to debug, and it limits the range of available tools.

service deployment
Service instances can run in system space or in user space.
In the first case (system space), the site administrators are responsible for providing open ports, configuring authentication, etc. In this case the service implementation needs to be multi-tenant, which implies additional code that has to live on top of the current PSIJ implementation. It also adds the challenge of convincing site administrators to deploy the service.
The second case (user space) simplifies the (single-tenant) implementation, but requires the deploying user (possibly a domain scientist) to ensure that an open port exists and that the service endpoint is suitably communicated to the remote clients. It also likely requires communication to be tunneled over ssh, as user-space services usually cannot hook into the system's AAA infrastructure for access control. Note that HPC sites are usually firewalled, so open ports are not always available to end users.
A single code base could potentially support both user-space and system-space deployments, but would have to pay the penalty for both (additional code for multi-tenancy, support for protocol tunneling and port management).
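To make the REST trade-off concrete, here is a minimal sketch of a single-tenant, user-space service with a polling client, using only the Python standard library. Everything in it is an assumption made for illustration: the `/jobs` routes, the JSON fields, and the state names are hypothetical, and `subprocess.Popen` stands in for the unchanged local PSIJ submission. It is not a proposed design.

```python
import http.client
import json
import subprocess
import sys
import threading
import time
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory job table: job id -> subprocess.Popen
JOBS = {}
JOBS_LOCK = threading.Lock()


def job_state(proc):
    """Map a process to a coarse state name (names are illustrative only)."""
    code = proc.poll()
    if code is None:
        return "ACTIVE"
    return "COMPLETED" if code == 0 else "FAILED"


class JobHandler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # POST /jobs: submit a job described by a small JSON document.
        if self.path != "/jobs":
            return self._reply(404, {"error": "not found"})
        length = int(self.headers.get("Content-Length", 0))
        spec = json.loads(self.rfile.read(length))
        # Stand-in for the local submission; the service itself stays local.
        proc = subprocess.Popen([spec["executable"], *spec.get("arguments", [])])
        job_id = uuid.uuid4().hex
        with JOBS_LOCK:
            JOBS[job_id] = proc
        self._reply(201, {"id": job_id})

    def do_GET(self):
        # GET /jobs/<id>: report the current state; clients have to poll this.
        if not self.path.startswith("/jobs/"):
            return self._reply(404, {"error": "not found"})
        job_id = self.path[len("/jobs/"):]
        with JOBS_LOCK:
            proc = JOBS.get(job_id)
        if proc is None:
            return self._reply(404, {"error": "unknown job"})
        self._reply(200, {"id": job_id, "state": job_state(proc)})

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve(port=0):
    """Start the service on the loopback interface only; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), JobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(port, method, path, payload=None):
    """Send one JSON request to the local service and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    body = json.dumps(payload) if payload is not None else None
    conn.request(method, path, body=body, headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    data = json.loads(resp.read())
    conn.close()
    return resp.status, data


def wait(port, job_id, interval=0.1, timeout=30.0):
    """Poll until the job leaves ACTIVE: the cost of having no notifications."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        _, data = request(port, "GET", "/jobs/" + job_id)
        if data["state"] != "ACTIVE":
            return data["state"]
        time.sleep(interval)
    raise TimeoutError("job " + job_id + " did not finish in time")


def demo():
    """Submit a trivial job through the service and wait for its final state."""
    server = serve()
    port = server.server_address[1]
    try:
        spec = {"executable": sys.executable, "arguments": ["-c", "pass"]}
        _, created = request(port, "POST", "/jobs", spec)
        return wait(port, created["id"])
    finally:
        server.shutdown()
        server.server_close()
```

The polling loop in `wait` is where the notification problem shows up: each client issues one request per job per interval, which is also where the performance concern for large job counts comes from. Because the sketch binds to `127.0.0.1` only, a remote client would reach it through an ssh port forward (for example `ssh -L 8080:localhost:8080 user@login-node`, with a hypothetical host name and port), which matches the user-space deployment described above.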
-
Now that local job submission is beginning to be used, we should start planning for remote job submission as well. We can use this thread to discuss possible approaches.