Spot instances: Runner must be able to restart workflow #174

Ideally, a third job that runs when training fails could help. The workflows for GitLab (GL) and GitHub (GH) would be:

```yaml
stages:
  - ml
  - check

train:
  stage: ml
  tags:
    - gpu
    - check

  cache:
    paths:
    - ./models
    
  script:
    - echo "setup a pipeline here"

check:
  stage: check
  when: on_failure
  needs:
    - train

  script:
    - echo "Restarting..."
```

```yaml
name: cml

on: [push]

jobs:
  train:
    # needs: deploy
    runs-on: [self-hosted,gpu]

    steps:
      - uses: actions/checkout@v2

      - name: Cache multiple paths
        uses: actions/cache@v2
        with:
          path: |
            ./models
          key: models

      - name: cml_run
        shell: bash
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }} 
        run: |
          echo "setup a pipeline here"

  check:
    if: failure()
    needs: train
    runs-on: [ubuntu-latest]
    steps:
      - name: cml_check
        run: |
          echo "Restarting...."
```

However, this approach has two issues:
 - In GH, losing the runner is recoverable because the job ends up as failed, but in GL a job without a valid runner can run forever. I opened a ticket [here](https://gitlab.com/gitlab-org/gitlab/-/issues/229851).
 - The biggest drawback is that the workflow could be restarted in a loop. Giving the runner the ability to listen for the spot instance eviction would be a better guarantee that it acts properly.

This implies that we have to provide cleanup scripts when deploying the spot instances. These scripts only need to run the runner cleanup and restart the workflow.
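
To make the idea of a runner that listens for the eviction more concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions that are not part of this issue: the instances are AWS EC2 spot instances, the instance metadata endpoint `spot/instance-action` is reachable without a session token (IMDSv1), and `cleanup` and `restart` are hypothetical callables standing in for the runner cleanup and the workflow restart. Other clouds expose eviction notices differently.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: AWS EC2 metadata endpoint, reachable without a session token.
# It answers 404 until an interruption has been scheduled for the instance.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_spot_action(url: str = SPOT_ACTION_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the eviction notice body, or None while no eviction is scheduled."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no eviction scheduled yet
        raise


def watch_for_eviction(
    fetch: Callable[[], Optional[str]],
    cleanup: Callable[[], None],   # hypothetical: unregister the runner
    restart: Callable[[], None],   # hypothetical: re-trigger the workflow
    poll_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll until an eviction notice appears, then clean up and restart once.

    Returns True if an eviction was handled, False if max_polls was reached.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if fetch() is not None:
            cleanup()
            restart()
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

On the instance this could be started as `watch_for_eviction(fetch_spot_action, cleanup, restart)` next to the runner. Because the restart is triggered once, by the eviction notice itself, it would not depend on a failing `check` job and so should not restart the workflow in a loop.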

