Can't use AWS Instance GPU on GITLAB CI and CML-RUNNER #848

@leoitcode

I have this .gitlab-ci.yml:

stages:
  - test
  - deploy
  - train

sast:
  stage: test
include:
- template: Security/SAST.gitlab-ci.yml

deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml:0-dvc2-base1
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-east-1
      --cloud-type g3.4xlarge
      --cloud-hdd-size 64
      --cloud-aws-security-group="cml-runners-sg"
      --labels=cml-runner-gpu
      --idle-timeout=120

train_job:
  stage: train
  when: on_success
  image: iterativeai/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvdia-smi
  script:
    # DVC Stuff
    - dvc pull
    - dvc repro -m
    - dvc push
    # Report metrics
    - echo "## Metrics" >> report.md
    - echo "\`\`\`json" >> report.md
    - cat metrics/best-meta.json >> report.md
    - echo "\`\`\`" >> report.md
    # Report GPU details
    - echo "## GPU info" >> report.md
    - cat gpu_info.txt >> report.md
    # Send comment
    - cml-send-comment report.md

However, the container does not seem to recognize the driver or the GPU. The nvidia-smi step in before_script fails with the following error:

/usr/bin/bash: line 133: nvdia-smi: command not found
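
Note that the log line names `nvdia-smi`, which is how the command is spelled in before_script above, so bash would print "command not found" for it whether or not a GPU is visible to the container. To separate a misspelled command from a missing driver, a small preflight check along these lines could be run first. This is only a sketch assuming a bash shell inside the job container; `check_cmd` is a hypothetical helper name, not part of CML.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CML): report whether a command is on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# If the binary is missing, the NVIDIA runtime is probably not exposed to the
# container. If it is present, `nvidia-smi -L` lists the GPUs it can see.
if check_cmd nvidia-smi; then
  nvidia-smi -L || echo "nvidia-smi is installed but could not list any GPU"
else
  echo "nvidia-smi is not on PATH inside this container"
fi
```

If this reports `missing: nvidia-smi` with the correct spelling, the problem is on the driver or container runtime side rather than in the job script.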

It looks like the iterativeai/cml:0-dvc2-base1-gpu container can't use the instance's GPU. How can I install the NVIDIA drivers and nvidia-docker on the runner, and enable the --gpus option for this Docker container?

Thank you
