I have this .gitlab-ci.yml:
stages:
  - test
  - deploy
  - train

sast:
  stage: test

include:
  - template: Security/SAST.gitlab-ci.yml

deploy_job:
  stage: deploy
  when: always
  image: iterativeai/cml:0-dvc2-base1
  script:
    - cml-runner
      --cloud aws
      --cloud-region us-east-1
      --cloud-type g3.4xlarge
      --cloud-hdd-size 64
      --cloud-aws-security-group="cml-runners-sg"
      --labels=cml-runner-gpu
      --idle-timeout=120

train_job:
  stage: train
  when: on_success
  image: iterativeai/cml:0-dvc2-base1-gpu
  tags:
    - cml-runner-gpu
  before_script:
    - pip install poetry
    - poetry --version
    - poetry config virtualenvs.create false
    - poetry install -vv
    - nvdia-smi
  script:
    # DVC Stuff
    - dvc pull
    - dvc repro -m
    - dvc push
    # Report metrics
    - echo "## Metrics" >> report.md
    - echo "\`\`\`json" >> report.md
    - cat metrics/best-meta.json >> report.md
    - echo "\`\`\`" >> report.md
    # Report GPU details
    - echo "## GPU info" >> report.md
    - cat gpu_info.txt >> report.md
    # Send comment
    - cml-send-comment report.md
However, the container doesn't recognize the driver or the GPU. When the job reached the nvidia-smi step, I got the following error:
/usr/bin/bash: line 133: nvdia-smi: command not found
It seems that iterativeai/cml:0-dvc2-base1-gpu can't use the instance's GPU. How can I install the NVIDIA drivers and nvidia-docker, and enable the --gpus option for this container?
Thank you
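Note that the log line above shows the command spelled nvdia-smi, the same spelling as in before_script. That particular error therefore comes from the shell not finding a binary with that name, and on its own it does not show whether the GPU is visible to the container. A minimal sketch of a before_script that separates the two cases, assuming the rest of train_job stays unchanged (whether nvidia-smi is actually present and working inside the container depends on how the runner starts it, which this fragment does not address):

```yaml
train_job:
  before_script:
    # Exits non-zero, failing the job, if nvidia-smi is not on PATH
    - command -v nvidia-smi
    # Queries the driver; fails if the container has no access to the GPU
    - nvidia-smi
```

If the first line fails, the binary is missing from the image or PATH. If only the second fails, the binary is there but the container cannot reach the driver.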