Create utils function for nvidia-smi check #22
Signed-off-by: laraPPr <[email protected]>
A sync with …
I think you can have a couple of tests for this; there's mocking for …
Signed-off-by: laraPPr <[email protected]>
…ayer-scripts into utils_function_nvidia
    BUILD_STEP_ARGS+=("--nvidia" "all")
elif [ ${ec} -eq 1 ]; then
    BUILD_STEP_ARGS+=("--nvidia" "install")
elif [ ${ec} -eq 2 ]; then
    BUILD_STEP_ARGS+=("--nvidia" "install")
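To make the exit-code branches above easier to follow, here is a minimal sketch of what a reusable nvidia-smi check and its caller could look like. It is not the code from this PR. The function names check_nvidia_gpu and set_nvidia_build_args, the ALLOW_GPU_CHECK_OVERRIDE variable, and the meaning of exit codes 0, 1 and 3 are all hypothetical. The review only describes a "no nvidia-smi, but overriding allowed" outcome, and mapping that outcome to exit code 2 is itself an assumption.

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT the PR's actual code. Hypothetical names/assumptions:
#   - check_nvidia_gpu: stand-in for the new utils function
#   - ALLOW_GPU_CHECK_OVERRIDE: stand-in for the override setting
#   - exit codes 0/1/3 are guesses; code 2 is assumed to be the
#     "no nvidia-smi, but overriding allowed" case from the review
check_nvidia_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # "nvidia-smi -L" prints one "GPU <n>: ..." line per visible GPU
        if nvidia-smi -L 2>/dev/null | grep -q '^GPU'; then
            return 0   # assumed: nvidia-smi found and a GPU is visible
        fi
        return 1       # assumed: nvidia-smi found but no GPU reported
    fi
    if [ "${ALLOW_GPU_CHECK_OVERRIDE:-0}" = "1" ]; then
        return 2       # no nvidia-smi, but overriding the check is allowed
    fi
    return 3           # assumed: no nvidia-smi and no override
}

# Translate the exit code into build-step arguments, mirroring the diff
set_nvidia_build_args() {
    BUILD_STEP_ARGS=()   # start empty so the sketch is easy to inspect
    local ec=0
    check_nvidia_gpu || ec=$?
    if [ "${ec}" -eq 0 ]; then
        BUILD_STEP_ARGS+=("--nvidia" "all")
    elif [ "${ec}" -eq 1 ]; then
        BUILD_STEP_ARGS+=("--nvidia" "install")
    elif [ "${ec}" -eq 2 ]; then
        BUILD_STEP_ARGS+=("--nvidia" "install")
    fi
}
```

For example, with no nvidia-smi on PATH and ALLOW_GPU_CHECK_OVERRIDE=1, set_nvidia_build_args leaves BUILD_STEP_ARGS as (--nvidia install), which is exactly the case questioned in the review below.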
This now exactly mimics the previous behavior, but I'm not sure that behavior is correct.
With this change, "--nvidia install" is now also set in the case "No 'nvidia-smi' found, no available GPU but allowing overriding this check".
It is indeed now always set. I don't think this is what we want, right? Excerpt of logs from EESSI/software-layer#1143:
No 'nvidia-smi' found, no available GPU but allowing overriding this check
Executing command to build software:
/project/60006/SHARED/jobs/2025.08/pr_1107/event_cfecebd0-6eb2-11f0-9bd7-28ae4dcb0ae2/run_000/linux_x86_64_intel_sapphirerapids/eessi.io-2023.06-software/software-layer-scripts/eessi_container.sh
--verbose --access rw --mode run --container docker://ghcr.io/eessi/build-node:debian12
--repository eessi.io-2023.06-software --extra-bind-paths /project/60006/SHARED/jobs/2025.08/pr_1107/event_cfecebd0-6eb2-11f0-9bd7-28ae4dcb0ae2/run_000/linux_x86_64_intel_sapphirerapids/eessi.io-2023.06-software/software-layer-scripts,/dev
--pass-through --contain --save /project/60006/SHARED/jobs/2025.08/pr_1107/event_cfecebd0-6eb2-11f0-9bd7-28ae4dcb0ae2/run_000/linux_x86_64_intel_sapphirerapids/eessi.io-2023.06-software/previous_tmp/build_step
--storage /tmp/bot/EESSI/eessi_job.N27hQTIAbE --nvidia install
--host-injections /project/def-users/bot/shared/host-injections
In our scripts this is fine: it is a one-time action, and once things are installed they are not reinstalled.
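The argument that always passing "--nvidia install" is harmless relies on the installation step being idempotent. A minimal sketch of such a run-once guard, assuming a marker file records a completed installation; the function install_once and the .installed marker are hypothetical and not taken from the EESSI scripts:

```shell
#!/usr/bin/env bash
# Hypothetical run-once guard; NOT the EESSI scripts' actual mechanism.
# A marker file records that installation finished, so repeated calls
# (e.g. "--nvidia install" on every build) become no-ops.
install_once() {
    local marker_dir="$1"
    local marker="${marker_dir}/.installed"
    if [ -e "${marker}" ]; then
        echo "already installed, skipping"
        return 0
    fi
    mkdir -p "${marker_dir}"
    # ... the real installation work would happen here ...
    touch "${marker}"   # record success only after the work is done
    echo "installed"
}
```

Writing the marker only after the work completes means an interrupted installation is retried on the next run instead of being silently skipped.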
Co-authored-by: ocaisa <[email protected]>
Signed-off-by: laraPPr <[email protected]>
This looks good; a CI check would be great. You can add it to https://github.com/EESSI/software-layer-scripts/blob/main/.github/workflows/tests_link_nvidia_host_libraries.yml: check once before nvidia-smi is mocked that the check returns non-zero, and once after it has been mocked.
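The suggested before/after check could look roughly like the sketch below. Here gpu_check is a hypothetical stand-in for the new utils function, and the mock is a small script placed first on PATH; the actual workflow may mock nvidia-smi differently.

```shell
#!/usr/bin/env bash
# Sketch of the suggested CI check (hypothetical names, not the real workflow).
# gpu_check stands in for the new utils function: succeed only if
# nvidia-smi exists and reports at least one GPU.
gpu_check() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    nvidia-smi -L 2>/dev/null | grep -q '^GPU'
}

# 1) Before mocking: on a runner without NVIDIA drivers the check must fail
if gpu_check; then
    echo "ERROR: GPU check succeeded before nvidia-smi was mocked" >&2
    exit 1
fi

# 2) Mock nvidia-smi with a tiny script placed first on PATH
mock_dir=$(mktemp -d)
trap 'rm -rf "${mock_dir}"' EXIT
cat > "${mock_dir}/nvidia-smi" << 'EOF'
#!/usr/bin/env bash
# Fake nvidia-smi: prints one GPU line for any arguments (enough for "-L")
echo "GPU 0: Mocked GPU (UUID: GPU-00000000-0000-0000-0000-000000000000)"
EOF
chmod +x "${mock_dir}/nvidia-smi"
export PATH="${mock_dir}:${PATH}"

# 3) After mocking: the same check must now succeed
if ! gpu_check; then
    echo "ERROR: GPU check failed after nvidia-smi was mocked" >&2
    exit 1
fi
echo "nvidia-smi mock check passed"
```

In a GitHub Actions workflow, this body could go into a single run: step.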
…ies.yml Signed-off-by: laraPPr <[email protected]>
I implemented the CI check (see https://github.com/EESSI/software-layer-scripts/actions/runs/16718197861/job/47316223515?pr=22), and I think it does what is expected?
This looks good; can you sync it with …
LGTM, thanks @laraPPr