Adapt subdir for CUDA toolkit in host injections #59

Draft: casparvl wants to merge 20 commits into base: main

casparvl
Contributor

@casparvl casparvl commented Aug 6, 2025

Try to change the subdir in which the CUDA toolkit is installed so that it also doesn't include the CPU microarchitecture

…at it also doesnt include the CPU microarchitecture
@casparvl
Contributor Author

casparvl commented Aug 6, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 6, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13622631

date job status comment
Aug 06 20:32:59 UTC 2025 submitted job id 13622631 will be eligible to start in about 20 seconds
Aug 06 20:33:08 UTC 2025 received job awaits launch by Slurm scheduler
Aug 06 20:33:21 UTC 2025 running job 13622631 is running
Aug 06 20:42:47 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13622631.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545126210.tar.gz
size: 2067 MiB (2167763827 bytes)
entries: 5559
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 06 20:42:47 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13622631.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

Hmmm, success, but not what I planned. The install dir used by install_cuda_and_libraries.sh:

installpath               (E) = /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen4

I wanted it to be /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64.

I guess the sed command isn't correct:

sed: -e expression #1, char 20: unknown option to `s'

The odd thing is that this should have broken the sanity check for installing CUDA in the software-layer, because that should have created symlinks that point to this directory.
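
For context on the sed message above: GNU sed typically reports "unknown option to `s'" when the pattern or replacement itself contains the delimiter, which happens easily when `/`-separated paths are substituted using `s/.../.../`. The thread does not show the actual command, so the following is only a sketch of that failure mode and one way around it. The function name `replace_subdir` and its arguments are hypothetical.

```shell
# replace_subdir PATH OLD_SUBDIR NEW_SUBDIR  (hypothetical helper, not from the PR)
# Replaces a trailing OLD_SUBDIR in PATH with NEW_SUBDIR.
replace_subdir() {
    # With '/' as the delimiter, the slashes inside the subdir values would
    # end the expression early and trigger the "unknown option to s" error.
    # '|' does not occur in these paths, so it is safe as a delimiter here.
    printf '%s\n' "$1" | sed "s|/$2\$|/$3|"
}

replace_subdir \
    /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen4 \
    x86_64/amd/zen4 \
    x86_64
```

With the values from this thread, the call prints the intended `/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64`. The subdir is used as a regular expression, so values containing regex metacharacters would need escaping first.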

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 6, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13629738

date job status comment
Aug 06 20:56:43 UTC 2025 submitted job id 13629738 will be eligible to start in about 20 seconds
Aug 06 20:56:53 UTC 2025 received job awaits launch by Slurm scheduler
Aug 06 20:57:07 UTC 2025 running job 13629738 is running
Aug 06 21:06:41 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13629738.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545140440.tar.gz
size: 2067 MiB (2167757923 bytes)
entries: 5559
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 06 21:06:41 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13629738.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 6, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13632056

date job status comment
Aug 06 20:59:47 UTC 2025 submitted job id 13632056 will be eligible to start in about 20 seconds
Aug 06 21:00:01 UTC 2025 received job awaits launch by Slurm scheduler
Aug 06 21:00:16 UTC 2025 running job 13632056 is running
Aug 06 21:15:34 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13632056.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545145870.tar.gz
size: 2067 MiB (2167731041 bytes)
entries: 5559
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 06 21:15:34 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13632056.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

That's more like it!

installpath               (E) = /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64

Now I still need to carefully check the symlinks for the installations, to make sure they also point here. The old location still contains CUDA, so a symlink pointing there wouldn't lead to a broken install, which makes any mistake harder to spot.

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

Yep, symlinks are still 'wrong', pointing to the old location:

lrwxrwxrwx   1 eessibot prjs1395  110 Aug  6 23:09 ptxas -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen4/software/CUDA/12.1.1/bin/ptxas

I'll check further tomorrow. The EB build log will probably show some output from the eb_hooks.
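
Since the old location still holds a working CUDA, a check for dangling links would not catch this; the link targets themselves have to be compared against the expected prefix. A minimal sketch of such a check follows. The helper `list_stale_symlinks` is hypothetical and not part of the PR.

```shell
# list_stale_symlinks DIR EXPECTED_PREFIX  (hypothetical helper, not from the PR)
# Prints "link -> target" for every symlink under DIR whose target does not
# live below EXPECTED_PREFIX. Relative targets are reported as well.
# Assumes file names without newlines.
list_stale_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case "$target" in
            "$2"/*) ;;                                   # points at the expected location
            *) printf '%s -> %s\n' "$link" "$target" ;;  # anything else is suspect
        esac
    done
}
```

Note that the old location (`.../linux/x86_64/amd/zen4/software/...`) sits below the new `.../linux/x86_64` directory, so the expected prefix passed in would have to include the trailing `software` component (`.../linux/x86_64/software`) to tell the two apart.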

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 6, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13643484

date job status comment
Aug 06 21:17:44 UTC 2025 submitted job id 13643484 will be eligible to start in about 20 seconds
Aug 06 21:17:48 UTC 2025 received job awaits launch by Slurm scheduler
Aug 06 21:18:21 UTC 2025 running job 13643484 is running
Aug 06 21:27:44 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13643484.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545153110.tar.gz
size: 2067 MiB (2167725181 bytes)
entries: 5559
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 06 21:27:44 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13643484.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 6, 2025

That looks better:

 ls -al 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software/CUDA/12.1.1/bin/nvcc
lrwxrwxrwx 1 eessibot prjs1395 100 Aug  6 23:21 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software/CUDA/12.1.1/bin/nvcc -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/software/CUDA/12.1.1/bin/nvcc

…to e.g. /cvmfs/software.eessi.io/host_injections/x86_64, i.e. only include the CPU family in the prefix, not microarchitecture or accelerator architecture. Since these are binary installs, we don't need multiple copies, and requiring site admins to run the install scripts once per micro-architecture is just annoying (and requires more storage)
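
The idea in this commit message, keeping only the CPU family in the prefix, can be sketched with plain parameter expansion. This is an illustration under assumptions, not the code from the PR: the helper name `cpu_family_prefix` is hypothetical, and the subdir values are the architecture strings that appear in this thread.

```shell
# cpu_family_prefix BASE SUBDIR  (hypothetical helper, not from the PR)
# Keeps only the first component of SUBDIR (the CPU family) and appends it to BASE.
cpu_family_prefix() {
    printf '%s/%s\n' "$1" "${2%%/*}"
}

base=/cvmfs/software.eessi.io/host_injections
for subdir in x86_64/amd/zen4 aarch64/generic aarch64/neoverse_n1; do
    cpu_family_prefix "$base" "$subdir"
done
```

Both aarch64 targets resolve to the same `$base/aarch64` directory, which is the point of the change: one binary install per CPU family instead of one per micro-architecture.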
@casparvl
Contributor Author

casparvl commented Aug 7, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 7, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13737255

date job status comment
Aug 07 11:18:09 UTC 2025 submitted job id 13737255 will be eligible to start in about 20 seconds
Aug 07 11:18:18 UTC 2025 received job awaits launch by Slurm scheduler
Aug 07 11:18:31 UTC 2025 running job 13737255 is running
Aug 07 11:27:56 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13737255.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545660100.tar.gz
size: 0 MiB (23442 bytes)
entries: 2
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 07 11:27:56 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13737255.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl casparvl changed the title Adapt subdir for CUDA toolkig in host injections Adapt subdir for CUDA toolkit in host injections Aug 7, 2025
…DNN package was found in the old host-injections location (with micro-arch specific subdir). Also, adapt the path to search for the regular LmodError
@casparvl
Contributor Author

casparvl commented Aug 7, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 7, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13739465

date job status comment
Aug 07 11:55:25 UTC 2025 submitted job id 13739465 will be eligible to start in about 20 seconds
Aug 07 11:55:32 UTC 2025 received job awaits launch by Slurm scheduler
Aug 07 11:55:56 UTC 2025 running job 13739465 is running
Aug 07 12:00:33 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13739465.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545679620.tar.gz
size: 0 MiB (23442 bytes)
entries: 2
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 07 12:00:33 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13739465.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 7, 2025

== FAILED: Installation ended unsuccessfully (build directory: /tmp/tmp.4EJug6QIRZ/temp_install_storage/cuda_n_co.Ho7/build/CUDA/12.1.1/system-system): build failed (first 300 chars): Failed to create directory /cvmfs/software.eessi.io/host_injections/x86_64/software/CUDA/12.1.1: [Errno 13] Permission denied: '/cvmfs/software.eessi.io/host_injections/x86_64' (took 7 mins 30 secs)

Hmmm, that's strange. This directory is writeable:

$ ls -ald /path/to/bot/host-injections
drwxrwsr-x+ 5 ABC XYZ 4096 Aug  7 13:19 /path/to/bot/host-injections
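
The error names the `/cvmfs/...` path, while the `ls` above was run on the host-side directory. One possible diagnostic, sketched here under assumptions, is to check from inside the build environment which ancestor of the target directory actually exists and whether the current user can write to it. The helper `first_existing_ancestor` is hypothetical and not part of the PR.

```shell
# first_existing_ancestor PATH  (hypothetical helper, not from the PR)
# Walks up from PATH until it finds a component that exists and prints it.
# The body runs in a subshell so that the loop variable does not leak.
first_existing_ancestor() (
    p=$1
    while [ ! -e "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
    done
    printf '%s\n' "$p"
)

target=/cvmfs/software.eessi.io/host_injections/x86_64/software/CUDA/12.1.1
dir=$(first_existing_ancestor "$target")
if [ -w "$dir" ]; then
    echo "writable: $dir"
else
    echo "not writable: $dir"
    ls -ld "$dir"   # show owner and mode of the directory that blocks the mkdir
fi
```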

@casparvl
Contributor Author

casparvl commented Aug 7, 2025

Also:

grep: /cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/driver_version.txt: Permission denied
The host GPU driver libraries version have changed. Now its: (v575.57.08)
Cleaning out outdated symlinks.
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/cuda_version.txt': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/driver_version.txt': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/host': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/latest': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libEGL.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libEGL.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libEGL_nvidia.so.0': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGL.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGL.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv1_CM.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv1_CM.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv1_CM_nvidia.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv2.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv2.so.2': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLESv2_nvidia.so.2': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLX.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLX.so.0': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLX_nvidia.so.0': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLdispatch.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libGLdispatch.so.0': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libOpenCL.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libOpenGL.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libOpenGL.so.0': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libcuda.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libcuda.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libcudadebugger.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvcuvid.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvcuvid.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-cfg.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-cfg.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-egl-wayland.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-eglcore.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-encode.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-encode.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-fbc.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-fbc.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-glcore.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-glsi.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-glvkspirv.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-gpucomp.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-gtk3.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-ml.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-ml.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-nvvm.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-nvvm.so.4': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-opencl.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-opticalflow.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-ptxjitcompiler.so': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-ptxjitcompiler.so.1': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-rtcore.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvidia-tls.so.555.42.06': Permission denied
rm: cannot remove '/cvmfs/software.eessi.io/host_injections/nvidia/x86_64/host/libnvoptix.so.1': Permission denied

That's really strange. It looks like the issue I had before, when bind-mounting became the default, except that the repo really is fuse-mounted here:

add fusemount options for CVMFS repo 'eessi.io-2023.06-software'
Using a fuse mount for /cvmfs/eessi.io-2023.06-software
...
singularity  run --nv --contain --fusemount container:cvmfs2 software.eessi.io /cvmfs_ro/software.eessi.io --fusemount container:unionfs -o cow /tmp/software.eessi.io/overlay-upper=RW:/cvmfs_ro/software.eessi.io=RO /cvmfs/software.eessi.io /tmp/eessibot/EESSI/eessi_job.mp0KqBw1YK/eessi.iTYZSjkO6k/ghcr.io_eessi_build_node_debian12.sif /gpfs/work1/1/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/event_31042a10-7380-11f0-8928-4b97b9f16a29/run_000/linux_x86_64_amd_zen4/eessi.io-2023.06-software/install_software_layer.sh --build-logs-dir /projects/eessibot/eessi-bot-surf/buildlogs --shared-fs-path /projects/eessibot/eessi-bot-surf/SHARED
...

@casparvl
Contributor Author

casparvl commented Aug 7, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Aug 7, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.08/pr_59/13743306

date job status comment
Aug 07 12:25:45 UTC 2025 submitted job id 13743306 will be eligible to start in about 20 seconds
Aug 07 12:25:50 UTC 2025 received job awaits launch by Slurm scheduler
Aug 07 12:26:13 UTC 2025 running job 13743306 is running
Aug 07 12:30:50 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13743306.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17545697770.tar.gz
size: 0 MiB (23441 bytes)
entries: 2
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Aug 07 12:30:50 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13743306.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 7, 2025

Hm, the issue might have been two bot jobs running at the same time. I cleaned out the host_injections/x86_64 dir for the bot, so that we can start fresh.

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf architecture:x86_64/amd/zen4 accelerator:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-generic and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81538

date job status comment
Aug 08 10:27:27 UTC 2025 submitted job id 81538 awaits release by job manager
Aug 08 10:28:42 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 10:47:59 UTC 2025 running job 81538 is running
Aug 08 12:42:01 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81538.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-17546529480.tar.gz
size: 4946 MiB (5186843143 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/generic/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/generic/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/generic/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/generic/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/generic/.lmod/SitePackage.lua
Aug 08 12:42:01 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81538.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_n1 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81539

date job status comment
Aug 08 10:27:31 UTC 2025 submitted job id 81539 awaits release by job manager
Aug 08 10:28:55 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 12:48:14 UTC 2025 running job 81539 is running
Aug 08 14:06:08 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81539.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-17546590730.tar.gz
size: 4946 MiB (5186872188 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_n1/.lmod/SitePackage.lua
Aug 08 14:06:08 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81539.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_n1 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81540

date job status comment
Aug 08 10:27:34 UTC 2025 submitted job id 81540 awaits release by job manager
Aug 08 10:28:52 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 14:05:40 UTC 2025 running job 81540 is running
Aug 08 15:45:45 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81540.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-17546638920.tar.gz
size: 4946 MiB (5186814886 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_n1/.lmod/SitePackage.lua
Aug 08 15:45:45 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81540.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_n1 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81541

date job status comment
Aug 08 10:27:38 UTC 2025 submitted job id 81541 awaits release by job manager
Aug 08 10:28:50 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:41:20 UTC 2025 running job 81541 is running
Aug 08 15:26:42 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81541.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-17546622100.tar.gz
size: 4946 MiB (5186879197 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_n1/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_n1/.lmod/SitePackage.lua
Aug 08 15:26:42 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81541.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_v1 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81542

date job status comment
Aug 08 10:27:41 UTC 2025 submitted job id 81542 awaits release by job manager
Aug 08 10:29:03 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:24:15 UTC 2025 running job 81542 is running
Aug 08 14:34:27 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81542.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-17546609650.tar.gz
size: 4946 MiB (5186840553 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_v1/.lmod/SitePackage.lua
Aug 08 14:34:29 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81542.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_v1 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81543

date job status comment
Aug 08 10:27:45 UTC 2025 submitted job id 81543 awaits release by job manager
Aug 08 10:29:00 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:16:58 UTC 2025 running job 81543 is running
Aug 08 14:24:05 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81543.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-17546605400.tar.gz
size: 4946 MiB (5186838950 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_v1/.lmod/SitePackage.lua
Aug 08 14:24:06 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81543.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture aarch64-neoverse_v1 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81544

date job status comment
Aug 08 10:27:48 UTC 2025 submitted job id 81544 awaits release by job manager
Aug 08 10:28:57 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 12:55:42 UTC 2025 running job 81544 is running
Aug 08 14:01:32 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81544.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-17546592440.tar.gz
size: 4946 MiB (5186818346 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/neoverse_v1/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/neoverse_v1/.lmod/SitePackage.lua
Aug 08 14:01:32 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81544.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Contributor Author

casparvl commented Aug 8, 2025

Stupid, if multiple installations update things in host-injections, all-but-one will fail. I should have just started one build per architecture for each bot instance, and only when all host-injections were done, start the rest.
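
A minimal sketch of that staging, assuming (not confirmed in this thread) that builds on the same bot instance and the same CPU family (`x86_64` or `aarch64`) share one host-injections location. The helper names `split_into_waves` and `bot_command` and the example targets are hypothetical; the command format mirrors the `bot: build` lines used in this PR.

```python
REPO = "eessi.io-2023.06-software"


def split_into_waves(targets):
    """Split (instance, architecture, accelerator) targets into two waves.

    Wave 1 holds one build per (instance, CPU family); wave 2 holds the rest
    and should only be launched once every wave-1 build has finished.
    Assumption: the CPU family is the first component of the architecture path.
    """
    first = {}  # insertion-ordered: (instance, family) -> target
    rest = []
    for instance, arch, accel in targets:
        key = (instance, arch.split("/")[0])
        if key in first:
            rest.append((instance, arch, accel))
        else:
            first[key] = (instance, arch, accel)
    return list(first.values()), rest


def bot_command(instance, arch, accel):
    """Format one bot build command for a target."""
    return (f"bot: build repo:{REPO} instance:{instance} "
            f"architecture:{arch} accelerator:{accel}")


targets = [
    ("eessi-bot-mc-aws", "x86_64/amd/zen2", "nvidia/cc70"),
    ("eessi-bot-mc-aws", "x86_64/amd/zen3", "nvidia/cc80"),
    ("eessi-bot-mc-aws", "aarch64/neoverse_n1", "nvidia/cc70"),
    ("eessi-bot-jsc", "aarch64/nvidia/grace", "nvidia/cc80"),
]
wave1, wave2 = split_into_waves(targets)
for target in wave1:
    print(bot_command(*target))
```

Only the wave-1 commands are printed here; the wave-2 commands would be posted after the first wave has populated host-injections.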

Also, these builds take forever. Not sure if this is related to the slowness that @ocaisa experienced, but it's... bad.

Edit: might be due to gzip being slow, we should really look into deploying zst on the build clusters...

Edit2: for job 81528, total build time for all CUDA and cuDNN installations was around 50 minutes. By now, the job has been running for almost 2 hours. I bet the rest is spent creating tarballs.
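
To put rough numbers on that estimate, a job's run time can be read off the `running` and `finished` timestamps in the bot's status table. The sketch below does this for job 81538 (timestamps copied from this thread). The 50-minute build figure is the estimate quoted above for a different job, so the remainder is only indicative. The helper name `job_runtime` is hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the bot's status table, e.g. "Aug 08 10:47:59 UTC 2025".
FMT = "%b %d %H:%M:%S UTC %Y"


def job_runtime(started, finished):
    """Return the time between the 'running' and 'finished' table entries."""
    return datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)


runtime = job_runtime("Aug 08 10:47:59 UTC 2025", "Aug 08 12:42:01 UTC 2025")
build_estimate = timedelta(minutes=50)  # assumption: taken from the estimate above
remainder = runtime - build_estimate    # time not explained by the builds themselves

print(f"runtime: {runtime}, outside the builds: {remainder}")
```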

@casparvl
Contributor Author

casparvl commented Aug 8, 2025

Relaunching the builds that have failed so far because they encountered a lock file in host-injections while no complete CUDA install was available yet. The rest of the builds should complete successfully, since they were started after another build had already completed its install in host-injections.

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc architecture:aarch64/nvidia/grace accelerator:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen2 accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen2 accelerator:nvidia/cc90
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen3 accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen3 accelerator:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen3 accelerator:nvidia/cc90
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen4 accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:zen4 accelerator:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:haswell accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:haswell accelerator:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:haswell accelerator:nvidia/cc90
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:icelake accelerator:nvidia/cc90
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:cascadelake accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:cascadelake accelerator:nvidia/cc90
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accelerator:nvidia/cc70
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accelerator:nvidia/cc80
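
The failure mode described above (a build finds a lock file but no finished install) can be illustrated with a minimal sketch. This is not the logic of `install_cuda_host_injections.sh`; the function names, the `.lock` and `.complete` marker files, and the timeout values are all hypothetical. It shows one way a second build could wait for the first one to finish instead of aborting.

```python
import os
import time


def try_acquire(lock_path):
    """Atomically create the lock file; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def install_or_wait(install_dir, install, timeout=3600.0, poll=5.0):
    """Run install(install_dir) at most once; other callers wait for it.

    Returns "installed" if this call did the work, "already-present" if a
    completed install was found, and raises TimeoutError if the lock is
    still held after `timeout` seconds.
    """
    os.makedirs(install_dir, exist_ok=True)
    done = os.path.join(install_dir, ".complete")  # hypothetical marker file
    lock = os.path.join(install_dir, ".lock")      # hypothetical lock file
    deadline = time.monotonic() + timeout
    while not os.path.exists(done):
        if try_acquire(lock):
            try:
                # Re-check: another build may have finished just before we locked.
                if not os.path.exists(done):
                    install(install_dir)
                    open(done, "w").close()
                    return "installed"
            finally:
                os.remove(lock)
        elif time.monotonic() > deadline:
            raise TimeoutError(f"lock still held: {lock}")
        else:
            time.sleep(poll)
    return "already-present"
```

A call such as `install_or_wait("/path/to/install", my_installer)` would then be safe to run from several builds at once, under the assumption that exclusive file creation is reliable on the filesystem in use.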

@eessi-bot-jsc

eessi-bot-jsc bot commented Aug 8, 2025

New job on instance eessi-bot-jsc for CPU micro-architecture aarch64-nvidia-grace and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/eessibot/jobs/2025.08/pr_59/13981573

date job status comment
Aug 08 13:46:14 UTC 2025 submitted job id 13981573 awaits release by job manager
Aug 08 13:46:26 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:47:30 UTC 2025 running job 13981573 is running
Aug 08 14:44:12 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13981573.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-17546625750.tar.gz
size: 4946 MiB (5186913376 bytes)
entries: 8203
modules under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/aarch64/nvidia/grace/.lmod/SitePackage.lua
Aug 08 14:44:12 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-13981573.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81546

date job status comment
Aug 08 13:46:15 UTC 2025 submitted job id 81546 awaits release by job manager
Aug 08 13:46:18 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:50:58 UTC 2025 running job 81546 is running
Aug 08 15:28:04 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81546.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-17546630010.tar.gz
size: 5072 MiB (5318455822 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen2/.lmod/SitePackage.lua
Aug 08 15:28:04 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81546.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81547

date job status comment
Aug 08 13:46:19 UTC 2025 submitted job id 81547 awaits release by job manager
Aug 08 13:47:36 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:52:18 UTC 2025 running job 81547 is running
Aug 08 15:28:06 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81547.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-17546631450.tar.gz
size: 5072 MiB (5318433262 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen2/.lmod/SitePackage.lua
Aug 08 15:28:06 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81547.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81548

date job status comment
Aug 08 13:46:22 UTC 2025 submitted job id 81548 awaits release by job manager
Aug 08 13:47:45 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:53:48 UTC 2025 running job 81548 is running
Aug 08 15:26:49 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81548.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17546629150.tar.gz
size: 5072 MiB (5318458392 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
Aug 08 15:26:49 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81548.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81549

date job status comment
Aug 08 13:46:26 UTC 2025 submitted job id 81549 awaits release by job manager
Aug 08 13:47:41 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:53:45 UTC 2025 running job 81549 is running
Aug 08 15:28:07 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81549.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17546629090.tar.gz
size: 5072 MiB (5318429598 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
Aug 08 15:28:07 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81549.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81550

date job status comment
Aug 08 13:46:30 UTC 2025 submitted job id 81550 awaits release by job manager
Aug 08 13:47:38 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:52:21 UTC 2025 running job 81550 is running
Aug 08 15:26:45 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81550.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17546627740.tar.gz
size: 5072 MiB (5318420970 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
Aug 08 15:26:47 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81550.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81551

date job status comment
Aug 08 13:46:34 UTC 2025 submitted job id 81551 awaits release by job manager
Aug 08 13:47:51 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:56:51 UTC 2025 running job 81551 is running
Aug 08 15:26:52 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81551.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17546628300.tar.gz
size: 5072 MiB (5318405745 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen4/.lmod/SitePackage.lua
Aug 08 15:26:52 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81551.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81552

date job status comment
Aug 08 13:46:37 UTC 2025 submitted job id 81552 awaits release by job manager
Aug 08 13:47:48 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:55:20 UTC 2025 running job 81552 is running
Aug 08 15:26:50 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81552.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17546628040.tar.gz
size: 5072 MiB (5318397677 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/amd/zen4/.lmod/SitePackage.lua
Aug 08 15:26:50 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81552.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-haswell and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81553

date job status comment
Aug 08 13:46:41 UTC 2025 submitted job id 81553 awaits release by job manager
Aug 08 13:48:13 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 16:32:09 UTC 2025 running job 81553 is running
Aug 08 17:48:58 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81553.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-17546725090.tar.gz
size: 5072 MiB (5318425053 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/intel/haswell/.lmod/SitePackage.lua
Aug 08 17:48:58 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81553.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-haswell and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81554

date job status comment
Aug 08 13:46:45 UTC 2025 submitted job id 81554 awaits release by job manager
Aug 08 13:48:09 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 15:56:07 UTC 2025 running job 81554 is running
Aug 08 17:15:47 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81554.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-17546704190.tar.gz
size: 5072 MiB (5318441817 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/intel/haswell/.lmod/SitePackage.lua
Aug 08 17:15:47 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81554.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-haswell and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81555

date job status comment
Aug 08 13:46:48 UTC 2025 submitted job id 81555 awaits release by job manager
Aug 08 13:48:07 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 14:55:34 UTC 2025 running job 81555 is running
Aug 08 16:38:06 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81555.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-haswell-17546684160.tar.gz
size: 5072 MiB (5318427709 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc90/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc90/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/haswell/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/intel/haswell/.lmod/SitePackage.lua
Aug 08 16:38:06 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81555.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-icelake and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81556

date job status comment
Aug 08 13:46:52 UTC 2025 submitted job id 81556 awaits release by job manager
Aug 08 13:48:16 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 14:55:37 UTC 2025 running job 81556 is running
Aug 08 15:00:37 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-81556.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-icelake-17546650210.tar.gz
size: 0 MiB (23522 bytes)
entries: 3
modules under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
Aug 08 15:00:38 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81556.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-cascadelake and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81557

date job status comment
Aug 08 13:46:56 UTC 2025 submitted job id 81557 awaits release by job manager
Aug 08 13:48:04 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 15:56:01 UTC 2025 running job 81557 is running
Aug 08 17:15:45 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81557.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-17546704690.tar.gz
size: 5072 MiB (5318454536 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/intel/cascadelake/.lmod/SitePackage.lua
Aug 08 17:15:45 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81557.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-cascadelake and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81558

date job status comment
Aug 08 13:46:59 UTC 2025 submitted job id 81558 awaits release by job manager
Aug 08 13:48:01 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 13:49:45 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-81558.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-17546609040.tar.gz
size: 0 MiB (23519 bytes)
entries: 3
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
Aug 08 13:49:45 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81558.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81559

date job status comment
Aug 08 13:47:03 UTC 2025 submitted job id 81559 awaits release by job manager
Aug 08 13:47:57 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 16:48:11 UTC 2025 running job 81559 is running
Aug 08 18:02:27 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81559.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17546734850.tar.gz
size: 5072 MiB (5318426980 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc70/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc70
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/generic/.lmod/SitePackage.lua
Aug 08 18:02:28 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81559.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81560

date job status comment
Aug 08 13:47:07 UTC 2025 submitted job id 81560 awaits release by job manager
Aug 08 13:47:54 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 15:27:59 UTC 2025 running job 81560 is running
Aug 08 16:43:49 UTC 2025 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-81560.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17546687880.tar.gz
size: 5072 MiB (5318424939 bytes)
entries: 11913
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
cuDNN/8.9.2.26-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
cuDNN/8.9.2.26-CUDA-12.1.1
reprod directories under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2023.06/software/linux/x86_64/generic/.lmod/SitePackage.lua
Aug 08 16:43:49 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81560.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Contributor Author

casparvl commented Aug 8, 2025

Strange, on 81558 I get:

ERROR: EESSI module should've set EESSI_ACCELERATOR_TARGET () when EESSI_ACCELERATOR_TARGET_OVERRIDE (accel/nvidia/cc90) exported.

But it's also strange that the bot did not report the start of the job. It made me think it may be an issue with some update of the EESSI module not being deployed for this architecture, but #59 (comment) completed successfully, so that can't be the case. I'll retry once more; it may just be some strange hiccup by the bot (also considering the fact that it didn't report the start of the build).

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:cascadelake accelerator:nvidia/cc90

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-cascadelake and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81561

date job status comment
Aug 08 14:59:45 UTC 2025 submitted job id 81561 awaits release by job manager
Aug 08 15:00:04 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 17:05:27 UTC 2025 running job 81561 is running
Aug 08 17:06:41 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-81561.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-17546727030.tar.gz
size: 0 MiB (23521 bytes)
entries: 3
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
Aug 08 17:06:41 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81561.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Contributor Author

casparvl commented Aug 8, 2025

Strange, 81556 has the same issue:

ERROR: EESSI module should've set EESSI_ACCELERATOR_TARGET () when EESSI_ACCELERATOR_TARGET_OVERRIDE (accel/nvidia/cc90) exported.
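
For reference, the message reads as if it comes from a consistency check between the two variables: the override is exported as accel/nvidia/cc90, while EESSI_ACCELERATOR_TARGET ends up empty. A minimal sketch of a check of that shape is below. This is only my assumption of the logic, written in Python for illustration; `check_accelerator_target` is a hypothetical name and this is not the actual EESSI code.

```python
def check_accelerator_target(env):
    """Return an error string if the override is set but the target does not match it.

    Hypothetical helper: it only mirrors the wording of the error seen in the job
    output, it is not taken from the EESSI scripts.
    """
    override = env.get("EESSI_ACCELERATOR_TARGET_OVERRIDE", "")
    target = env.get("EESSI_ACCELERATOR_TARGET", "")
    # Only complain when an override was requested and the target was not set to it
    if override and target != override:
        return (
            f"ERROR: EESSI module should've set EESSI_ACCELERATOR_TARGET ({target}) "
            f"when EESSI_ACCELERATOR_TARGET_OVERRIDE ({override}) exported."
        )
    return None


if __name__ == "__main__":
    # Situation seen in the failing jobs: override exported, target left empty
    print(check_accelerator_target({"EESSI_ACCELERATOR_TARGET_OVERRIDE": "accel/nvidia/cc90"}))
```

If the real check has this shape, the open question is why the target stays empty for this combination only.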

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:icelake accelerator:nvidia/cc90

eessi-bot-aws bot commented Aug 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-icelake and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.08/pr_59/81563

date job status comment
Aug 08 15:46:17 UTC 2025 submitted job id 81563 awaits release by job manager
Aug 08 15:46:49 UTC 2025 released job awaits launch by Slurm scheduler
Aug 08 15:48:06 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-81563.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-icelake-17546680290.tar.gz
size: 0 MiB (23520 bytes)
entries: 3
modules under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
2023.06/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
Aug 08 15:48:06 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-81563.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Contributor Author

casparvl commented Aug 8, 2025

Same failure for 81563 as before:

ERROR: EESSI module should've set EESSI_ACCELERATOR_TARGET () when EESSI_ACCELERATOR_TARGET_OVERRIDE (accel/nvidia/cc90) exported.

I'm really not sure what's causing this, as it succeeds for the same CPU arch with a different accelerator arch, so it can hardly be a problem with an EESSI-extend that's out of sync (which would have been my first suspicion).
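
To make the pattern easier to see, here is a small sketch that tabulates the build results from the eessi-bot-mc-aws reports above (jobs 81553 to 81563). The tuples are transcribed by hand from those bot comments; the helper names are just for illustration, and jobs from other bot instances or earlier in the thread are not included.

```python
from collections import defaultdict

# (job id, CPU target, accelerator, build step succeeded), transcribed from the bot reports
JOBS = [
    (81553, "intel/haswell", "cc70", True),
    (81554, "intel/haswell", "cc80", True),
    (81555, "intel/haswell", "cc90", True),
    (81556, "intel/icelake", "cc90", False),
    (81557, "intel/cascadelake", "cc70", True),
    (81558, "intel/cascadelake", "cc90", False),
    (81559, "generic", "cc70", True),
    (81560, "generic", "cc80", True),
    (81561, "intel/cascadelake", "cc90", False),  # retry of 81558
    (81563, "intel/icelake", "cc90", False),      # retry of 81556
]


def summarize(jobs):
    """Map each (CPU target, accelerator) pair to True if all its builds succeeded."""
    by_combo = defaultdict(list)
    for _job_id, cpu, accel, ok in jobs:
        by_combo[(cpu, accel)].append(ok)
    return {combo: all(results) for combo, results in by_combo.items()}


def failing_combos(jobs):
    """Return the sorted (CPU target, accelerator) pairs with at least one failed build."""
    return sorted(combo for combo, ok in summarize(jobs).items() if not ok)


if __name__ == "__main__":
    for cpu, accel in failing_combos(JOBS):
        print(f"build failed for {cpu} + {accel}")
```

In this set only icelake + cc90 and cascadelake + cc90 fail, while haswell + cc90 and cascadelake + cc70 succeed, so neither the CPU target nor the accelerator target alone explains it.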
