{2023.06}[foss/2023b] Flux 0.76.0 #1107
base: main
Instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
Updates by the bot instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/sapphirerapids
Let's try the problematic cases first. |
New job on instance
|
New job on instance
|
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 |
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
New job on instance
|
Groan, there is something seriously wrong with the srapids build. Most Intel builds take ~3.5hr, Arm builds take ~1.25hr, and even a64fx takes ~6hr, but srapids is taking ~10hr (having previously taken the same ~3hr in #1107 (comment)). Looking into the logs, it is taking 20 seconds or more to create a file (the second case is where the failure occurred, but the first case was very close to failing):
I can possibly tweak the timeout via a hook, but it is all very frustrating... and incredibly slow to diagnose, since I cannot reproduce these problems interactively. |
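One way to narrow this down might be a small probe that times file creation in a given directory, run inside the same job environment as the build. The following is only a rough sketch and not something used in this PR: the function names, the `PROBE_DIR` environment variable, and the 20-second threshold (taken from the observation above) are all assumptions for illustration.

```python
import os
import tempfile
import time


def time_file_creation(directory, count=5):
    """Return the seconds taken by each of `count` file creations in `directory`."""
    timings = []
    for i in range(count):
        path = os.path.join(directory, f"probe_{os.getpid()}_{i}.tmp")
        start = time.perf_counter()
        # O_EXCL forces a genuine create instead of reopening an existing file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.close(fd)
        timings.append(time.perf_counter() - start)
        # Clean up so the probe leaves nothing behind.
        os.remove(path)
    return timings


def slow_creations(timings, threshold_s=20.0):
    """Return the timings at or above `threshold_s` seconds."""
    return [t for t in timings if t >= threshold_s]


if __name__ == "__main__":
    # PROBE_DIR is a hypothetical variable; default to the system temp directory.
    target = os.environ.get("PROBE_DIR", tempfile.gettempdir())
    results = time_file_creation(target)
    for seconds in results:
        print(f"{target}: created file in {seconds:.3f}s")
    print(f"slow creations (>= 20s): {len(slow_creations(results))}")
```

Pointing the probe at the build directory, the temporary directory, and the overlay mount in turn could indicate which layer is slow, provided the slowdown shows up outside the actual test run.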
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/sapphirerapids |
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/sapphirerapids
In EESSI/software-layer-scripts#50 we pass through cvmfs if it is available on the host; let's see if that helps. |
New job on instance
|
Any idea where it's trying to create these files (which path)? There are a lot of moving parts here (Slurm job environment, our build container, unionfs, CernVM-FS, ...), but the fact that you're unable to reproduce this interactively does seem to trim things down a bit? I'm all for simply skipping the test step when building Flux on Sapphire Rapids for now, because it seems like even if we were able to reproduce this, it would be painful to figure out the culprit (let alone fix it). Are there any other signs of failure? Do we have a complete log for a failing attempt somewhere, to do some digging? |
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/sapphirerapids |
New job on instance
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/sapphirerapids |
New job on instance
|
No description provided.