
V5.0.x bump submodules #12152

Merged 2 commits on Dec 11, 2023

Conversation

Contributor

@wenduwan wenduwan commented Dec 8, 2023

Ingest prrte and pmix RC

@wenduwan wenduwan requested a review from janjust December 8, 2023 00:19
@github-actions github-actions bot added this to the v5.0.1 milestone Dec 8, 2023
Contributor Author

wenduwan commented Dec 8, 2023

Contributor

janjust commented Dec 8, 2023

@wenduwan

Example failed: 124
+ echo 'Command was: timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12152/hostfile -np 2 --bind-to none  ./examples/hello_c'
Command was: timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12152/hostfile -np 2 --bind-to none  ./examples/hello_c
+ exit 124
script returned exit code 124

Looks like we have a failure with the bumped pointers?

Member

bosilca commented Dec 8, 2023

We need to bump openpmix to ec54c21 to get the correct handling of -- in the arguments list (per openpmix/prrte#1886).

Contributor

rhc54 commented Dec 8, 2023

My patch is actually on PRRTE

Contributor Author

wenduwan commented Dec 8, 2023

@rhc54 Thanks for the quick fix! Is this related to openpmix/prrte#1864?

Sorry if I missed the test case!

Now that we have a fix, do you plan to do another RC?

@wenduwan wenduwan self-assigned this Dec 8, 2023
Contributor

rhc54 commented Dec 8, 2023

Does anyone know why the CI actually failed? I don't see enough log output to know, and I certainly cannot reproduce a problem with running a simple "hello" program.

> Is this related to openpmix/prrte#1864?

Sort of - but not really. The fix, at least, was unrelated to the fix for that specific issue.

> Now that we have a fix, do you plan to do another RC?

I'd rather not do an RC for every individual problem/fix, as that would become overly burdensome. I'd suggest just bumping to the head of the PMIx v4.2 and PRRTE v3.0 branches to pull in any fixes that relate to issues identified with OMPI's RC. We can then decide whether another, more formal round of RCs is needed.

@wenduwan wenduwan force-pushed the v5.0.x_bump_submodules branch from 084d7c3 to f1c35e8 Compare December 8, 2023 19:31
Contributor Author

wenduwan commented Dec 8, 2023

The latest revision tracks the prrte v3.0 branch. This will start the CI again. We'll see how it goes.

Contributor Author

wenduwan commented Dec 8, 2023

Looking at CI failure.

Contributor Author

wenduwan commented Dec 8, 2023

The test appears flaky. I ran it twice: one run passed and one failed.

(env) -bash-4.2$ ./run_example.sh
Hello, world, I am 0 of 2, (Open MPI v5.0.1a1, package: Open MPI [email protected] Distribution, ident: 5.0.1a1, repo rev: v5.0.0rc6-1154-gf1c35e8985, Unreleased developer copy, 183)
Hello, world, I am 1 of 2, (Open MPI v5.0.1a1, package: Open MPI [email protected] Distribution, ident: 5.0.1a1, repo rev: v5.0.0rc6-1154-gf1c35e8985, Unreleased developer copy, 183)
(env) -bash-4.2$ ./run_example.sh
+ export PATH=/home/ec2-user/ompi/install/bin:/home/ec2-user/env/bin:/opt/aws/neuron/bin:/opt/amazon/openmpi/bin/:/opt/amazon/efa/bin/:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin
+ PATH=/home/ec2-user/ompi/install/bin:/home/ec2-user/env/bin:/opt/aws/neuron/bin:/opt/amazon/openmpi/bin/:/opt/amazon/efa/bin/:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin
+ export LD_LIBRARY_PATH=/home/ec2-user/ompi/install/lib:
+ LD_LIBRARY_PATH=/home/ec2-user/ompi/install/lib:
+ timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/PortaFiducia/hostfile -np 2 --bind-to none /home/ec2-user/ompi/examples/hello_c
(env) -bash-4.2$ echo $?
124

Checked v5.0.x branch. Pretty sure this is introduced with new prrte/pmix.
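
The status 124 in the transcript above is what GNU `timeout` reports when the command is still running at the time limit, so it points to a hang rather than a crash in `hello_c`. To put a number on the flakiness, one option is to repeat the run and count the timeouts. The following is only a minimal sketch: `count_timeouts` and its arguments are hypothetical and not part of the CI scripts.

```shell
# Hypothetical helper (not from the CI scripts): run a command RUNS times
# under `timeout` and print how many runs hit the time limit.
# Usage: count_timeouts RUNS LIMIT COMMAND [ARGS...]
count_timeouts() (
    runs=$1
    limit=$2
    shift 2
    hung=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        # GNU timeout exits with 124 when the command ran past the limit.
        if [ "$status" -eq 124 ]; then
            hung=$((hung + 1))
        fi
        i=$((i + 1))
    done
    echo "$hung"
)
```

With the command from the transcript, a call might look like `count_timeouts 20 4m mpirun --timeout 180 -np 2 --bind-to none ./examples/hello_c`; the run count of 20 is an arbitrary choice.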

Contributor

rhc54 commented Dec 8, 2023

> Pretty sure this is introduced with new prrte/pmix.

Given the exact same behavior in #12156, which does not include a bump in submodule pointers, why would you conclude this?

Contributor Author

wenduwan commented Dec 8, 2023

No. I commented in the wrong thread.

The bump introduced a hang. Git bisect shows it's coming from this commit openpmix/prrte@5c19e5e
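
For a hang that shows up only on some runs, a bisect like this is often automated with `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad. The sketch below is not the procedure actually used in this thread: `bisect_step`, its arguments, and the idea of passing a rebuild command are all hypothetical.

```shell
# Hypothetical step function for `git bisect run`; not the script used in this PR.
# Usage: bisect_step BUILD_CMD TRIES LIMIT TEST_COMMAND [ARGS...]
bisect_step() (
    build_cmd=$1
    tries=$2
    limit=$3
    shift 3

    # A commit that cannot be built says nothing about the hang: skip it (125).
    sh -c "$build_cmd" >/dev/null 2>&1 || exit 125

    i=0
    while [ "$i" -lt "$tries" ]; do
        status=0
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        # Any failure, including a timeout (status 124), marks the commit as bad.
        [ "$status" -eq 0 ] || exit 1
        i=$((i + 1))
    done

    # Every attempt finished within the limit: treat the commit as good.
    exit 0
)
```

Because the failure is intermittent, a "good" verdict only means that no hang was seen in TRIES attempts, so a larger TRIES value lowers the chance of bisect converging on the wrong commit.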

Contributor

rhc54 commented Dec 8, 2023

Okay, bump it up again to pick up the fix.

@wenduwan wenduwan force-pushed the v5.0.x_bump_submodules branch from f1c35e8 to 3ab7a80 Compare December 8, 2023 23:43
Contributor Author

wenduwan commented Dec 8, 2023

@rhc54 Thank you!

I'm also testing it on my machine.

Contributor Author

wenduwan commented Dec 8, 2023

Local test passed!

Contributor Author

wenduwan commented Dec 9, 2023

@janjust CI passed after bumping the pointers.

Contributor

janjust commented Dec 11, 2023

@wenduwan good to merge?

@wenduwan wenduwan merged commit fd66645 into open-mpi:v5.0.x Dec 11, 2023
@wenduwan wenduwan deleted the v5.0.x_bump_submodules branch December 11, 2023 15:00
4 participants