
PR #2596 introduced regression when configuring with -O3 optimization #2678


Closed
hppritcha opened this issue Jan 6, 2017 · 2 comments · Fixed by #2679

@hppritcha

hppritcha commented Jan 6, 2017

#2596, which has been merged back into v2.x and v2.0.x, introduced a major regression. Open MPI built with gcc (at least) at -O2 or -O3 for x86_64 now segfaults immediately in MPI_Init. The problem has been observed with both gcc 4.9 and 6.3.

orterun also fails with a similar segfault pattern. If one uses the cpuid instruction directly (avoiding all the ebx exchange stuff, which isn't necessary for x86_64), the problem vanishes.
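For illustration, here is a minimal sketch of what "using the cpuid instruction directly" could look like with GCC-style inline assembly on x86_64. The names `cpuid_regs_t`, `cpuid_direct` and `example_use` are hypothetical and not taken from the Open MPI sources. The sketch assumes an x86_64 target, where the compiler can be told through the `"=b"` constraint that rbx is written and will preserve it itself if needed; on other targets it reports failure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four cpuid result registers. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} cpuid_regs_t;

/* Hypothetical helper: run cpuid with no manual ebx/rbx exchange.
 * Returns 1 if the instruction was executed, 0 on non-x86_64 targets. */
static int cpuid_direct(uint32_t leaf, uint32_t subleaf, cpuid_regs_t *r)
{
#if defined(__x86_64__)
    /* "=b" declares rbx as an output, so the compiler handles any
     * save/restore of the register on our behalf. */
    __asm__ volatile("cpuid"
                     : "=a"(r->eax), "=b"(r->ebx), "=c"(r->ecx), "=d"(r->edx)
                     : "0"(leaf), "2"(subleaf));
    return 1;
#else
    (void)leaf;
    (void)subleaf;
    r->eax = r->ebx = r->ecx = r->edx = 0;
    return 0;
#endif
}

/* Usage example: leaf 0 reports the highest supported basic leaf in eax. */
static void example_use(void)
{
    cpuid_regs_t r;
    if (cpuid_direct(0, 0, &r)) {
        assert(r.eax >= 1);
    }
}
```
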

See comments to commit a718743.
Although the problem was originally reported on a system with Opteron processors,

@marksantcroos thanks for catching this.

@hppritcha hppritcha added this to the v2.0.2 milestone Jan 6, 2017
hppritcha referenced this issue Jan 6, 2017
Newer x86 processors have a core invariant TSC. On these systems it is
safe to use the rdtsc instruction as a monotonic timer. This commit
adds a new function to the opal timer code to check if the timer
backend is monotonic. On x86 it checks the appropriate bit and on
other architectures it parrots back the OPAL_TIMER_MONOTONIC value.

Signed-off-by: Nathan Hjelm <[email protected]>
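
As a rough sketch of the check this commit message describes (not the actual Open MPI code), the invariant TSC flag is reported in bit 8 of EDX for cpuid leaf 0x80000007. The function names and the `SKETCH_TIMER_MONOTONIC` macro below are hypothetical; the macro merely stands in for a build-time value such as OPAL_TIMER_MONOTONIC. The sketch relies on `__get_cpuid` from the `<cpuid.h>` header shipped with GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper: __get_cpuid() */
#endif

/* Hypothetical stand-in for a build-time "timer is monotonic" setting. */
#ifndef SKETCH_TIMER_MONOTONIC
#define SKETCH_TIMER_MONOTONIC 0
#endif

/* Invariant TSC is bit 8 of EDX from cpuid leaf 0x80000007. */
static bool edx_has_invariant_tsc(uint32_t edx)
{
    return ((edx >> 8) & 1u) != 0;
}

/* Hypothetical runtime check, assuming the timer backend is rdtsc on x86. */
static bool timer_is_monotonic_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return edx_has_invariant_tsc(edx);
#else
    /* Other architectures: report the build-time value unchanged. */
    return SKETCH_TIMER_MONOTONIC != 0;
#endif
}

/* Small self-check of the bit test; the runtime result depends on the CPU. */
static void timer_sketch_self_check(void)
{
    assert(edx_has_invariant_tsc(UINT32_C(1) << 8));
    assert(!edx_has_invariant_tsc(0u));
    (void)timer_is_monotonic_sketch();
}
```
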
@hjelmn

hjelmn commented Jan 7, 2017

Hmm, I tested this only with gcc on CTS-1. Optimized build (defaults to -O2). Will look at the backtrace and see what is going on.

hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 7, 2017
This commit fixes a bug in the timer check. When -fPIC is used we need
to save/restore ebx. The code copied from patcher was meant for 32-bit
systems and did not work correctly on 64-bit systems. This commit
updates the save/restore to use rbx instead of ebx.

Fixes open-mpi#2678

Signed-off-by: Nathan Hjelm <[email protected]>
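
The fix described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code from the patch: the names `cpuid_preserving_rbx` and `example_use` are hypothetical, and GCC-style inline assembly on x86_64 is assumed. The point is that the exchange around cpuid uses the full 64-bit rbx. On x86_64, a 32-bit exchange such as `xchgl` on ebx zero-extends its result, so the upper half of rbx would be lost, which is consistent with the bug the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: run cpuid while manually preserving all 64 bits
 * of rbx. Fills out[0..3] with eax, ebx, ecx, edx and returns 1, or
 * zero-fills out and returns 0 on non-x86_64 targets. */
static int cpuid_preserving_rbx(uint32_t leaf, uint32_t subleaf, uint32_t out[4])
{
#if defined(__x86_64__)
    uint32_t eax, ecx, edx;
    uint64_t scratch; /* holds the caller's rbx during cpuid, then the result */

    __asm__ volatile(
        "xchgq %%rbx, %1\n\t" /* park the full 64-bit rbx in the scratch register */
        "cpuid\n\t"           /* overwrites eax, ebx, ecx, edx */
        "xchgq %%rbx, %1\n\t" /* restore rbx; scratch now holds cpuid's ebx */
        : "=a"(eax), "=&r"(scratch), "=c"(ecx), "=d"(edx)
        : "0"(leaf), "2"(subleaf));

    out[0] = eax;
    out[1] = (uint32_t)scratch;
    out[2] = ecx;
    out[3] = edx;
    return 1;
#else
    (void)leaf;
    (void)subleaf;
    out[0] = out[1] = out[2] = out[3] = 0;
    return 0;
#endif
}

/* Usage example: leaf 0 reports the highest supported basic leaf in eax. */
static void example_use(void)
{
    uint32_t regs[4];
    if (cpuid_preserving_rbx(0, 0, regs)) {
        assert(regs[0] >= 1);
    }
}
```
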
@hjelmn

hjelmn commented Jan 7, 2017

Rather embarrassing. The cpuid code I copied from patcher saved and restored only the lower 32 bits of rbx (that is, ebx). A PR is open to fix this.

hjelmn added a commit to hjelmn/ompi that referenced this issue Jan 7, 2017
This commit fixes a bug in the timer check. When -fPIC is used we need
to save/restore ebx. The code copied from patcher was meant for 32-bit
systems and did not work correctly on 64-bit systems. This commit
updates the save/restore to use rbx instead of ebx.

Fixes open-mpi#2678

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit 5b70ae3)
Signed-off-by: Nathan Hjelm <[email protected]>