
Significant performance regression under Linux with busy waiting enabled #10929


Closed
hjelmn opened this issue Oct 14, 2022 · 0 comments · Fixed by #10930

Comments

hjelmn (Member) commented Oct 14, 2022

Background information

I am looking at a system which is showing very poor performance with some applications (notably OpenFOAM) when running Open MPI. The system in question has busy polling enabled to (in theory) improve message latency:

==> /proc/sys/net/core/busy_poll <==
50

==> /proc/sys/net/core/busy_read <==
50
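
For reference, a quick way to inspect these knobs on a running system (a sketch; the values are busy-poll budgets in microseconds, 0 means disabled, and the files may be absent on kernels built without busy-poll support):

```shell
# Print the kernel busy-poll settings shown above.
for f in /proc/sys/net/core/busy_poll /proc/sys/net/core/busy_read; do
  if [ -r "$f" ]; then
    printf '%s = %s\n' "$f" "$(cat "$f")"
  else
    printf '%s = (not present)\n' "$f"
  fi
done
```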

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.x, main, etc

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Both from release tarball and git checkout.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Linux 3.10.0-1160.76.1.el7.x86_64
  • Computer hardware: Intel Xeon
  • Network type: 100 GigE

Details of the problem

With these settings we see a 3-5x slowdown across all steps of the OpenFOAM motorbike benchmark (run details available on request) compared with both values set to 0 (busy waiting disabled).

Without busy polling:

Finished meshing in = 24.9 s
ExecutionTime = 41.7 s  ClockTime = 44 s

With busy polling:

Finished meshing in = 73.02 s.
ExecutionTime = 109.37 s  ClockTime = 112 s

The only difference between these runs is busy polling enabled vs disabled.
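
Until the default changes, the epoll backend can be selected explicitly through the opal_event_include MCA parameter that the linked fix adjusts. A sketch of one way to do that; OMPI_MCA_* environment variables are Open MPI's standard mechanism for setting MCA parameters, equivalent to passing --mca on the mpirun command line:

```shell
# Force libevent's epoll backend instead of poll for Open MPI jobs.
# Equivalent to: mpirun --mca opal_event_include epoll ...
export OMPI_MCA_opal_event_include=epoll
echo "OMPI_MCA_opal_event_include=${OMPI_MCA_opal_event_include}"
```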

@hjelmn hjelmn self-assigned this Oct 14, 2022
hjelmn added a commit to hjelmn/ompi that referenced this issue Oct 14, 2022
Under normal circumstances epoll and poll produce similar performance on Linux.
When busy polling is enabled they do not. Testing with a TCP-based system shows
a significant performance degradation when using poll with busy waiting enabled.
This performance regression is not seen when using epoll. This PR adjusts the
default value of opal_event_include to epoll on Linux only to fix the
regression.

Fixes open-mpi#10929

Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi that referenced this issue Oct 14, 2022
hjelmn added a commit to hjelmn/ompi that referenced this issue Oct 17, 2022
(cherry picked from commit 279f6b6)
yli137 pushed a commit to yli137/ompi that referenced this issue Jan 10, 2024