opal/event: use epoll by default on Linux #10930

Conversation

hjelmn
Member

@hjelmn hjelmn commented Oct 14, 2022

Under normal circumstances epoll and poll produce similar performance on Linux. When busy polling is enabled they do not. Testing on a TCP-based system shows a significant performance degradation when using poll with busy polling enabled. This regression is not seen when using epoll. This PR changes the default value of opal_event_include to epoll, on Linux only, to fix the regression.

Fixes #10929

Signed-off-by: Nathan Hjelm [email protected]
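For readers unfamiliar with the setting, here is a minimal sketch of what a platform-dependent default could look like. It is not the code from this PR: the helper names default_event_include and method_is_included are hypothetical, and it assumes that opal_event_include holds a comma-separated list of backend names and that poll remains the default on other platforms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the Open MPI source: returns a platform-dependent
 * default for the opal_event_include setting discussed in this PR. */
static const char *default_event_include(void)
{
#if defined(__linux__)
    return "epoll";
#else
    return "poll";   /* assumption: poll stays the default elsewhere */
#endif
}

/* Hypothetical helper: true when `method` appears as a whole token in the
 * comma-separated `list` (assumed format of the include setting). */
static bool method_is_included(const char *list, const char *method)
{
    size_t len = strlen(method);
    const char *p = list;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok = comma ? (size_t)(comma - p) : strlen(p);

        if (tok == len && strncmp(p, method, len) == 0) {
            return true;
        }
        if (comma == NULL) {
            break;
        }
        p = comma + 1;   /* continue with the next token */
    }
    return false;
}
```

With these helpers, method_is_included(default_event_include(), "poll") would be false on Linux, so a caller filtering the available backends against the include list would no longer pick poll there.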

@hjelmn hjelmn force-pushed the use_epoll_on_linux_because_it_is_objectively_better branch from fdc46d2 to 279f6b6 Compare October 14, 2022 22:03
@hjelmn
Member Author

hjelmn commented Oct 15, 2022

:bot:ibm:retest

@hjelmn hjelmn merged commit b79767e into open-mpi:main Oct 15, 2022
@jsquyres
Member

@hjelmn Will you cherry-pick to v5.0.x?

@rhc54
Contributor

rhc54 commented Oct 15, 2022

Just curious: we have multiple projects initializing the common event library. If they call that init function with different arguments, which one gets used? I suspect it is the first caller that sets things, but isn't that a tad fragile (e.g., if someone changes the init sequence in the code so that PMIx goes first, which I think we were doing at one point)? Are we in trouble if the second (or later) library actually depends on a particular setting?

I'm wondering if we need to set some kind of global envar that the libraries can pick up to ensure they set things up as desired.

@hjelmn
Member Author

hjelmn commented Oct 17, 2022

@jsquyres Yes. The idea is to get this into v5.0.x.

@rhc54 Each project should get its own event base if they are not using opal directly. These event bases will use whatever was selected for their project. It might be worth looking into whether PMIx should use epoll as well in this situation. I don't have a system large enough to test how launch performance scales with busy polling enabled.
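To illustrate the point about separate event bases, here is a small Linux-only sketch that uses the raw poll and epoll system calls rather than libevent, OPAL, or PMIx. The loop_sketch_t type and its functions are hypothetical stand-ins for a project's private event loop. Each loop is created with its own backend choice, and the choice made by one does not affect the other.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for one project's private event loop state. */
typedef struct {
    int use_epoll;   /* 1: epoll backend, 0: poll backend */
    int epfd;        /* epoll instance, or -1 when using poll */
    int watch_fd;    /* the single descriptor this loop watches */
} loop_sketch_t;

/* Set up a loop that watches `fd` for readability with the chosen backend. */
static int loop_init(loop_sketch_t *loop, int use_epoll, int fd)
{
    loop->use_epoll = use_epoll;
    loop->watch_fd = fd;
    loop->epfd = -1;

    if (use_epoll) {
        struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };

        loop->epfd = epoll_create1(0);
        if (loop->epfd < 0) {
            return -1;
        }
        if (epoll_ctl(loop->epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            close(loop->epfd);
            loop->epfd = -1;
            return -1;
        }
    }
    return 0;
}

/* Non-blocking check (timeout 0): 1 if readable, 0 if not, -1 on error. */
static int loop_poll_once(loop_sketch_t *loop)
{
    if (loop->use_epoll) {
        struct epoll_event out;
        return epoll_wait(loop->epfd, &out, 1, 0);
    }

    struct pollfd pfd = { .fd = loop->watch_fd, .events = POLLIN };
    return poll(&pfd, 1, 0);
}

/* Release the epoll instance, if this loop owns one. */
static void loop_fini(loop_sketch_t *loop)
{
    if (loop->epfd >= 0) {
        close(loop->epfd);
        loop->epfd = -1;
    }
}
```

A process could call loop_init(&a, 1, fd_a) for one project and loop_init(&b, 0, fd_b) for another. Each loop then reports readiness only for its own descriptor, which mirrors the idea that every project's event base uses whatever was selected for that project.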

Development

Successfully merging this pull request may close these issues.

Significant performance regression under Linux with busy waiting enabled
4 participants