Optimize busy loop for modern linux by default #2041
Closed
Use a plain clock wait in place of sched_yield, which is heavy on the kernel side in modern kernels, including when it is called from light virtualisation such as LXC or chroots.
I managed 8000 sched_yield calls per second without PTI (page table isolation) and 5000 with it, so this is already better for cases where a zero threading threshold is in force. Since it no longer bombards the kernel with calls, it is also better for CPUs with turbo boost.
It cut about 5% of the time spent in gemm on a huge data set; I did not test much beyond that.
The explanation comes from a different point in the file.
This does not solve the problem of a busy loop being used where some light IPC could work; it just eases life for the current code.
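
To illustrate the idea described above, here is a minimal sketch of a busy loop that waits either with sched_yield or with a short clock-based sleep. It is not the code from this PR: the helper names `wait_yield`, `wait_clock`, and `spin_until_set`, the 1000 ns interval, and the choice of `clock_nanosleep` on `CLOCK_MONOTONIC` as the "clock wait" are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: hand the CPU back through the scheduler
 * (the behaviour the description wants to move away from). */
static inline void wait_yield(void) {
    sched_yield();
}

/* Hypothetical helper: wait by sleeping on a clock for nsec nanoseconds.
 * Assumption: clock_nanosleep stands in for the "clock wait". */
static inline void wait_clock(long nsec) {
    assert(nsec >= 0 && nsec < 1000000000L);
    struct timespec ts = { .tv_sec = 0, .tv_nsec = nsec };
    clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
}

/* Poll *flag until it becomes nonzero or max_iters waits have been done.
 * Returns the number of waits performed. */
static unsigned long spin_until_set(atomic_int *flag, int use_clock,
                                    unsigned long max_iters) {
    unsigned long n = 0;
    while (!atomic_load_explicit(flag, memory_order_acquire) && n < max_iters) {
        if (use_clock)
            wait_clock(1000); /* illustrative interval, not a tuned value */
        else
            wait_yield();
        n++;
    }
    return n;
}
```

A worker thread would call `spin_until_set(&flag, 1, limit)` while another thread stores 1 into `flag` with release ordering. Either way this is still polling, so the limitation noted above applies to the sketch as well.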