For whatever it may be worth, I recently ran some HAN benchmarks and saw a great improvement in large-message Allreduce latency after increasing the segment size (MCA parameter coll_han_allreduce_segsize) from the default 64K to 1M. This was with the non-simple implementation.
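A segment-size sweep along these lines could be scripted roughly as follows. This is only a sketch: the benchmark binary (`./osu_allreduce`), the intermediate segment size, and the use of `coll_han_priority` to get HAN selected are assumptions, not something stated in the comment above. Only `coll_han_allreduce_segsize` and the 64K and 1M values come from it. Checking the parameter names against `ompi_info --param coll han --level 9` on your build is advisable. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Hypothetical benchmark binary; substitute whatever Allreduce benchmark you use.
BENCH="${BENCH:-./osu_allreduce}"

# Build one mpirun command line for a given HAN Allreduce segment size (bytes).
han_cmd() {
    segsize="$1"
    # Assumption: raising coll_han_priority makes HAN win component selection.
    echo "mpirun --mca coll_han_priority 100 --mca coll_han_allreduce_segsize $segsize $BENCH"
}

# 65536 is the 64K default mentioned above; 1048576 is 1M.
for seg in 65536 262144 1048576; do
    cmd=$(han_cmd "$seg")
    if [ "${RUN:-0}" = "1" ]; then
        $cmd          # actually launch the benchmark
    else
        echo "$cmd"   # dry run: just show what would be launched
    fi
done
```

Process counts, host lists, and any option selecting the non-simple implementation are deliberately left out, since they depend on the system and the Open MPI build.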
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.1.1
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from source
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
I am working to tune Open MPI on a new system type. By default coll/tuned is being selected and is giving so-so performance:
The large messages look ok but small messages are not great.
When forcing coll/han, things look way better for small messages, at a huge cost to large-message performance:
Is this expected? Another MPI on the system is getting 74us for the small messages (below 1k) and 1400us for 1MB messages.