Also run the CI tests with shared windows disabled #184
I agree that this should definitely be tested. However, this (again) increases the time the CI takes to finish. Implemented in branch feat-184-ci-sw.
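The following is a minimal sketch of why the CI time goes up. Testing both shared-window settings multiplies the number of CI jobs by the number of settings. The MPI variant names and the CMake option `ENABLE_SHARED_WINDOWS` are illustrative assumptions, not taken from the actual DASH CI scripts. The sketch skips NastyMPI with shared windows enabled, because NastyMPI needs shared windows disabled in order to intercept puts and gets.

```python
from itertools import product

# Hypothetical inputs: the MPI variant names and the CMake option name are
# assumptions for illustration, not the real DASH/DART CI configuration.
MPI_VARIANTS = ["openmpi-1.10", "nastympi"]
SHARED_WINDOWS = [True, False]


def build_matrix(mpi_variants, shared_windows):
    """Return one CI job per (MPI variant, shared-window setting) pair."""
    jobs = []
    for mpi, shwin in product(mpi_variants, shared_windows):
        # NastyMPI only sees puts/gets that go through MPI, so running it
        # with shared windows enabled would not exercise it.
        if mpi == "nastympi" and shwin:
            continue
        flag = "ON" if shwin else "OFF"
        jobs.append({"mpi": mpi, "cmake_args": [f"-DENABLE_SHARED_WINDOWS={flag}"]})
    return jobs


if __name__ == "__main__":
    for job in build_matrix(MPI_VARIANTS, SHARED_WINDOWS):
        print(job["mpi"], " ".join(job["cmake_args"]))
```

With one configuration per MPI variant, there would be two jobs. Adding the shared-window toggle raises this to three, and each further variant adds up to two more jobs.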
It's fixed in branch See branch
@rkowalewski @fmoessbauer @fuerlinger In PR #189, I re-enabled MPI shared windows in CI builds because, if I'm not mistaken, there appears to be a bug in OpenMPI 2. There is a deadlock where one unit enters a barrier while another unit calls … This should not block, if I understand the MPI standard correctly. For the full log, see CircleCI build no. 646. Note that this failed for OpenMPI 2 only. The defect could not be provoked with NastyMPI, which is another hint that it's actually a bug in OpenMPI 2.
@fuchsto Actually, you were a bit faster than me. You are totally right: OpenMPI 2 blocks in one-sided communication. However, this issue only occurs if we use multiple MPI windows, which is actually the case in DART. The issue was recently discovered and is documented in issue #2530 in the Open MPI repository. Additionally, we have another bug in
Okay, so, long story short: we won't use OpenMPI 2 in CI for now. This is just one of several arcane defects in OpenMPI 2 that I have stumbled upon, and fixing them doesn't help DASH/DART at all.
We should revert to OpenMPI 1.10.x
Yes and no. I recommend the following:
This is also beneficial because if we use only two CircleCI containers, we can run two commits at a time 👍
Yes, that's my plan. Once they get a grip on RDMA again, we will use it... um, as you already mentioned.
Fixed in #189
Once DART successfully builds again with shared windows disabled, we should run the CI tests in both configurations: shared windows enabled and disabled.
Running with shared windows disabled more closely resembles the distributed-memory setting and might uncover issues that would otherwise go unnoticed. It is also required for NastyMPI to work, since NastyMPI intercepts puts and gets.
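The following is a minimal simulation of why NastyMPI needs shared windows disabled. It assumes a hypothetical put routine that, when shared windows are enabled, writes to a node-local target with a plain store into the shared segment and uses MPI only for remote targets. Under that assumption, an interposer wrapping the MPI layer never sees the node-local puts. `InterceptingMPI`, `make_put`, `count_intercepted` and the unit-to-node placement are all hypothetical names for illustration, not DART or NastyMPI APIs.

```python
from dataclasses import dataclass, field


@dataclass
class InterceptingMPI:
    """Hypothetical stand-in for an MPI layer wrapped by an interposer such as
    NastyMPI: every put that reaches it is recorded before it is applied."""
    intercepted: list = field(default_factory=list)

    def put(self, target, offset, value, memory):
        self.intercepted.append(("put", target, offset))
        memory[target][offset] = value


def make_put(shared_windows, node_of, mpi, memory):
    """Return a hypothetical put(origin, target, offset, value) routine.

    With shared windows enabled, a target on the origin's node is written with
    a plain store, so the call never reaches the MPI layer."""
    def put(origin, target, offset, value):
        if shared_windows and node_of[origin] == node_of[target]:
            memory[target][offset] = value  # direct store, bypasses MPI
        else:
            mpi.put(target, offset, value, memory)  # visible to the interposer
    return put


def count_intercepted(shared_windows):
    """Unit 0 writes to every unit; return how many puts the interposer saw."""
    node_of = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical placement: 2 units per node
    memory = {unit: [0] * 4 for unit in node_of}
    mpi = InterceptingMPI()
    put = make_put(shared_windows, node_of, mpi, memory)
    for target in node_of:
        put(origin=0, target=target, offset=0, value=42)
    return len(mpi.intercepted), memory


if __name__ == "__main__":
    for enabled in (True, False):
        seen, _ = count_intercepted(enabled)
        state = "enabled" if enabled else "disabled"
        print(f"shared windows {state}: {seen} of 4 puts intercepted")
```

With this placement, the final memory contents are identical in both runs. The interposer, however, sees only 2 of the 4 puts with shared windows enabled and all 4 with them disabled. This matches the point above: the node-local accesses are exactly the ones NastyMPI cannot perturb.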