-
Notifications
You must be signed in to change notification settings - Fork 900
Topic/osc pt2pt fixes #2581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Topic/osc pt2pt fixes #2581
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Fixes Issue open-mpi#2505
* `ompi_osc_pt2pt_sync_pscw_peer` needs to be protected when called, and the `sync` structure needs to be protected in the `start` function. - This is to prevent the `ompi_osc_pt2pt_sync_pscw_peer` from processing while `start` is also in progress on the same window in two different threads - namely the 'progress' thread and the 'main' thread. - This seems to happen when the 'main' thread is part way through the `start` function then the progress thread starts processing the `post` message received from another peer for this window. Both functions try to access the `peer_list` portion of the structure and a NULL is stepped on in the `ompi_osc_pt2pt_sync_pscw_peer` function. - This patch locks the `sync` structure for the duration of the `start` and ensures that whenever `ompi_osc_pt2pt_sync_pscw_peer` is called the caller is holding the `sync->lock`. This provides exclusivity to this structure ensuring that the latter function sees a fully updated structure.
…ing. * Otherwise one thread could be in `opal_list_append` while the second is in `opal_list_remove_first` in `ompi_osc_pt2pt_progress_pending_acc` - The former was holding the `module` lock, while the latter was holding the `accumulate` lock so neither was properly protected. - This resulted in a segv when there was concurrent access.
* From Mark - see comment in code
* Mark's patch * This condition was introduced in: - open-mpi@9444df1 - Would like to review with Nathan on this patch. * put is waiting for `eager_send_active` to become `true`, but (for some reason) when incoming post()s or ACKs of lock requests trigger this function it declines to transition `eager_send_active` to `true` in the case of Win_lock_all.
… spin instead of wait for signal
* It is possible for us to get to this selection statement with `tmp_sync` set to `REQUEST_COMPLETED` which causes a segv when we try to dereference that value inside `wait_sync_update`. So check against both `REQUEST_PENDING` and `REQUEST_COMPLETED` before attempting that function.
d47dd92
to
b2e80a9
Compare
- From Mark - This replaces the previous commit, making it more focused: - "osc/pt2pt: Fix put hang when used with win_lock_all"
- Change lor to land in conditional - Make sure to grab the lock before waiting in the conditoin
awlauria
added a commit
to awlauria/ompi
that referenced
this pull request
Jun 21, 2022
OpenPMIx changes: 6c9d3dd - Fix pcompress/zlib implementation (open-mpi#2625) ac0654a - Fix typos. b5ab80c - gds/shmem: Support multiple local clients. 43d3345 - Checkpoint gds/shmem work. 3f7b1d0 - Add PMIx Standard version info to pmix_info fa15e0d - Update the PMIx ABI Query attributes to match the standard dac9573 - Checkpoint gds/shmem work. 92e20e5 - Fix typo in setting hwloc verbosity dad0fb8 - Some initial valgrind cleanup c00f97f - TMA: Integrate more custom memory allocator infrastructure. 1286709 - Require flex only when keyval_lex.c is not provided af5dd49 - Optimize the file descriptor cleanup on OSX 137cc3d - Support colocation of processes 49a28d3 - Initialize pmix_info_t flags when loading e11fe2a - Save a legacy URI to support older tools 2f13feb - Deduplicate key/value storage for qualified values. 62d675f - Revert "Rename mca.h to pmix_mca.h to avoid include-path confusion" 288eed9 - Add a few missing renames 4d2e2b5 - Rename mca.h to pmix_mca.h to avoid include-path confusion 7c72657 - pmix_fd: cap the max FD to try to close 594303d - configure.ac: update directory space check 5ccd945 - Checkpoint more gds/shmem work. 34a26e8 - configury: do look for sed 4782e9a - Complete support for group-level info b54565f - Address pedantic compiler warnings (open-mpi#2581) 90c06ff - Checkpoint gds/shmem work. (open-mpi#2579) 3766868 - Silence pedantic compiler warnings. (open-mpi#2580) 958d6bb - Modify the key-index support ec77b2b - Correctly handle qualified values b2f14cf - Remove GDS envar disablement 4e664bd - Checkpoint data storage and retrieval work in gds/shmem. PRRTe changes: df7d17d0a3 - Support colocation of spawned procs 6a77daafb6 - Fix a few bugs/leaks in the OOB subsystem 551c927741 - Remove stale Java support 48e44f4203 - Avoid PMIx server release of HWLOC topology 9fe286019e - Use fileno_unlocked if available 802d4d4349 - Incremental valgrind improvements 3083bdbec4 - Revert "Optimize the file descriptor cleanup on OSX" 5ee9b5abba - prrte: fix core dump while printing stack-trace 87df79b900 - Optimize the file descriptor cleanup on OSX f09d98a4f7 - odls: fix alps compilation problem db194bb1ab - configury: do look for sed 414b4748c2 - configure.ac: update directory space check 4246494bbb - Set the min PMIx version to v4.1.3 8103bff788 - Remove setting of PRTE_MCA_prte_base_help_aggregate. da12b9da13 - Restore noloop for logging. Signed-off-by: Austen Lauria <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes related to Issue #2505
v2.0.x
. That's just to make it easier on our testing. We will file a real PR againstmaster
once we have more of this working.1sided.c
) case seems to be working.mt_1sided.c
).I'm going to turn off the CI for a little while: