-
Notifications
You must be signed in to change notification settings - Fork 900
first set of memkind optimizations #13095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first set of memkind optimizations #13095
Conversation
- add an assertion on the communicator structure if the memkind info object indicated that no gpu buffer is used - if the no_accel_buf assertion is set on a communicator, make the coll/accelerator component disqualify itself Signed-off-by: Edgar Gabriel <[email protected]>
- add a new flag indicating that the user guarantueed not to use accelerator memory with a particular file. - check that flag in common_ompio_check_gpu_buf() and return immediatly `false` if the flag is set, i.e. bypassing the accelerator check_addr() routine (which is supposed to be the expensive part). Signed-off-by: Edgar Gabriel <[email protected]>
OMPI_INFO_MEMKIND_ASSERT_UNDEFINED = 0, // no statement on memkind usage | ||
OMPI_INFO_MEMKIND_ASSERT_NO_ACCEL, // no accelerator memory is used | ||
OMPI_INFO_MEMKIND_ASSERT_ACCEL_DEVICE_ONLY, // only accelerator device memory used | ||
OMPI_INFO_MEMKIND_ASSERT_ACCEL_ALL // only accelerator memory (no restrictors) used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will include the bits for OMPI_INFO_MEMKIND_ASSERT_NO_ACCEL
, probably not what we want when we check flag & OMPI_INFO_MEMKIND_ASSERT_NO_ACCEL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a bit pattern, but an enum. I was contemplating whether to use a bitpatter instead (in which case your argument would have been correct), but that would have required additional processing afterwards on whether we have only mpi+system enabled or also a gpu one. With an enum we don't have that problem on this layer. Hence, we cannot do flag & OMPI_INFO_MEMKIND_ASSERT_NO_ACCEL
, but have to do flag == MPI_INFO_MEMKIND_ASSERT_NO_ACCEL
instead, see e.g. https://github.com/open-mpi/ompi/pull/13095/files#diff-9ab2a505b980955c4d5e74feeb016cb470239023de667666fafe9662f6884498R453
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right, I got confused by the assertion bit patterns.
@@ -47,7 +48,11 @@ void mca_common_ompio_check_gpu_buf ( ompio_file_t *fh, const void *buf, int *is | |||
|
|||
*is_gpu=0; | |||
*is_managed=0; | |||
|
|||
|
|||
if (fh->f_fh->f_flags & OMPI_FILE_ASSERT_NO_ACCEL_BUF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do the same for OMPI_INFO_MEMKIND_ASSERT_ACCEL_DEVICE_ONLY
and skip the check of the addr below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might be able to do that potentially. The downside that I see however in the DEVICE_ONLY
mode is that we loose the ability to retrieve the device_id where the buffer is located. Not sure whether that's relevant or not, but that was one of the reasons why I decided not to pursue that in the first stage.
OMPI_INFO_MEMKIND_ASSERT_UNDEFINED = 0, // no statement on memkind usage | ||
OMPI_INFO_MEMKIND_ASSERT_NO_ACCEL, // no accelerator memory is used | ||
OMPI_INFO_MEMKIND_ASSERT_ACCEL_DEVICE_ONLY, // only accelerator device memory used | ||
OMPI_INFO_MEMKIND_ASSERT_ACCEL_ALL // only accelerator memory (no restrictors) used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right, I got confused by the assertion bit patterns.
This is a first set of low-hanging optimizations around the memory_alloc_kind info object.
Specifically:
if the memory kinds requested and/or asserted by the user do not include any accelerator memory, we set a new assertion flag on the communicator structure.
if the new
no_accel_buf
assertion is set on a communicator when executing comm_query of the coll component, the coll/accelerator component will disqualify itself.similarly, if the memory kinds requested and/or asserted by the user do not include any accelerator memory, we set a new assertion flag on the file structure
if the new
no_accel_buf
assertion is set on a file handle when callingcommon_ompio_check_gpu_buf()
, we return immediately, bypassing the acceleratorcheck_addr()
functionality (which is supposed to be the expensive operation).