Add functions to check multiple request statuses at once #519

Closed · mahermanns opened this issue Aug 26, 2021 · 17 comments

Labels: chap-p2p (Point to Point Communication Chapter Committee) · had reading (Completed the formal proposal reading) · mpi-4.1 (For inclusion in the MPI 4.1 standard) · passed final vote (Passed the final formal vote) · passed first vote (Passed the first formal vote) · wg-tools (Tools Working Group)

Comments

@mahermanns (Member) commented Aug 26, 2021

Problem

The function MPI_Request_get_status provides completion information for a single given request handle, without actually freeing the request handle (i.e., setting it to MPI_REQUEST_NULL).

Performance-tool wrappers often need to record specific information when a request completes (e.g., tracing event records), and the request handles serve as keys for looking up internal tool information. Because a tool can only tell whether a key was still needed after the handle has already been reset by a test call, the incoming request array has to be duplicated so the handles remain available for reference once completion has been detected.
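
A minimal sketch of this status-quo workaround for MPI_Testall (tool_handle_completion() is a hypothetical tool hook, and the caller is assumed to pass a real statuses array rather than MPI_STATUSES_IGNORE):

#include <mpi.h>
#include <stdlib.h>

// Hypothetical tool hook: records, e.g., a tracing event for a completed request.
void tool_handle_completion(MPI_Request req, const MPI_Status* status);

int MPI_Testall(int count, MPI_Request req[], int* flag, MPI_Status statuses[]){
  // Duplicate the handles up front: on success, PMPI_Testall overwrites
  // every req[i] with MPI_REQUEST_NULL, destroying the tool's lookup keys.
  MPI_Request* saved = malloc(count * sizeof(MPI_Request));
  for (int i = 0; i < count; i++)
    saved[i] = req[i];
  int ret = PMPI_Testall(count, req, flag, statuses);
  if (*flag) {
    for (int i = 0; i < count; i++)
      tool_handle_completion(saved[i], &statuses[i]);
  }
  free(saved);
  return ret;
}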

It would be beneficial for tool wrappers if functions symmetric to MPI_Test(any|some|all) existed, as these would eliminate the need to duplicate requests before calling into the PMPI layer.

Using MPI_Request_get_status inside the MPI_(Test|Wait)(any|some|all) wrappers is also sub-optimal, as it might change the completion order depending on how the tool tests the requests.

An example of how MPI_Request_get_status could be used in MPI_Test:

int MPI_Test(MPI_Request* req, int* flag, MPI_Status* status){
  int ret = PMPI_Request_get_status(*req, flag, status);
  if(*flag){
    // tool code to handle the successful test
    PMPI_Test(req, flag, MPI_STATUS_IGNORE);
  }
  return ret;
}

Here are examples of how the newly proposed calls could be used:

int MPI_Testsome(int incount, MPI_Request req[], int* outcount, int* indices, MPI_Status statuses[]){
  int flag;
  int ret = PMPI_Request_get_status_some(incount, req, outcount, indices, statuses);
  for(int i = 0; i < *outcount; i++){
    // tool code to handle the successful test on req[indices[i]]
    PMPI_Test(&req[indices[i]], &flag, MPI_STATUS_IGNORE);
  }
  return ret;
}
int MPI_Testany(int count, MPI_Request req[], int* index, int* flag, MPI_Status* status){
  int ret = PMPI_Request_get_status_any(count, req, index, flag, status);
  if(*flag && *index != MPI_UNDEFINED){
    // tool code to handle the successful test on req[*index]
    PMPI_Test(&req[*index], flag, MPI_STATUS_IGNORE);
  }
  return ret;
}
int MPI_Testall(int incount, MPI_Request req[], int* flag, MPI_Status statuses[]){
  int ret = PMPI_Request_get_status_all(incount, req, flag, statuses);
  if(*flag){
    // tool code to handle the successful test on all req
    PMPI_Testall(incount, req, flag, MPI_STATUSES_IGNORE);
  }
  return ret;
}

Proposal

Add three additional calls (see the text changes for the actual function names) to the standard.

Changes to the Text

tbd.

Impact on Implementations

Implementations would need to add and support these functions.

Impact on Users

Wrappers would not have to copy a potentially large request array before calling into PMPI.

References and Pull Requests

@mahermanns added the wg-tools (Tools Working Group) label Aug 26, 2021
@mahermanns (Member, Author) commented:

In a discussion with @schulzm and @wrwilliams, we could not quite settle on whether MPI_Request_get_status is a completing call or not (and how a completing call is actually defined). We also discovered that MPI_Request_get_status is not listed in the semantics addendum document, though it feels like it should be.

@mahermanns added the chap-p2p (Point to Point Communication Chapter Committee) label Aug 26, 2021
@wesbland (Member) commented:

I'll assume this is MPI 4.1 for now and we can move it later if necessary.

@wesbland added the mpi-4.1 (For inclusion in the MPI 4.1 standard) label Aug 26, 2021
@mahermanns (Member, Author) commented:

Tools WG, Oct 21, 2021: Besides the tools community, these functions are also useful for layered libraries that need to query the status of a request without destroying it.

@wrwilliams wrwilliams self-assigned this Jan 21, 2022
@wrwilliams commented:

As I attempt to write up a PR for this, it occurs to me that the Test[any|some|all] cases and the Wait[any|some|all] cases are not analogous in a tool's ability to benefit from get_status_[any|some|all]. For the Test* cases, what we have in the issue description is correct and useful regardless of whether get_status_* make progress or not. For the Wait* cases, however, we are looking at something like the following as our motivating example:

int MPI_Waitall(int incount, MPI_Request req[], MPI_Status statuses[]){
  int flag = 0;
  int ret = MPI_SUCCESS;
  while(!flag) {
    ret = PMPI_Request_get_status_all(incount, req, &flag, statuses);
    if(flag){
      // tool code to handle the successful wait on all req
      PMPI_Waitall(incount, req, MPI_STATUSES_IGNORE);
    }
  }
  return ret;
}

Obviously, such an example can only work if MPI_Request_get_status_all makes progress, or if some other progress-guaranteeing function is called inside the while loop.

So, two questions:

  1. Do we want request_get_status* to guarantee progress?
  2. Is this a valid motivating example for Waitall?

@wgropp commented Jan 21, 2022

I would say no. We already have MPI_Testall, which makes progress and provides information on request state, and that is how examples like this are typically written. If Request_get_status does guarantee progress, what routine should I use for lighter-weight access to the status of a request?
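
The pattern alluded to here, waiting by polling MPI_Testall (which both makes progress and reports completion), might look like this minimal sketch:

#include <mpi.h>

// Wait for all requests by polling MPI_Testall in a loop.
void wait_all_by_polling(int count, MPI_Request req[], MPI_Status statuses[]){
  int flag = 0;
  while (!flag)
    MPI_Testall(count, req, &flag, statuses);
}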

@jprotze commented Jan 21, 2022

> For the Test* cases, what we have in the issue description is correct and useful regardless of whether get_status_* make progress or not.

If request_get_status_all does not make progress, the MPI_Testall function in the issue is not correct, as it then does not make progress either.
In that case, I suggest not using the request_get_status family, but instead adding new functions similar to the test family that do not free the requests.

@wrwilliams commented:

> We already have MPI_Testall, which makes progress and provides information on request state, and that is how examples like this are typically written.

The problem with MPI_Testall here is that after a successful test, each element of req is guaranteed to be freed and overwritten. So there is no window in which a tool can say, "Now I know that the wait condition has been met, and I can use the request objects to update internal data."
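
Concretely, a sketch of the failure mode (tool_lookup() stands in for a hypothetical tool-side hash-map query keyed by the request handle):

#include <mpi.h>

// Hypothetical hash-map query keyed by the request handle.
void* tool_lookup(MPI_Request req);

void broken_tool_wrapper(int count, MPI_Request req[], MPI_Status statuses[]){
  int flag;
  PMPI_Testall(count, req, &flag, statuses);
  if (flag) {
    // Too late: every req[i] is now MPI_REQUEST_NULL, so the tool's
    // per-request data can no longer be found via the handles.
    for (int i = 0; i < count; i++)
      tool_lookup(req[i]);  // always queries MPI_REQUEST_NULL
  }
}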

> If Request_get_status does guarantee progress, what routine should I use for lighter-weight access to the status of a request?

I fear this is the crux of the issue: we seem to want the get_status interface(s) to serve two purposes simultaneously, namely being lightweight and being nondestructive. On the tools side we focused on the nondestructive aspect, but both are important.

Let's consider not Waitall but Waitany. For Waitall, we at least have the band-aid of knowing that all passed-in requests will be freed and have their statuses set, so there is no waste in copying everything. For Waitany (and Waitsome), we still have to look up and copy the tool/layer data associated with all requests before calling PMPI_Waitany (or indeed PMPI_Testany, if we want to write our own internal loop), as sketched below.
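
A sketch of the Waitany case (tool_handle_completion() is again a hypothetical tool hook, and status is assumed not to be MPI_STATUS_IGNORE; a C99 variable-length array keeps the example short):

#include <mpi.h>

void tool_handle_completion(MPI_Request req, const MPI_Status* status);

int MPI_Waitany(int count, MPI_Request req[], int* index, MPI_Status* status){
  // We cannot know which request will complete, so every handle must be
  // saved (or its tool data copied) up front.
  MPI_Request saved[count];  // VLA for brevity; a real tool would manage memory more carefully
  for (int i = 0; i < count; i++)
    saved[i] = req[i];
  int ret = PMPI_Waitany(count, req, index, status);
  if (*index != MPI_UNDEFINED) {
    // Only saved[*index] was actually needed; the other copies were wasted work.
    tool_handle_completion(saved[*index], status);
  }
  return ret;
}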

This does, however, show me that our conception within the Tools WG that these three function families (Wait/Test/get_status) are simply three layers of a reasonable implementation is probably flawed. Conceptually, is a "non-destructive Test" a better fit for what we want than a "progressing get_status"? If so, I would have nothing against naming these as modified Test functions.

@jprotze since we cross-posted: aren't we covered by the fact that we call PMPI_Testall in addition to PMPI_Request_get_status_all in that example, thus ensuring that progress occurs during MPI_Testall? Nevertheless, this may indeed be a case where a nondestructive variant of Test is clearer and cleaner.

@jprotze commented Jan 21, 2022

> @jprotze since we cross-posted: aren't we covered by the fact that we call PMPI_Testall in addition to PMPI_Request_get_status_all in that example, thus ensuring that progress occurs during MPI_Testall? Nevertheless, this may indeed be a case where a nondestructive variant of Test is clearer and cleaner.

The idea in all the examples is to call MPI_Test* only on success, and only with the requests that completed according to the get_status function, in order to finally release those requests. So no, this function does not help to make progress on non-completed requests.

@wrwilliams commented:

> The idea in all the examples is to call MPI_Test* only on success, and only with the requests that completed according to the get_status function, in order to finally release those requests. So no, this function does not help to make progress on non-completed requests.

...right. It has been a little while since we wrote those examples. And if the function is going to make progress, it has to do so on all possible execution paths, doesn't it?

@devreal commented Jan 21, 2022

The problem here is that MPI does not make progress explicit. If it did, tools would have control over triggering progress and performing a non-destructive non-progressing test. I agree with @wgropp that adding progress to PMPI_Request_get_status defeats the purpose of having a lightweight status query function.

I wonder whether you could simply chop up the input array and test/wait on the chunks. That way you don't need to allocate temporary heap memory to hold copies and don't have to query the tool information before you know whether a request actually completes. Copying requests should be fairly cheap and having 16 of them (or maybe more) on the stack should be no issue.

MPI_Test is simple:

int MPI_Test(MPI_Request* req, int* flag, MPI_Status* status){
  MPI_Request tmp = *req;
  int ret = PMPI_Test(req, flag, status);
  if (*flag) {
    // do your thing with tmp
  }
  return ret;
}
int MPI_Testsome(int incount, MPI_Request req[], int* outcount, int* indices, MPI_Status statuses[]){
  MPI_Request tmp_req[16];
  int tmp_idx[16];
  int ret = MPI_SUCCESS;
  *outcount = 0;
  for (int i = 0; i < incount; i += 16) {
    int tmp_outcount;
    int tmp_incount = incount - i < 16 ? incount - i : 16;
    // keep copies of the handles for the tool's lookup after completion
    memcpy(tmp_req, &req[i], tmp_incount * sizeof(MPI_Request));
    ret = PMPI_Testsome(tmp_incount, &req[i], &tmp_outcount, tmp_idx, &statuses[*outcount]);
    if (tmp_outcount != MPI_UNDEFINED && tmp_outcount > 0) {
      for (int j = 0; j < tmp_outcount; j++) {
        // do your thing with tmp_req[tmp_idx[j]], then translate the
        // chunk-local index back to the caller's array
        indices[*outcount + j] = i + tmp_idx[j];
      }
      *outcount += tmp_outcount;
      // you may break out of the loop here and still be correct...
    }
  }
  return ret;
}

MPI_Waitsome would be similar, except that you have to loop until *outcount becomes non-zero. MPI_Testany and MPI_Waitany can be implemented in a similar way.
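
A sketch of that chunked MPI_Waitsome, under the same assumptions as above; the corner case where a chunk contains no active requests (PMPI_Testsome sets tmp_outcount to MPI_UNDEFINED) is handled by skipping the chunk:

#include <mpi.h>
#include <string.h>

int MPI_Waitsome(int incount, MPI_Request req[], int* outcount, int* indices, MPI_Status statuses[]){
  int ret = MPI_SUCCESS;
  *outcount = 0;
  while (*outcount == 0) {
    int any_active = 0;
    for (int i = 0; i < incount && *outcount == 0; i += 16) {
      MPI_Request tmp_req[16];
      int tmp_idx[16];
      int tmp_outcount = 0;
      int tmp_incount = incount - i < 16 ? incount - i : 16;
      // keep copies so the tool can still use the handles after completion
      memcpy(tmp_req, &req[i], tmp_incount * sizeof(MPI_Request));
      ret = PMPI_Testsome(tmp_incount, &req[i], &tmp_outcount, tmp_idx, statuses);
      if (tmp_outcount == MPI_UNDEFINED)
        continue;  // no active requests in this chunk
      any_active = 1;
      for (int j = 0; j < tmp_outcount; j++) {
        // tool code: handle completion of tmp_req[tmp_idx[j]], then
        // translate the chunk-local index back to the caller's array
        indices[j] = i + tmp_idx[j];
      }
      *outcount = tmp_outcount;
    }
    if (!any_active) {
      *outcount = MPI_UNDEFINED;  // nothing left to wait for
      break;
    }
  }
  return ret;
}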

MPI_Waitall can be implemented using MPI_Waitall, and MPI_Testall using MPI_Testall. MPI_Waitsome and MPI_Waitany have to fall back to polling, which is not ideal (it defeats, for example, some of the optimizations implementations may use to reduce contention between multiple threads), but you'd have to do the same with a non-destructive testany/some.
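
For completeness, a sketch of the chunked MPI_Waitall; the up-front copies are not wasted here because every request is guaranteed to complete (statuses is assumed to be a real array, not MPI_STATUSES_IGNORE):

#include <mpi.h>
#include <string.h>

int MPI_Waitall(int incount, MPI_Request req[], MPI_Status statuses[]){
  int ret = MPI_SUCCESS;
  for (int i = 0; i < incount; i += 16) {
    MPI_Request tmp_req[16];
    int tmp_incount = incount - i < 16 ? incount - i : 16;
    memcpy(tmp_req, &req[i], tmp_incount * sizeof(MPI_Request));
    // Waiting chunk by chunk is correct: all requests must complete anyway.
    ret = PMPI_Waitall(tmp_incount, &req[i], &statuses[i]);
    for (int j = 0; j < tmp_incount; j++) {
      // tool code: handle completion of tmp_req[j]
    }
  }
  return ret;
}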

I'd be curious what the overhead of this scheme is. My guess is that it is negligible compared to the rest of the overhead tools typically bring to the table (no offense :)).

@wrwilliams commented:

Summary from forum meeting 2FEB2022:

  • This should depend on, but be separate from, #468 (Issue K: MPI_REQUEST_GET_STATUS with same progress as MPI_TEST). We cannot reasonably write text describing these new functions without good text for MPI_Request_get_status, which we currently lack. @Wee-Free-Scot is working on this; I am happy to help out there. Once #468 has at least a clear proposal, it will be straightforward to write a parallel proposal here.
  • The question of whether an explicitly non-progressing version of MPI_Request_get_status is needed is separate as well, since we have at least a preliminary decision from last year that MPI_Request_get_status should make progress. The general sentiment was that there are reasonable use cases for a non-progress-guaranteeing version of this (these) interface(s) when dealing with accelerators, in order to avoid forcing strong progress models on implementations.
  • We also need to clarify the request state diagram in order to have terminology to describe request_get_status's behavior; this is part of, or a dependency of, #468.

There were no objections to adding the multiple request_get_status function family, provided that we properly clarify request_get_status first and mirror its language. We also need to take care that the request_get_status_* behavior is properly specified for generalized requests.

@wesbland moved this to To Do in MPI 4.1 Jul 5, 2022
@wesbland added this to MPI 4.1 Jul 5, 2022
@wesbland (Member) commented:

I’m going to propose moving this to MPI 5.0. There’s more discussion to be had here. If someone objects and thinks we’ll be ready to read this soon, leave a comment and we can discuss bringing it back into MPI 4.1.

@wesbland added the mpi-6 (For inclusion in the MPI 5.1 or 6.0 standard) label and removed the mpi-4.1 (For inclusion in the MPI 4.1 standard) label Jul 14, 2022
@wesbland removed this from MPI 4.1 Jul 14, 2022
@wrwilliams commented:

PDF:
mpi-report-PR799.pdf

@wesbland added this to MPI 4.1 Feb 22, 2023
github-project-automation bot moved this to To Do in MPI 4.1 Feb 22, 2023
@wesbland added the mpi-4.1 (For inclusion in the MPI 4.1 standard) label and removed the mpi-6 (For inclusion in the MPI 5.1 or 6.0 standard) label Feb 22, 2023
@wesbland moved this from To Do to In Progress in MPI 4.1 Feb 22, 2023
@wesbland moved this to To Do in MPI Next Feb 22, 2023
@wesbland removed this from MPI Next Feb 22, 2023
@wesbland added this to the March 2023 milestone Mar 1, 2023
@wesbland added the had reading (Completed the formal proposal reading) label and removed the scheduled reading (Reading is scheduled for the next meeting) label Mar 21, 2023
@wesbland moved this from In Progress to Had Reading in MPI 4.1 Mar 21, 2023
@wesbland modified the milestones: March 2023, May 2023 Apr 11, 2023
@wesbland (Member) commented May 2, 2023

Had no-no reading on 2023-05-02.

@wesbland (Member) commented May 3, 2023

This passed a no-no vote.

Yes: 30 · No: 0 · Abstain: 2

@wesbland added the scheduled vote and scheduled first vote (First vote is scheduled for the next meeting) labels and removed the no-no label May 3, 2023
@wesbland (Member) commented May 3, 2023

This passed a 1st vote.

Yes: 27 · No: 0 · Abstain: 5

@wesbland added the passed first vote (Passed the first formal vote) label and removed the scheduled first vote (First vote is scheduled for the next meeting) label May 3, 2023
@wesbland moved this from Had Reading to Passed 1st Vote in MPI 4.1 May 3, 2023
@wesbland added the scheduled second vote (Second vote is scheduled for the next meeting) label May 22, 2023
@wesbland modified the milestones: May 2023, July 2023 May 22, 2023
@wesbland (Member) commented:

This passed a 2nd vote.

Yes: 33 · No: 0 · Abstain: 1

@wesbland added the passed final vote (Passed the final formal vote) label and removed the scheduled second vote (Second vote is scheduled for the next meeting) label Jul 12, 2023
@wesbland moved this from Passed 1st Vote to Passed 2nd Vote in MPI 4.1 Jul 12, 2023
@wgropp closed this as completed Jul 12, 2023
github-project-automation bot moved this from Passed 2nd Vote to Done in MPI 4.1 Jul 12, 2023