nonblocking reductions in Fortran with non-contiguous buffers of different layouts #663

Open · jeffhammond opened this issue Jan 4, 2023 · 16 comments
Labels: mpi-6 (For inclusion in the MPI 5.1 or 6.0 standard), wg-fortran (Fortran Working Group)

Comments

@jeffhammond (Member) commented Jan 4, 2023

Problem

This is almost impossible to implement:

  use mpi_f08
  type(MPI_Request) :: R
  integer, dimension(300) :: A
  integer, dimension(200) :: B
  call MPI_Iallreduce(A(1:300:3), B(1:200:2), 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)
  call MPI_Wait(R, MPI_STATUS_IGNORE)

In MPICH and VAPAA, non-contiguous Fortran subarrays are supported by creating a datatype corresponding to the CFI_cdesc_t coming from Fortran (e.g. MPICH implementation).
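
For illustration, the layouts that an implementation would have to synthesize from the two descriptors in the example above are two different vector types. A minimal mpi_f08 sketch of the equivalent user-level construction (not the MPICH code):

  use mpi_f08
  type(MPI_Datatype) :: send_layout, recv_layout

  ! Layout of A(1:300:3): 100 blocks of 1 integer, start-to-start distance of 3.
  call MPI_Type_vector(100, 1, 3, MPI_INTEGER, send_layout)
  ! Layout of B(1:200:2): 100 blocks of 1 integer, start-to-start distance of 2.
  call MPI_Type_vector(100, 1, 2, MPI_INTEGER, recv_layout)
  ! Each vector type fully captures one buffer's relative layout.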

In most MPI functions, there is one datatype argument for every buffer. Reductions, however, take a single datatype argument for both buffers, so there is no way to capture the layout information of the input and output buffers separately if they differ.

Furthermore, if we create a custom datatype, we also have to use a custom reduction operation. MPI_User_function has only one datatype argument, so again it is impossible to carry along the required layout information for both buffers.

Obviously, in blocking functions, we can allocate temporary buffers and make contiguous copies where necessary, but in the nonblocking case, we cannot free those temporaries at the right time because we do not have completion callbacks.
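
For the blocking case, a sketch of that workaround (assuming mpi_f08; the helper name and fixed sizes are only illustrative):

  subroutine blocking_workaround(a, b, comm)
    use mpi_f08
    implicit none
    integer, intent(in)        :: a(300)
    integer, intent(inout)     :: b(200)
    type(MPI_Comm), intent(in) :: comm
    integer, allocatable       :: tmp_in(:), tmp_out(:)

    tmp_in = a(1:300:3)         ! contiguous copy of the strided input
    allocate(tmp_out(100))
    call MPI_Allreduce(tmp_in, tmp_out, 100, MPI_INTEGER, MPI_SUM, comm)
    b(1:200:2) = tmp_out        ! scatter the result back into the strided output
    ! The temporaries can be freed here only because the call has completed;
    ! with MPI_Iallreduce there is no hook at completion time to do this.
    deallocate(tmp_in, tmp_out)
  end subroutine blocking_workaround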

Proposal

I prefer Option 3...

Option 1 - completion callbacks (add stuff to the standard)

I can solve the nonblocking problem with completion callbacks that allow me to clean up temporaries. This is a very general solution that has lots of use cases, but the Forum seems to be opposed to it.

In the blocking case, we don't have to do anything.

Option 2 - implementations are very complicated (no changes to the standard)

Implementations that do something far more complicated than what VAPAA and MPICH do right now can solve this, but it is not pretty. They have to pass the CFI information down into the implementation of reductions and handle different layouts, or they have to allocate temporaries and clean them up using an internal mechanism. I suspect implementations have the capability to do the latter already and would go that route, if only because most MPI implementations do not want to deal with CFI_cdesc_t any more than absolutely necessary.

Option 3 - prohibit this usage (backwards-incompatible changes to the standard)

The easy solution is for us to add a backwards-incompatible restriction that reductions require Fortran buffers to have equivalent layouts. This is only technically backwards-incompatible, because nobody supports this today (at least in the nonblocking case - the blocking case might work due to implicit contiguous copy-in and copy-out, which Fortran compilers do when they see the CONTIGUOUS attribute).

I will argue that we implicitly require this anyway by virtue of having only one datatype argument, which means that users cannot pass buffers with different layouts from C. It is only because of the invisible layout differences associated with Fortran 2018 that users can do this.
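
In concrete terms, the restriction would distinguish the following two calls (a sketch, not proposed normative text; assuming mpi_f08):

  use mpi_f08
  type(MPI_Request) :: R
  integer, dimension(300) :: A, B
  integer, dimension(200) :: C

  ! Allowed: both buffers have the same relative memory layout (stride 3).
  call MPI_Iallreduce(A(1:300:3), B(1:300:3), 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)
  call MPI_Wait(R, MPI_STATUS_IGNORE)

  ! Disallowed under Option 3: same (count, datatype), different relative
  ! layouts, i.e. the case from the problem statement above.
  ! call MPI_Iallreduce(A(1:300:3), C(1:200:2), 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)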

Changes to the Text

Option 3 would add text to state that users are required to pass Fortran buffers of equivalent shape.

We need to be careful about how we say "equivalent shape" because one can have identical memory layouts corresponding to different Fortran shapes, and we only need to constrain the former.
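
For example (assuming mpi_f08), the two buffers below have different Fortran shapes but identical, contiguous memory layouts; a call like this should remain legal under the proposed wording:

  use mpi_f08
  type(MPI_Request) :: R
  integer, dimension(100)    :: X   ! shape (100)
  integer, dimension(10, 10) :: Y   ! shape (10, 10): the same 100 contiguous integers

  call MPI_Iallreduce(X, Y, 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)
  call MPI_Wait(R, MPI_STATUS_IGNORE)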

Impact on Implementations

Option 3 requires no implementation changes.

Impact on Users

Users are no longer allowed to do crazy things that are at best unreliable today.

References and Pull Requests

jeffhammond added the wg-fortran (Fortran Working Group) and mpi-6 (For inclusion in the MPI 5.1 or 6.0 standard) labels on Jan 4, 2023
@jeffhammond (Member Author)

@RolfRabenseifner do you have any thoughts here?

jeffhammond added a commit to jeffhammond/vapaa that referenced this issue Jan 4, 2023
see mpi-forum/mpi-issues#663

we will need to do more work for this

Signed-off-by: Jeff Hammond <[email protected]>
@RolfRabenseifner

Nowadays, I expect that many MPI libraries define MPI_SUBARRAYS_SUPPORTED (at least in the mpi_f08 module) as .TRUE.
With this, your problem is solved.
MPI-3.0 already specified that, when compiled with a Fortran 2018 compiler, MPI_SUBARRAYS_SUPPORTED must be set to .TRUE. in the mpi_f08 module.
Therefore, I recommend closing this issue.

@jeffhammond (Member Author) commented Jan 4, 2023

Please read it again carefully. The thing we have specified is not implementable. You will not find any implementations of this, regardless of what MPI_SUBARRAYS_SUPPORTED says. If you have tests using MPI_I(ALL)REDUCE specifically that work, please share the code and which implementation does them correctly.

@devreal commented Jan 4, 2023

Section 6.9.1 of MPI 4.0 states:

The input buffer is defined by the arguments sendbuf, count and datatype; the output buffer is defined by the arguments recvbuf, count and datatype

So the layout of both buffers must be identical (afaics; are there exceptions with Fortran where the Fortran native type differs from the MPI type description?).

Anyway, completion callbacks are under discussion in the hybrid working group (https://github.com/mpiwg-hybrid/mpi-standard/pull/1).

@jeffhammond (Member Author)

Thanks. The full text there is useful.

The input buffer is defined by the arguments sendbuf, count and datatype; the output buffer is defined by the arguments recvbuf, count and datatype; both have the same number of elements, with the same type. The routine is called by all group members using the same arguments for count, datatype, op, root and comm. Thus, all processes provide input buffers of the same length, with elements of the same type as the output buffer at the root. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence.

We made a mistake in overlooking the fact that Fortran (count,datatype) do not fully specify the relative memory layout of a buffer the way they do in C. One can have an arbitrarily large number of relative memory layouts associated with a single (count,datatype) in the context of Fortran subarrays. I think it's on the order of SIZE_MAX to the 15th power, in theory, although the subset of those that fit into the available memory of an MPI process is far smaller.

The good news is that I can solve this in VAPAA with generalized requests, which only works with MPI_THREAD_MULTIPLE, but I am willing to accept such limitations in VAPAA because I don't plan to support 100% of the standard.

If I can do it with generalized requests, implementations should be able to do it internally without using threads, although I do not expect to see support for this any time soon.

Even though it is technically possible to implement what we have specified, it's not possible to implement it in a reasonable way, and there is no value to our user community in insisting upon something that nobody is ever going to implement. We should therefore add a restriction to the above text requiring that the relative memory layouts of the input and output buffers in Fortran be the same, since that is consistent with the intent of the current text when reinterpreted from its meaning in C.

@bosilca (Member) commented Jan 5, 2023

I don't think we made a mistake; the memory layout as defined by the standard assumes a flat and contiguous memory addressing. I can imagine how some MPI datatype concepts (such as extent and bounds) are supposed to work with Fortran slices and subarrays, but getting it right seems extremely complicated and error-prone for anything but predefined types. Should we prohibit the use of slices/subarrays with anything but predefined types?

My Fortran knowledge being extremely basic, I wrote a small code to check what the Fortran compiler does when passing slices to a C function (declared as external), and it appears that it creates temporaries, copies the Fortran slice into them, and then passes these temporaries to the C function. This makes sense for blocking functions, but seems like a bad approach for nonblocking ones, for both buffers. How does the Fortran compiler know when the temporary for the input buffer should be released, or when the output buffer should be scattered back into the Fortran slice/subarray and released?
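
A minimal reconstruction of that kind of test (this is not the actual code; c_consumer is a hypothetical external routine with an implicit interface, e.g. implemented in C):

  program slice_copy_test
    implicit none
    external :: c_consumer    ! hypothetical routine, e.g. a C function taking int*
    integer :: i
    integer :: a(12)

    a = [(i, i = 1, 12)]
    ! With an implicit interface the compiler cannot pass a descriptor, so it
    ! typically allocates a contiguous temporary, copies a(1:12:3) into it,
    ! passes the temporary's address, and copies back after the call returns.
    ! For a nonblocking MPI call, that temporary is gone before completion.
    call c_consumer(a(1:12:3))
  end program slice_copy_test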

@devreal commented Jan 5, 2023

We made a mistake in overlooking the fact that Fortran (count,datatype) do not fully specify the relative memory layout of a buffer the way they do in C.

I wonder whether supporting layout information of Fortran slices in MPI communication calls is the right approach after all. It breaks the assumption that the MPI datatype is the sole description of the layout of a piece of memory passed to an MPI function. IMHO, a cleaner approach would have been some automatic inference of an MPI datatype from a provided slice, a function that would be unique to the Fortran interface and would make data movement through a C MPI wrapper easier and consistent with the C semantics. But that might just be my Fortran-agnostic ignorance and is probably off-topic :)

The good news is that I can solve this in VAPAA with generalized requests, which only works with MPI_THREAD_MULTIPLE, but I am willing to accept such limitations in VAPAA because I don't plan to support 100% of the standard.

If you need progress in generalized requests you may want to look at their extended form [1], which comes with a progress callback. Both MPICH and Open MPI have an implementation of them, although Open MPI does not expose the public API (I can dig up how I used it). It's a shame that the extension never made it into the standard. They've been quite useful for me in the past to stitch together a sequence of dependent operations. Maybe we should revisit them at some point...

[1] https://link.springer.com/chapter/10.1007/978-3-540-75416-9_33

@RolfRabenseifner

In principle, blocking MPI routines have always worked with any actual buffer argument, with mpif.h and with both the mpi and the mpi_f08 modules, even with "old" compilers that did not support TS 29113.
This means that if the user handed over a strided buffer (like a(1:100:3), i.e., every 3rd element of a(1:100)), then the compiler copied the strided data into a contiguous scratch array, called the MPI routine, and copied the result data from that scratch array back into the original strided array after the MPI routine returned.
Of course the compiler does not care whether it is a blocking or nonblocking MPI routine.
And with nonblocking routines, this handling of strided buffers was broken, as already reported in MPI-2.0, because the scratch array is removed when the nonblocking routine returns instead of when the nonblocking request completes.
Therefore, with MPI-3.0, we introduced the new way of declaring Fortran buffer arguments, which allows the MPI library itself to do this copying into an internal scratch array and, of course, to remove this internal scratch array as part of completion (and not as part of the return from the nonblocking call).
Of course, this copying may be optimized (with on-the-fly methods), but it need not be, because it may be seen as a special Fortran feature. If the mpi_f08 module (and the mpi module) are implemented in this naive way, then MPI_Iallreduce has no problem.

I hope that this background info helps a bit to sort out whether there are still some problems.

@jeffhammond (Member Author)

@bosilca you need to look at Fortran 2018 CFI_cdesc_t when passed to TYPE(*), DIMENSION(..), ASYNCHRONOUS. That's the only way to get the buffer directly. This was added to Fortran 2018 specifically for MPI.

Even though I figured out how to implement it, I agree that subarrays plus user datatypes is horrific, especially in the nonblocking case, and we should at least strongly discourage it as advice to users.

@devreal commented Jan 6, 2023

Therefore, with MPI-3.0, we introduced the new way of declaring Fortran buffer arguments, which allows the MPI library itself to do this copying into an internal scratch array and, of course, to remove this internal scratch array as part of completion (and not as part of the return from the nonblocking call).

This is terrible. Implementations have invested plenty of effort to optimize data packing into bounded transfer buffers and partial overlapping of packing and transfers. Apparently, MPI 3.0 introduced a way to force implementations (or the compiler) to allocate unbounded temporary buffers and copy all necessary elements before returning from MPI_Isend. And the user won't know...

So, can we instead have an interface like this?

  use mpi_f08
  type(MPI_Request) :: R
  type(MPI_Datatype) :: T
  integer, dimension(200) :: A
  integer, dimension(200) :: B
  ! MPI_Type_create_vector_from_slice is the proposed (not yet existing) routine.
  call MPI_Type_create_vector_from_slice(B(1:200:2), T)
  call MPI_Type_commit(T)
  call MPI_Iallreduce(A, B, 1, T, MPI_SUM, MPI_COMM_WORLD, R)
  call MPI_Type_free(T)
  call MPI_Wait(R, MPI_STATUS_IGNORE)

Benefits:

  1. Simplified MPI datatype creation based on a language's native description (could be useful in other languages too).
  2. Only the MPI datatype represents the memory layout and can be used for optimizations by the implementation.
  3. No unbounded temporary buffers.
  4. The MPI datatype can be reused (instead of being recreated on every operation as in the MPICH implementation linked above).
  5. Avoid the confusion about memory layouts in the original example (by making it clear that there can be only one memory layout description). If we want multiple layouts then we should introduce a new set of functions that have different MPI datatypes for input and output buffers (like we have in MPI_Sendrecv).

I'm not a Fortran developer but this inconsistency in the API bothers me.

@jeffhammond (Member Author)

I wrote code that creates datatypes from subarrays (MPICH has it too). That's a better way to do it, but not what we specified in MPI-3.

However, buffering need not be unbounded. Implementations can copy the CFI_cdesc_t and unwind it internally. It's just not done today and nobody wants to do it.

What I want instead is MPI_Type_get_typemap(_size), which would produce the flattened datatype representation in memory. This is what I need in order to process subarrays plus datatypes at the same time.

MPI I/O implementations have this flattening code. It's possible to write it today, but it's super tedious due to the recursive calls to MPI_Type_get_envelope and MPI_Type_get_contents that it requires.
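
For reference, a sketch of the recursion that flattening requires today (assuming mpi_f08; a real flattener would also have to interpret the integer/address arrays for each combiner and free non-predefined inner handles, which is where the tedium lies):

  recursive subroutine walk_type(dt, depth)
    use mpi_f08
    implicit none
    type(MPI_Datatype), intent(in) :: dt
    integer, intent(in) :: depth
    integer :: ni, na, nd, combiner, i
    integer, allocatable :: ints(:)
    integer(kind=MPI_ADDRESS_KIND), allocatable :: addrs(:)
    type(MPI_Datatype), allocatable :: dts(:)

    call MPI_Type_get_envelope(dt, ni, na, nd, combiner)
    if (combiner == MPI_COMBINER_NAMED) return   ! predefined type: recursion stops

    allocate(ints(ni), addrs(na), dts(nd))
    call MPI_Type_get_contents(dt, ni, na, nd, ints, addrs, dts)
    ! A real implementation would switch on the combiner (vector, hvector,
    ! subarray, struct, ...) here and fold ints/addrs into an explicit typemap.
    do i = 1, nd
      call walk_type(dts(i), depth + 1)
    end do
    deallocate(ints, addrs, dts)
  end subroutine walk_type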

@jeffhammond (Member Author) commented Jan 7, 2023

If you want this datatype routine, you can just use MPI_Type_create_subarray. It's better because it can capture the entire layout. The (h)vector approach is relative to the subarray start, not the array start, which means it's not possible to create a subarray datatype that corresponds to the parent array.
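
A sketch of the difference for a 2-D block slice (assuming mpi_f08; the shapes are arbitrary): the subarray type records where the slice sits inside the parent array, so the parent itself is the buffer argument, whereas the vector equivalent only describes the layout relative to the slice's first element.

  use mpi_f08
  type(MPI_Datatype) :: sub_t, vec_t
  integer :: A(10, 10)

  ! Describes A(3:6, 2:5) in terms of the parent array: sizes, subsizes, and
  ! zero-based starts. The whole parent array A is then passed as the buffer.
  call MPI_Type_create_subarray(2, [10, 10], [4, 4], [2, 1], &
                                MPI_ORDER_FORTRAN, MPI_INTEGER, sub_t)
  call MPI_Type_commit(sub_t)
  call MPI_Send(A, 1, sub_t, 0, 0, MPI_COMM_WORLD)

  ! The vector type describes the same element pattern, but anchored at the
  ! slice start: the buffer argument would have to be A(3, 2), and the slice's
  ! offset within the parent array is no longer part of the datatype.
  call MPI_Type_vector(4, 4, 10, MPI_INTEGER, vec_t)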

At the very least, I should write advice to users about this in the Fortran chapter.

@jeffhammond (Member Author) commented Jan 8, 2023

It seems that MPICH already has what I want: pmodels/mpich#6139

@hzhou is there any plan to propose to standardize this in MPI-5? It would be really useful.

Reductions with different subarray layouts and user-defined ops are still impossible though.

@hzhou commented Jan 9, 2023

It seems that MPICH already has what I want: pmodels/mpich#6139

@hzhou is there any plan to propose to standardize this in MPI-5? It would be really useful.

Yes, we added the extension believing it is useful for applications to have more direct access to datatypes. But before making a formal proposal, we'd like to invite users to try it and collect feedback. You are very welcome to try the new API, and please send us your field notes.

@jeffhammond (Member Author)

@hzhou as you probably know, I have found it incredibly useful in Vapaa, and it would be great to have this in MPI-Next-Next.

@jeffhammond (Member Author)

From 6.9:

General datatypes may be passed to the user function. However, use of datatypes that are not contiguous is likely to lead to inefficiencies.

I have convinced myself that it is valid to do the dumbest possible implementation that loops over the noncontiguous inputs element-by-element.
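
A sketch of that local combine step (assuming integer data and MPI_SUM, with the base indices and element strides of the two buffers already extracted from their descriptors; all names are illustrative):

  subroutine elementwise_sum(send, send_stride, recv, recv_stride, count)
    implicit none
    integer, intent(in)    :: send(*), send_stride, recv_stride, count
    integer, intent(inout) :: recv(*)
    integer :: i

    do i = 0, count - 1
      ! Apply the reduction operation one element at a time, following each
      ! buffer's own stride. Inefficient, but valid per the text quoted above.
      recv(1 + i*recv_stride) = recv(1 + i*recv_stride) + send(1 + i*send_stride)
    end do
  end subroutine elementwise_sum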

Doing this element by element is much easier with MPICH's MPIX_Iov than with the MPI datatypes engine, so I want this issue to stay open to capture the need for that capability to be standardized.
