Skip to content

Environmental Inquiries for MPI_MAX_*_NAME #702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jeffhammond opened this issue Apr 12, 2023 · 18 comments
Closed

Environmental Inquiries for MPI_MAX_*_NAME #702

jeffhammond opened this issue Apr 12, 2023 · 18 comments
Assignees
Labels
mpi-5.0 For inclusion in the MPI 5.0 standard wg-abi ABI Working Group

Comments

@jeffhammond
Copy link
Member

Problem

In the context of an ABI, we need to have the compile-time constants for MPI_MAX_*_NAME be upper bounds on all implementation values. However, it is in the best interest of applications to be able to determine the exact values of the following at runtime.

MPI_BSEND_OVERHEAD
MPI_MAX_PROCESSOR_NAME
MPI_MAX_LIBRARY_VERSION_STRING
MPI_MAX_ERROR_STRING
MPI_MAX_DATAREP_STRING
MPI_MAX_INFO_KEY
MPI_MAX_INFO_VAL
MPI_MAX_OBJECT_NAME
MPI_MAX_PORT_NAME

I see this similar to MPI_TAG_UB.

Proposal

These can be queried as attributes on MPI_COMM_WORLD or MPI_COMM_SELF. If we want to be nuanced, the MPI_MAX_*_NAME ones make sense on MPI_COMM_SELF since they are local properties, while MPI_BSEND_OVERHEAD could in theory be communicator-specific, and the value from MPI_COMM_WORLD is the upper bound on all of the derivative communicators.

I do not know how this works with sessions. Hopefully Dan will help me.

An alternative proposal to this is to make all of the associated functions that depend on MPI_MAX_*_NAME produce the required length in len when the input value is zero.

Changes to the Text

In Section 9.1.2

The list of predefined attribute keys include
...
MPI_BSEND_OVERHEAD_ENV The value of this constant, which may be less than the value of MPI_BSEND_OVERHEAD set at compile time.
MPI_MAX_<>_NAME_ENV The value of this constant, which may be less than the value of MPI_MAX_<>_NAME set at compile time.
...
Advice to users. The above attributes ending in _ENV exist because the compile-time constant may be larger than necessary. (End of advice to users.)

Impact on Implementations

This is trivial to implement, since implementations can just return the compile-time values if they want. If there is value to users in providing stricter sizes, they can do that.

Impact on Users

This benefits users who want to micro-optimize memory allocation. For example, MPI_GET_LIBRARY_VERSION may require up to 8KiB if the user allocates the maximum value.

References and Pull Requests

@Wee-Free-Scot
Copy link

How do attributes work in sessions?

The MPI_TAG_UB is the only system-defined attribute that is preserved in the sessions model.

MPI-4.0 section 9.1.2 page 453 lines 35-37:

In the Sessions Model, the attribute MPI_TAG_UB is attached to all communicators
created by MPI_COMM_CREATE_FROM_GROUP and
MPI_INTERCOMM_CREATE_FROM_GROUPS,

This approach can be followed for other attributes, with a case-by-case justification.

Should attributes be used for these values?

The justification for MPI_BSEND_OVERHEAD is: there is no other ABI-safe way to get this value. That's weak because we could define MPI_GET_BSEND_OVERHEAD however we want.

The justifications for MPI_MAX_[STRING_LEN] is longer because there is an "obvious" alternative: call the existing GET procedure with INOUT len==0 to get the required length, then call again with enough memory allocated. There is a compatibility counter argument. For example, MPI_GET_PROCESSOR_NAME says "The argument name must represent storage that is at least MPI_MAX_PROCESSOR_NAME characters long." and the length parameter is called "resultlen" but most critically resultlen is defined as INTENT(OUT) for the Fortran binding, not INOUT. Changing this existing procedure is not backwards compatible. This alternative would need new ABI-safe GET procedures, but that still leaves the existing procedures in a difficult place.

So, attributes is the right way to go because:

  1. no replacement new procedures
  2. no dangling old procedures
  3. no discussions about the semantics of INFO (requested/provided, hints/asserts, default, etc.)

@jeffhammond
Copy link
Member Author

I dispute the implication that changing OUT to INOUT breaks anything in Fortran. We don't need to agree to get this ticket passed, however.

@jeffhammond
Copy link
Member Author

Okay, now I see it. If the API changes to have the len argument be the size of the storage, and not MPI_MAX_PROCESSOR_NAME, we have a problem. The case where the input is zero isn't the problem. It's if someone passes a non-zero value less than MPI_MAX_PROCESSOR_NAME but larger than the length of the character string. Modern Fortran can deal with this but the older Fortran API may not be able to do so. Adding a requirement here could break existing code, although I suspect the scenario is extremely unlikely.

However, we have a situation here where an implementation can never return a value of MPI_MAX_PROCESSOR_NAME_ENV less than MPI_MAX_PROCESSOR_NAME without breaking compatibility with their existing ABI. Thus, we should provide an advice to users that MPI_MAX_PROCESSOR_NAME_ENV is only able to save them memory with the standard ABI, and to implementers to be careful to return MPI_MAX_PROCESSOR_NAME_ENV = MPI_MAX_PROCESSOR_NAME with their existing ABI implementation.

Because of the aforementioned issue, it seems less likely that implementations are going to bother optimizing MPI_MAX_PROCESSOR_NAME_ENV specific to the standard ABI, when the worst case scenario requires 8KiB of memory.

@devreal do you still think we should do this? How important is it for the Open-MPI implementation of the standard ABI to save users no more than 8KiB of storage?

@Wee-Free-Scot
Copy link

The way I see it -- we could leave the constants as not ABI-safe implementation-defined values as long as we provide a mechanism that is ABI-safe to determine those values. The attribute approach works as that mechanism. The INFO approach would work but NO, for reasons. The new getter approach would work but NO, for different reasons.

Thus, the ABI story is: do NOT use the MPI_MAX_WHATEVER constant; use MPI_COMM_GET_ATTR (with a well-defined ABI-safe calling convention) with attribute key MPI_MAX_WHATEVER_ATTR (with a well-defined ABI-safe constant value) instead. The value obtained from the attribute will be the same as the value of the constant -- the difference is that the attribute is ABI-safe, whereas the constant is not.

@devreal
Copy link

devreal commented Apr 12, 2023

I assume he reason these were constants in the first place was to allow for fast static allocations. Allocating 8KB in a stack variable makes my toe nails curl. We even have an AtoU saying that:

MPI_MAX_INFO_VAL might be very large, so it might not be wise to declare a string of that size.

We don't have such a warning for any of the other constants as far as I can see, even though implementations do define significantly larger sizes for them (I assume @jeffhammond you refer to MPI_MAX_LIBRARY_VERSION_STRING for the 8KB?).

Having said that, people probably have been allocating automatic variables of such size for many years and it's safe. I would prefer making these sizes attributes and deprecating the old constants, though.

@jeffhammond
Copy link
Member Author

Ok I'll push forward and if we fail, it doesn't really matter.

@jeffhammond
Copy link
Member Author

Thus, the ABI story is: do NOT use the MPI_MAX_WHATEVER constant; use MPI_COMM_GET_ATTR (with a well-defined ABI-safe calling convention) with attribute key MPI_MAX_WHATEVER_ATTR (with a well-defined ABI-safe constant value) instead. The value obtained from the attribute will be the same as the value of the constant -- the difference is that the attribute is ABI-safe, whereas the constant is not.

I intend these constants to be part of the ABI and not change them ever, since the negative impact of wasting e.g. 8KiB only diminishes with time.

Are you proposing to say that the ABI doesn't provide any guarantees about MPI_MAX_*_NAME? That breaks API compatibility with existing codes. What is the upside of this?

@Wee-Free-Scot
Copy link

Are you proposing to say that the ABI doesn't provide any guarantees about MPI_MAX_*_NAME?

Yes, it feels to me that there is no need for the constants to be included in the ABI if there is an ABI-safe mechanism to get those values -- especially if we can eventually deprecate and remove those constants.

That breaks API compatibility with existing codes. What is the upside of this?

Existing codes use the constants, which will still exist until they are removed (if they ever are) so no API incompatibilities there.
Existing codes will not use the new attributes so no API incompatibilities there.

Folks who care about ABI will switch from using the constants to using the attributes. Nobody else will notice or care (until we remove the constants, if we ever do and MPI libraries actually remove them as well).

@jeffhammond
Copy link
Member Author

jeffhammond commented Apr 12, 2023

so we are going to require MPI_MAX_*_NAME but not specify the values in the ABI? that isn't an ABI then. it will mean that everyone who wants an ABI has to either change their code to use the attributes or go out and find the maximum possible value of every MPI_MAX_*_NAME and hard-code those in their app.

people who want the ABI aren't going to abandon pre-ABI implementations that don't have the attributes on Day 1. at the same time, they want an ABI ASAP. people who want an ABI are going to need to write new code with preprocessor logic for the attributes case, and the upper bound fallback otherwise. that's worse in every way.

i just don't see what the point of not providing a full ABI is. is this benefitting implementers, or memory-conscious users, or some other group?

@simonbyrne @dalcinl you guys have any comments here?

@simonbyrne
Copy link

simonbyrne commented Apr 12, 2023

There is some related discussion in my earlier issue #159 (though my aim there was to figure out which ABI is being used, which is obviously moot if an ABI is standardized).

There are are a couple of problems with getter functions:

  1. most of the proposed approaches (via attributes or info objects) are only valid after MPI_Init. However MPI_MAX_LIBRARY_VERSION_STRING is needed for MPI_Get_library_version, which can be called at any time.
    • While it is valid to call MPI_Info_get_* before MPI_Init, you would need some sort of valid info object to query (or a function to initialize one); this may also make it awkward to query MPI_MAX_INFO_KEY/MPI_MAX_INFO_VAL, which in MPI.jl we use to check that the getter key is a valid length.
  2. It makes supporting multiple versions of MPI even more convoluted: we first need to query MPI_Get_version, and then decide whether to use our pre-defined constants or the getter functions. Basically it makes maintaining 3rd-party language bindings even harder.

An alternative proposal to this is to make all of the associated functions that depend on MPI_MAX_*_NAME produce the required length in len when the input value is zero.

I'd be in favor of this for the relevant functions: it's a standard pattern (e.g. it is used widely in the Windows APIs), and doesn't require any new API additions. It also makes it easy to query lengths which may vary depending on the argument (e.g. MPI_Comm_get_name). Combined with setting the constants to suitably large values that encompass all feasible values, I think that should address all concerns. (MPI_BSEND_OVERHEAD is obviously the exception here, but having never used it I will defer to others)

@dalcinl
Copy link

dalcinl commented Apr 13, 2023

@jeffhammond Whatever approach is taken, it has to be backward compatible in terms of API and usage. I believe this means we have to define all these constants with a reasonable upper bound. Otherwise, if we do not have compile-time constants, they cannot be used for statically allocated strings (VLAs are an optional feature in the latest C standards). In the mean time, maybe we can convince the MPICH folks to live with something significantly smaller than 8KB. MPICH currently dumps in the output string all the configure and C/CXX/F77/FC compile flags.

@simonbyrne I like very much your proposal, but I'm afraid it is not backward-compatible in usage. Look for example to Cython code I currently have in mpi4py:

def Get_processor_name() -> str:
    cdef char name[MPI_MAX_PROCESSOR_NAME+1]
    cdef int nlen = 0
    CHKERR( MPI_Get_processor_name(name, &nlen) )
    return tompistr(name, nlen)```

Up to MPI-4.1, initializing nlen = 0 (or any other value, or leaving it uninitialized) would have no effect in the behavior of MPI_Get_processor_name. However, with the new proposed semantics, the code above would return garbage without error.

A possible mechanism we could use is to introduce a new query-by-name routine. For example:

int MPI_Query_length_by_name(const char *name, int *result);
int MPI_Query_length_by_name_c(const char *name, MPI_Count *result);
...

int bsend_overhead, max_proc_name;
MPI_Query_length_by_name("MPI_BSEND_OVERHEAD", &bsend_overhead);
MPI_Query_length_by_name("MPI_MAX_PROCESSOR_NAME", &max_proc_name);

There is at least one well known precedent for this approach: in macOS (and FreeBSD), you have sysctlbyname().
This routine would be trivial to implement, it is future-proof, and it should be trivial for implementations to support its call outside of the MPI_Init/Finalize or MPI_Session brackets. A minor nit (beside the routine name) is to decide whether the name lookup would be case insensitive, which is probably a good idea for the peace of mind of Fortran users.

@jeffhammond
Copy link
Member Author

jeffhammond commented Apr 13, 2023

We can lobby MPICH to use less than 8K in their implementation of the ABI at least, but nobody has provided a strong argument why 8K is a problem. Do we have any example of real-world failures because of this? Is anyone arguing that 8K is a memory bottleneck anywhere?

I wrote a test and if I set the stack limit to 32K, I can't compile the trivial program. Clang crashes without a stack size of at least 64K. I also can't edit the file anymore (in that window) because Vim won't start. The only context where stack limits are that small are special purpose devices where many programs don't work.

I did find that stack sizes smaller than 16K exist. For example, https://downloads.ti.com/docs/esd/SPRUHV7/c-c-system-stack-stdz0542698.html shows that a TI PRU 32-bit RISC core has a stack size of 256B. However, in that case, most of the MPI_MAX_* values in use today don't work, so either this is not a relevant target for MPI or users of that platform use the heap for such strings.

https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/ has a list of stack sizes for more relevant targets (operating systems). The only limits that are even close are thread-local stacks with older releases of OpenBSD and Alpine. The process stack limits are 1 or 8 MiB.

I can't find anything on Blue Gene/L or /P but /Q docs say that it's not possible to create a thread stack size smaller than 128K, so I'm inclined to think that there wouldn't be a problem here.

7.5.1 Thread stack size for the Blue Gene/Q system

The thread stack size depends on the system configuration:

Minimum thread stack size

The minimum thread stack size is determined by glibc. In glibc 2.10,
the value for PTHREAD_STACK_MIN is defined as 128 KB for
PowerPC64. The smallest allowable stack in glibc 2.12.2 is also
128 KB. If the stack size is smaller than the minimum value, the
pthread_create() function returns an error.

Default thread stack size

The default thread stack size on the Blue Gene/Q system is 4 MB.

@simonbyrne
Copy link

Up to MPI-4.1, initializing nlen = 0 (or any other value, or leaving it uninitialized) would have no effect in the behavior of MPI_Get_processor_name. However, with the new proposed semantics, the code above would return garbage without error.

@dalcinl That's a good point. One solution would be to combine it with a special sentinel value for the name argument (say MPI_STRING_LENGTH). In other words, your pattern would become

char * name;
int nlen;
MPI_Get_processor_name(MPI_STRING_LENGTH, &nlen);
name = (char*) malloc((nlen+1)*sizeof(char));
MPI_Get_processor_name(name, &nlen);

I do agree with @jeffhammond though that the constants should be retained, and be part of the ABI though.

@jeffhammond
Copy link
Member Author

I'm inclined to do no further work on this topic until somebody provides evidence that 8K strings limits the portability of MPI in a meaningful way.

@dalcinl
Copy link

dalcinl commented Apr 13, 2023

@jeffhammond It is not really about limiting portability, but rather good practice and interoperability. If every library you use starts to assume that stack space is unlimited, eventually you may exhaust stack, specially in multithreaded scenarios. For example, from your previous link: On Linux, the process stack size is 8MB. Imagine that you have 128 cores, two threads per core. Then a stack allocation of 8K is about 25% of the space you have available per thread. Of course, this stack exhaustion example is hypothetical, I don't expect 256 threads calling MPI_Get_library_version() concurrently.

AFAIK, not one ever complained to MPICH folks about such issues. But the large value is controversial, and you know that in committees (and life in general) everyone has opinions, many times uninformed. And someone may complain loudly about it, and not hear to your very valid points, and ignore your request for proofs. If we can get the MPICH team on board in advance before this proposal scalates, we eliminate a point of friction.

@jeffhammond
Copy link
Member Author

I'm going to propose 8K and insist anyone who doesn't like it to bring evidence. Nobody is required to stack allocate anyways. The set of people who worry about stack exhaustion and the people who need to use malloc for these strings is a circle.

@jeffhammond
Copy link
Member Author

The fact that MPICH is widely used is the pretty strong evidence that nobody actually cares in the real world.

@wesbland wesbland moved this to To Do in MPI 5.0 Apr 19, 2023
@wesbland wesbland added the mpi-5.0 For inclusion in the MPI 5.0 standard label Apr 19, 2023
@jeffhammond
Copy link
Member Author

Closing this. If 8K stacks bother someone, they can propose the solution. Otherwise, the heap is a fine place to store MPI strings.

@github-project-automation github-project-automation bot moved this from To Do to Done in MPI 5.0 Dec 17, 2024
@wesbland wesbland closed this as not planned Won't fix, can't repro, duplicate, stale Mar 7, 2025
@wesbland wesbland removed this from MPI 5.0 Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mpi-5.0 For inclusion in the MPI 5.0 standard wg-abi ABI Working Group
Projects
None yet
Development

No branches or pull requests

7 participants