The default KIND of some size/position intrinsic functions should not be default integer #72


Open
klausler opened this issue Nov 5, 2019 · 20 comments
Labels
Clause 16 Standard Clause 16: Intrinsic procedures and modules

Comments

@klausler

klausler commented Nov 5, 2019

Fortran mandates that the default kind of INTEGER occupy one numeric storage unit; so does the default kind of REAL. In 2019, default REAL is expected to be IEEE-754 single precision. Consequently, the default kind of INTEGER ends up having to be 32 bits wide.

This is a problem for real applications using arrays that are (or can be) very large, as the default KIND= parameter values for intrinsic functions like SIZE, SHAPE, LBOUND, UBOUND, FINDLOC, MAXLOC, and MINLOC (*) are all defined to be the default kind of INTEGER. One must determine the proper kind of "long" integer (or assume it) and remember to specify that KIND= on every use of these intrinsic functions in order to ensure that they work with large arrays.

I propose that the default result kinds of these intrinsic functions be redefined to be processor-dependent, so that a processor supporting large memories can do the obvious right thing.

(*) this list is probably incomplete; maybe it should include LEN but that's less of an issue and it would be the hardest to change

@certik
Member

certik commented Nov 5, 2019

I wasn't aware that size(A) will return an incorrect number if A is larger than 4 GB. In fact, it already fails for arrays larger than 2 GB; here is an example with gfortran that fails:

program test_size
  real, allocatable :: A(:)
  ! kind 16 is gfortran's 128-bit integer kind; int64 would also suffice
  allocate(A(3000000000_16))
  A = 1
  print *, size(A)
  print *, size(A, kind=16)
end

When compiled and executed, it prints:

 -1294967296
 3000000000

The array gets correctly allocated and assigned to, but the default size(A) fails to return the correct result (it returns a wrapped around 32 bit integer value). The size(A, kind=16) works correctly.

This needs to be fixed.

@gronki

gronki commented Nov 5, 2019 via email

@FortranFan
Member

Please see #78 with a 2013 proposal by the UK national body that I think addresses the concern in this thread quite well.

@klausler
Author

> Please see #78 with a 2013 proposal by the UK national body that I think addresses the concern in this thread quite well.

There's some overlap, but it's not the same essential problem. Whether or not the program is able to define the default kinds of intrinsic types, the default kinds of the results of the particular intrinsic functions related to size should not be the default kind of INTEGER.

@sblionel
Member

This is why these intrinsics now have an optional KIND= argument. The problem with changing the default behavior is that it would break some existing programs, which is almost always a killer.

One might think that implementations would start shifting default integer from 32 to 64 bits much the way that it changed from 16 to 32 bits in the late 70s. (I am ignoring old platforms with 36, 48 and 60-bit word sizes.) But then you run into the issue @klausler originally noted that this would also change the size of default REAL, and I don't think people are ready for that.

Offhand, I am not in favor of any proposal that adds a new implicit behavior. I understand that dealing with constants of non-default kind can be messy, and things such as SIZE can be problematic, but the programmer already needs to be aware when an array might exceed a default integer extent, and use larger kind integers throughout the code.

I know this general topic was discussed when the KIND arguments were added, but I can't find details in the 2014 papers.

@certik
Member

certik commented Nov 11, 2019

> I know this general topic was discussed when the KIND arguments were added, but I can't find details in the 2014 papers.

To address the problem that the discussion around a new feature gets lost, I plan to capture any such future (technical) discussion that happens in person at the committee and document it here in the relevant issues, so that the wider community, as well as committee members, can later reference the arguments that were made and build upon the previous work.

@klausler
Author

klausler commented Nov 11, 2019

I understand full well that those intrinsics have KIND= arguments. Unfortunately, a program using large arrays must specify an adequately sized KIND= value for every call to these intrinsics, and must also ensure that the libraries it calls into are free of missing or inadequate KIND= arguments.

Allowing implementations the ability to determine the default values of these KIND= arguments may lead to warnings and errors when codes that can't handle large arrays are recompiled. That seems preferable to mysterious and hard-to-debug crashes: when compiling for large-memory targets, an implementation must pick either the "emit a message" option or the "crash mysteriously at runtime" option. Either way, the codes that will fail have not been "broken" by the compiler, and the former choice seems more consistent with a desire to promote portability.

@gronki

gronki commented Nov 11, 2019 via email

@FortranFan
Member

@klausler wrote:

> .. the default kinds of the results of the particular intrinsic functions related to size should not be the default kind of INTEGER.

The way things are, it might be too late now; user-definable kinds, as in the UK proposal, appear to be the only option for future programmers who seek brevity and cleanliness along with safety in their codes.

WG5 is unlikely to ever agree to a change to intrinsics such as SIZE as suggested in the original post, but it is conceivable that WG5 may proceed with the UK proposal for user-definable default KINDs at some stage.

@klausler
Author

> @klausler wrote:
>
> .. the default kinds of the results of the particular intrinsic functions related to size should not be the default kind of INTEGER.
>
> The way things are, it might be too late now; user-definable kinds, as in the UK proposal, appear to be the only option for future programmers who seek brevity and cleanliness along with safety in their codes.
>
> WG5 is unlikely to ever agree to a change to intrinsics such as SIZE as suggested in the original post, but it is conceivable that WG5 may proceed with the UK proposal for user-definable default KINDs at some stage.

I repeat, these features are solving largely distinct problems. If the UK proposal were adopted, there would still be a problem.

I would hope that WG5 would approve a change to the specification of the default kinds of the results of these intrinsic functions; that would allow the f18 compiler to remove an item from its documented list of "intentional violations of the standard". Standardized or not, it seems like the right thing to do in an implementation for modern targets.

@FortranFan
Member

@klausler wrote:

> ..
> I repeat, these features are solving largely distinct problems. If the UK proposal were adopted, there would still be a problem.
>
> I would hope that WG5 would approve a change to the specification of the default kinds of the results of these intrinsic functions; that would allow the f18 compiler to remove an item from its documented list of "intentional violations of the standard". Standardized or not, it seems like the right thing to do in an implementation for modern targets.

Well, some might argue that what f18 is trying to address is also "solving largely distinct problems", particularly with the SIZE intrinsic, since it now includes the optional KIND argument. And if f18 is adopting "intentional violations of the standard", then there must be users out there concerned about this, including those who would prefer a new processor such as f18 to adopt strict consistency with the standard by default. Of course, when an implementation feels strongly about certain stipulations in the standard, it can separately offer its users an alternate path forward, say an option to pursue another dialect, e.g., as GCC/gfortran does with -std=gnu.

@certik
Member

certik commented Nov 12, 2019

One workaround can be that in Debug mode the compiler emits code to check the size of the array at runtime and produces a warning (or an error, if instructed) when size is called with a kind too small for the result, so that at least users have good means to ensure their code is not broken.

But as @sblionel suggested, even if you loop over the array with a default integer, the code will still break, so the user must be aware of this anyway. Still, a compiler (in Debug mode) can check this and tell users to fix their code (by adding the appropriate kind).

@klausler
Author

> One workaround can be that in Debug mode the compiler emits code to check the size of the array at runtime and produces a warning (or an error, if instructed) when size is called with a kind too small for the result, so that at least users have good means to ensure their code is not broken.
>
> But as @sblionel suggested, even if you loop over the array with a default integer, the code will still break, so the user must be aware of this anyway. Still, a compiler (in Debug mode) can check this and tell users to fix their code (by adding the appropriate kind).

It's useful to the user to detect potential problems in their code before execution time, when possible.

@certik
Member

certik commented Nov 12, 2019

> It's useful to the user to detect potential problems in their code before execution time, when possible.

The code can read the size from an input file, in which case it will not be known until runtime. The only way that I can think of to give a warning at compile time is to keep track of how the array is allocated, and if at any point it is allocated using, say, integer(int64) (even if the value is read from an input file), then it will give warnings to all usages of size that do not have kind=int64 with it, as well as all integers used for iteration over the array. The only possible issue is if the array gets allocated in code that the compiler does not have access to. Otherwise it might actually be possible to check this at compile time.

@gronki

gronki commented Nov 12, 2019

In C size_t is 8 bytes on x86_64.

#include <stdio.h>
#include <stddef.h>

int main(void) {
    size_t a;
    /* %zu is the correct conversion for size_t; %d invokes undefined behavior */
    printf("sizeof(a) = %zu\n", sizeof(a));
}

output: sizeof(a) = 8

The fact is, backwards compatibility is a poor argument here. No decently written code will break just because size returns an 8-byte instead of a 4-byte integer. And understanding backwards compatibility as "all codes that previously worked must still work" is inherently at odds with the Fortran language having forgiven poor coding practices for years (as was recently pointed out in one of the publications).

@certik
Member

certik commented Nov 12, 2019

> And understanding backwards compatibility as "all codes that previously worked must still work" is inherently at odds with the Fortran language having forgiven poor coding practices for years (as was recently pointed out in one of the publications).

Can you please point me to the publication you are referring to?

That's one of the great strengths of Fortran: old code continues running and does not require a massive rewrite like the one Python 3 forced on all Python 2 code. So we want to keep that feature. But there might be a way to get what we want without breaking old code.

@gronki

gronki commented Nov 12, 2019 via email

@certik
Member

certik commented Nov 12, 2019

> I still see common blocks, data statements and gotos in recently developed codes which I think is terrible.

I agree with you that those should not be used in new codes. But I think compilers should still support them, so that old codes continue to work.

@klausler
Author

> It's useful to the user to detect potential problems in their code before execution time, when possible.
>
> The code can read the size from an input file, in which case it will not be known until runtime. The only way that I can think of to give a warning at compile time is to keep track of how the array is allocated, and if at any point it is allocated using, say, integer(int64) (even if the value is read from an input file), then it will give warnings to all usages of size that do not have kind=int64 with it, as well as all integers used for iteration over the array. The only possible issue is if the array gets allocated in code that the compiler does not have access to. Otherwise it might actually be possible to check this at compile time.

What we can warn about at compilation time are things like DO J=1,SIZE(A) when the 64-bit size value must be truncated to a 32-bit default-kind integer J.

@jme52

jme52 commented Jan 17, 2020

I have some questions:

  • Aren't the dimensions of arrays already part of current array descriptors? If so, would it be that difficult, or a bad idea, for the standard to demand in the definition of intrinsic procedures that if the kind of the result of a call to size cannot correctly represent the value that size would return, the runtime aborts?
    Yes, this would break some existing programs, but only those that are poorly designed:
    program p
    use, intrinsic :: iso_fortran_env
    logical :: big
    real, allocatable :: A(:)
    read(*,*) big
    if (big) then
       allocate(A(2_int64**40_int64))
    else
       allocate(A(1))
    end if
    write(*,*) size(A)
    end program
    If the call to size above is legal Fortran 2018, it is at least a bad idea (if big is true, is the result processor dependent?).
  • Rather than changing the default kind of the result of these functions to be processor dependent as you propose (or to the integer kind with the largest decimal exponent range, which the standard requires to be at least 18), would it be possible to remove in the near future the requirement that default integer, default real, and default logical have the same length? My understanding was that this is mainly a requirement to support common blocks, and they are already obsolescent.

@certik certik added the Clause 16 Standard Clause 16: Intrinsic procedures and modules label Apr 23, 2022