Description
This issue is either:
a) a missed optimization in the x86-64 backend; or
b) a wrong optimization in the aarch64 backend.
Consider the following code:
    #include <stdint.h>
    #include <stdbool.h>

    bool is_i8(double x) {
        return x == (int8_t)x;
    }

    bool is_i32(double x) {
        return x == (int32_t)x;
    }
Per the C standard, converting a floating-point value to an integer type discards the fractional part (truncation toward zero), and the behavior is undefined if the truncated value does not fit in the destination type. These functions therefore invoke undefined behavior for any such argument, so is_i8 can always be replaced with is_i32 (alive2 proof from the optimized IR). However, the x86-64 backend does not do this, so is_i8 retains an unnecessary movsx eax, al instruction that could be eliminated. The aarch64 backend does perform this optimization (Compiler Explorer).
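To make the equivalence claim concrete, here is a small sketch that compares the two functions over inputs where is_i8 has defined behavior, i.e. values whose truncation lies in [-128, 127]. The helper name count_disagreements and the 0.25 step are illustrative choices of mine, not part of the original report, and a finite sample like this is only a sanity check, not a substitute for the alive2 proof.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_i8(double x)  { return x == (int8_t)x; }
static bool is_i32(double x) { return x == (int32_t)x; }

/* Hypothetical helper: counts sampled inputs in the defined domain of
   is_i8 on which is_i8 and is_i32 return different results. */
int count_disagreements(void) {
    int mismatches = 0;
    /* x runs from -128.75 to 127.75 in steps of 0.25, so every value
       truncates into [-128, 127] and both casts are well defined. */
    for (int i = -129 * 4 + 1; i < 128 * 4; i++) {
        double x = i / 4.0;
        if (is_i8(x) != is_i32(x)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Within this domain both functions reduce to "x is an integer", which is why the narrower cast, and with it the sign extension, carries no extra information.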
GCC preserves the int8_t cast on both x86-64 and ARM64 (and on every other target I tested in CE): https://godbolt.org/z/78MbdGbPz. (EDIT: on gcc≤13.2, this is only the case with -ftrapping-math, which gcc has on by default; with -fno-trapping-math (and -msse4.1 on x86-64), it converts the check to a round-and-compare. On gcc≥13.3, it always keeps the integer conversion, as far as I can tell.)
And while C does allow the optimization in question, it means that x == (T)x cannot be used to check whether the floating-point value x fits in the integer type T (even though it would work if the conversion gave any valid value of T in place of UB/poison). As far as I can tell, there is no alternative way to do such a check anywhere near as performantly without writing platform-specific assembly. IMO that is quite a problematic issue, though not really a clang-specific one.
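For reference, a UB-free version of the check can be sketched by testing the range before converting, so the cast is only evaluated when it is defined. The names fits_i8 and fits_i32 are hypothetical, and I have not measured this; I would expect the extra comparisons to make it slower than the single convert-and-compare, which is exactly the cost complained about above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true iff x is an integer representable as int8_t.
   NaN fails the first comparison, and && short-circuits, so the cast is
   reached only for values in [-128.0, 127.0], where it is well defined. */
bool fits_i8(double x) {
    return x >= -128.0 && x <= 127.0 && x == (double)(int8_t)x;
}

/* Same idea for int32_t; both bounds are exactly representable as double. */
bool fits_i32(double x) {
    return x >= -2147483648.0 && x <= 2147483647.0 && x == (double)(int32_t)x;
}
```

Note that the bounds are written as the type's own limits rather than the open interval (-129, 128): a non-integer such as 127.5 must be rejected anyway, so the tighter closed range loses nothing.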