-
Notifications
You must be signed in to change notification settings - Fork 589
Use compiler builtins to detect "simple common cases" in pp_add, pp_subtract, and pp_multiply #23503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
Conversation
…vailable This will hopefully make the code faster and smaller, and make more cases to be handled as "simple common cases". Note that this change uses HAS_BUILTIN_{ADD,SUB,MUL}_OVERFLOW macros which have already been defined in config.h but seem not to have been used by existing code. t/op/64bitint.t: Add tests to exercise "simple common cases". Note that these tests should pass even before this change.
This breaks Win32, which doesn't enable the I suspect it's due to |
… in UV If C compiler doesn't know __builtin_mul_overflow, S_uv_mul_overflow will be implemented with fallback "long multiplication" algorithm, but it had a bug that elemental multiplications were done in unsigned long precision instead of UV precision. It will lead wrong result when unsigned long is narrower than UV (for example -Duse64bitint on 32-bit platform).
Thank you for your comment. My patch has a fallback code (similar to the code used before this change) for compilers with no overflow-checking builtins, but it had a bug (I was able to reproduce similar symptoms by |
(intended to be squashed before merge)
…st glance. (intended to be squashed before merge)
inline.h
Outdated
# ifndef IV_MUL_OVERFLOW_IS_EXPENSIVE | ||
/* Strict overflow check for IV multiplication is generally expensive | ||
* when IV is a multi-word integer. */ | ||
# define IV_MUL_OVERFLOW_IS_EXPENSIVE (IVSIZE > LONGSIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a reasonable test - if we enable the builtins for GCC on Win32 x86-64 this will be false even though it is a 64-bit platform.
Testing against PTRSIZE is probably better since that better matches the platform word size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out. I've pushed a commit to change the test to against PTRSIZE.
inline.h
Outdated
|
||
/* Define IV_*_OVERFLOW_IS_EXPENSIVE below to nonzero value | ||
* if strict overflow checks are too expensive | ||
* (for example, for CPUs that has no hardware overflow detection flag). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RISC V doesn't have a hardware overflow flag (or any classic carry, zero etc flags), so the compiler generates more complex code:
bool my_chk_add(long a, long b, long *result) {
return __builtin_add_overflow(a, b, result);
}
; RISC-V
my_chk_add(long, long, long*):
add a5,a0,a1
slt a0,a5,a0
slti a1,a1,0
sub a0,a1,a0
sd a5,0(a2)
snez a0,a0
ret
; amd64
"my_chk_add(long, long, long*)":
add rdi, rsi
mov QWORD PTR [rdx], rdi
seto al
ret
; arm64
my_chk_add(long, long, long*):
adds x1, x0, x1
str x1, [x2]
cset w0, vs
ret
(no action needed here)
# endif | ||
|
||
# if defined(I_STDCKDINT) && !IV_ADD_SUB_OVERFLOW_IS_EXPENSIVE | ||
/* XXX Preparation for upcoming C23, but I_STDCKDINT is not yet tested */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modern clang has stdckdint.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but it would require patches to Configure
and so on (and I currently have no test environment with modern Clang or GCC 14+), so I intend to make a separate PR to enable it.
This will affect Win32 x86-64. Thanks to @tonycoz for figuring this out. (intended to be squashed before merge)
This will hopefully make the code faster and smaller, and make more cases to be handled as "simple common cases".
Note that this change uses
HAS_BUILTIN_{ADD,SUB,MUL}_OVERFLOW
macros which have already been defined inconfig.h
but seem not to have been used by existing code.The behavior should be the same before and after this change.