Use compiler builtins to detect "simple common cases" in pp_add, pp_subtract, and pp_multiply #23503

Open
wants to merge 5 commits into
base: blead

Conversation

t-a-k
Contributor

@t-a-k t-a-k commented Jul 29, 2025

This will hopefully make the code faster and smaller, and allow more cases to be handled as "simple common cases".

Note that this change uses HAS_BUILTIN_{ADD,SUB,MUL}_OVERFLOW macros which have already been defined in config.h but seem not to have been used by existing code.

The behavior should be the same before and after this change.
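
As a rough illustration of the approach described above (not the code from this PR), an overflow-checked addition can use the compiler builtin when it is available and fall back to explicit range checks otherwise. The names `iv_t` and `iv_add_overflow` are hypothetical stand-ins, and the compiler-version test merely imitates what Configure would record as HAS_BUILTIN_ADD_OVERFLOW in config.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Perl's IV type; the real type comes from perl.h. */
typedef int64_t iv_t;

/* Stand-in for Configure's probe; in Perl this macro comes from config.h. */
#if !defined(HAS_BUILTIN_ADD_OVERFLOW) && \
    ((defined(__GNUC__) && __GNUC__ >= 5) || defined(__clang__))
#  define HAS_BUILTIN_ADD_OVERFLOW
#endif

/* Returns true if a + b does not fit in iv_t.
 * When it returns false, *result holds the exact sum. */
static bool iv_add_overflow(iv_t a, iv_t b, iv_t *result)
{
#ifdef HAS_BUILTIN_ADD_OVERFLOW
    /* One instruction plus a flag test on most CPUs. */
    return __builtin_add_overflow(a, b, result);
#else
    /* Portable fallback: test the range before adding, so the
     * signed addition itself can never overflow. */
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b))
        return true;
    *result = a + b;
    return false;
#endif
}
```

A caller in the style of a "simple common case" would try `iv_add_overflow` first and only drop to the slower general path when it returns true.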


  • This set of changes requires a perldelta entry, and it is included.

…vailable

t/op/64bitint.t: Add tests to exercise "simple common cases".
Note that these tests should pass even before this change.
@tonycoz
Contributor

tonycoz commented Jul 30, 2025

This breaks Win32, which doesn't enable the __builtin_add_... etc builtins even for gcc.

I suspect it's due to long being 32 bits even on 64-bit Win32, but I haven't tried to debug it.

… in UV

If the C compiler doesn't know __builtin_mul_overflow, S_uv_mul_overflow
is implemented with a fallback "long multiplication" algorithm,
but it had a bug: the elementary multiplications were done in unsigned
long precision instead of UV precision.  This leads to wrong results
when unsigned long is narrower than UV (for example, -Duse64bitint
on a 32-bit platform).
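
A minimal sketch of such a "long multiplication" overflow check, assuming a 64-bit unsigned type, might look like the following. This is not the PR's S_uv_mul_overflow; `uv_t`, `uv_mul_overflow`, and the two macros are hypothetical names. The point it illustrates is that every partial product is computed in the full-width type, which is what the fix in this commit is about.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t uv_t;                    /* hypothetical stand-in for UV */
#define UV_HALF_BITS 32
#define UV_LOW_MASK  ((uv_t)0xFFFFFFFFu)

/* Returns true if a * b does not fit in uv_t.
 * When it returns false, *result holds the exact product. */
static bool uv_mul_overflow(uv_t a, uv_t b, uv_t *result)
{
    /* Split each operand into 32-bit halves, kept in uv_t so that
     * the partial products below are computed at full width.
     * Storing them in a narrower type would truncate the products. */
    uv_t a_hi = a >> UV_HALF_BITS, a_lo = a & UV_LOW_MASK;
    uv_t b_hi = b >> UV_HALF_BITS, b_lo = b & UV_LOW_MASK;

    /* Both high halves nonzero: the product is at least 2^64. */
    if (a_hi && b_hi)
        return true;

    /* At most one cross term is nonzero here, and each factor is
     * below 2^32, so this sum cannot wrap. */
    uv_t cross = a_hi * b_lo + a_lo * b_hi;
    if (cross > UV_LOW_MASK)
        return true;                      /* cross << 32 would not fit */

    uv_t shifted = cross << UV_HALF_BITS;
    uv_t low = a_lo * b_lo;               /* always fits in 64 bits */
    uv_t sum = shifted + low;
    if (sum < shifted)
        return true;                      /* carry out of the final add */

    *result = sum;
    return false;
}
```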
@t-a-k
Contributor Author

t-a-k commented Jul 31, 2025

> This breaks Win32, which doesn't enable the __builtin_add_... etc builtins even for gcc.
>
> I suspect it's due to long being 32 bits even on 64-bit Win32, but I haven't tried to debug it.

Thank you for your comment. My patch has fallback code (similar to the code used before this change) for compilers without overflow-checking builtins, but it had a bug (I was able to reproduce similar symptoms with ./Configure ... -Duse64bitint -Ud_builtin_mul_overflow on 32-bit x86 Linux). I've pushed a commit to fix this.

t-a-k added 2 commits August 6, 2025 02:13
(intended to be squashed before merge)
…st glance.

(intended to be squashed before merge)
inline.h Outdated
# ifndef IV_MUL_OVERFLOW_IS_EXPENSIVE
/* Strict overflow check for IV multiplication is generally expensive
* when IV is a multi-word integer. */
# define IV_MUL_OVERFLOW_IS_EXPENSIVE (IVSIZE > LONGSIZE)
Contributor

I don't think this is a reasonable test - if we enable the builtins for GCC on Win32 x86-64 this will be false even though it is a 64-bit platform.

Testing against PTRSIZE is probably better since that better matches the platform word size.

Contributor Author

Thank you for pointing this out. I've pushed a commit to change the test to compare against PTRSIZE.
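
To make the difference between the two size tests concrete, here is a small sketch that evaluates the same comparison under common data models. The helper `mul_is_multiword` and the enum constants are hypothetical; the byte sizes are the usual ones for each model. On 64-bit Windows (LLP64), long is 4 bytes while pointers are 8 bytes, so the two comparisons disagree there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: would an integer of int_size bytes span more
 * than one machine word, if the word size is taken to be word_size? */
static bool mul_is_multiword(size_t int_size, size_t word_size)
{
    return int_size > word_size;
}

/* Typical sizes in bytes for three data models. */
enum {
    LP64_LONG  = 8, LP64_PTR  = 8,   /* 64-bit Linux, macOS        */
    LLP64_LONG = 4, LLP64_PTR = 8,   /* 64-bit Windows             */
    ILP32_LONG = 4, ILP32_PTR = 4    /* 32-bit x86                 */
};
```

With an 8-byte IV, `mul_is_multiword(8, LLP64_LONG)` is true while `mul_is_multiword(8, LLP64_PTR)` is false; on LP64 and ILP32 the two comparisons agree.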

inline.h Outdated

/* Define IV_*_OVERFLOW_IS_EXPENSIVE below to nonzero value
* if strict overflow checks are too expensive
* (for example, for CPUs that has no hardware overflow detection flag).
Contributor

RISC-V doesn't have a hardware overflow flag (or any of the classic carry, zero, etc. flags), so the compiler generates more complex code:

bool my_chk_add(long a, long b, long *result) {
    return __builtin_add_overflow(a, b, result);
}
; RISC-V
my_chk_add(long, long, long*):
        add     a5,a0,a1
        slt     a0,a5,a0
        slti    a1,a1,0
        sub     a0,a1,a0
        sd      a5,0(a2)
        snez    a0,a0
        ret
; amd64
"my_chk_add(long, long, long*)":
        add     rdi, rsi
        mov     QWORD PTR [rdx], rdi
        seto    al
        ret
; arm64
my_chk_add(long, long, long*):
        adds    x1, x0, x1
        str     x1, [x2]
        cset    w0, vs
        ret

(no action needed here)
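
For readers who don't read RISC-V assembly, the sequence above computes roughly the following in C. This is a hand-written sketch, not compiler output; the function name `add_overflow_flagless` is hypothetical, and the cast back to a signed type assumes the usual two's-complement wrapping that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-checked signed add without a hardware overflow flag.
 * Stores the (possibly wrapped) sum and returns true on overflow. */
static bool add_overflow_flagless(int64_t a, int64_t b, int64_t *result)
{
    /* Add in unsigned arithmetic so wrap-around is well defined;
     * converting back to int64_t is implementation-defined and
     * wraps on GCC and Clang (assumption). */
    int64_t sum = (int64_t)((uint64_t)a + (uint64_t)b);
    *result = sum;

    /* Without overflow, the sum moves below a exactly when b is
     * negative. If those two facts disagree, the add overflowed.
     * This matches the slt / slti / sub / snez sequence. */
    return (sum < a) != (b < 0);
}
```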

# endif

# if defined(I_STDCKDINT) && !IV_ADD_SUB_OVERFLOW_IS_EXPENSIVE
/* XXX Preparation for upcoming C23, but I_STDCKDINT is not yet tested */
Contributor

Modern clang has stdckdint.h

Contributor Author

I know, but it would require patches to Configure and so on (and I currently have no test environment with modern Clang or GCC 14+), so I intend to make a separate PR to enable it.
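
For reference, a sketch of what using the C23 header could look like outside of Perl's Configure machinery. It probes for `<stdckdint.h>` with `__has_include` and otherwise falls back to the GCC/Clang builtin. The macro `HAVE_CKD` and the function `checked_add_long` are hypothetical, and the fallback assumes a GCC- or Clang-compatible compiler; in Perl the header's presence would be recorded by Configure as I_STDCKDINT instead.

```c
#include <stdbool.h>
#include <limits.h>

/* Probe for the C23 checked-arithmetic header (hypothetical macro). */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#    define HAVE_CKD 1
#  endif
#endif

/* Returns true if a + b does not fit in long. */
static bool checked_add_long(long a, long b, long *result)
{
#ifdef HAVE_CKD
    /* C23: ckd_add(result_ptr, a, b) returns true on overflow. */
    return ckd_add(result, a, b);
#else
    /* Assumption: GCC 5+ or Clang, which provide this builtin. */
    return __builtin_add_overflow(a, b, result);
#endif
}
```

Note that the argument order differs: `ckd_add` takes the result pointer first, while `__builtin_add_overflow` takes it last.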

This will affect Win32 x86-64.
Thanks to @tonycoz for figuring this out.

(intended to be squashed before merge)