-
Notifications
You must be signed in to change notification settings - Fork 807
[ESIMD] Make esimd implementation of fmod compatible with std::fmod #6242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation needs some correctness fixes.
Currently, for fmod(-5.1, 3.0) it returns -5.1
Fixed |
reminder_sign_mask.merge(1.0f, 0.0f, reminder < 0); | ||
|
||
fmod = reminder + abs_x * reminder_sign_mask; | ||
return __ESIMD_NS::abs(fmod) * y_sign_mask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is better and more correct, but still not fully compatible with std::fmod.
std::fmod(-0.0f, 1.0f) would return -0.0f, while the implementation here returns "+0.0f".
The LIT test (intel/llvm-test-suite#1045) passes because it compares the expected and computed results using '==', returning true even though they are not bitwise equal.
@akolesov-intel can you please give some advice on how to implement it the most efficient way that would also follow std::fmod rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Slava, you are correct. Fully compliant fmod implementation is not that simple. See attached codes for single and double precision cases. As usual there are fast paths (__imf_fmodf, __imf_fmod) and slow ones for large and special arguments (__imf_internal_sfmod, __imf_internal_dfmod). These implementations are fully compliant to standard and return exact results
fmod.zip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more logic to correctly propagate the sign. Also updated the test to add check for bit sign.
__ESIMD_NS::simd<float, N> x) { | ||
__ESIMD_NS::simd<int, N> v_quot; | ||
__ESIMD_NS::simd<float, N> fmod; | ||
__ESIMD_NS::simd<float, N> abs_x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is definitely more correct that what it was before this PR.
From other side, it is not close to that complex code attached by @akolesov-intel .
We probably, can have this code for now. @akolesov-intel do you see one or some obvious cases where the proposed code would give wrong result comparing to std::fmod?
Also, if keep the current variant, then some minor inefficiencies can be optimized (reducing number of compares, replacing MULs with OR/AND, less vector consts like 1.0,-1.0):
__ESIMD_NS::simd<float, N> abs_x = __ESIMD_NS::abs(x);
__ESIMD_NS::simd<float, N> abs_y = __ESIMD_NS::abs(y);
__ESIMD_NS::simd<float, N> reminder =
abs_y - abs_x * __ESIMD_NS::trunc<float>(abs_y / abs_x);
// After this line 'abs_x' means (reminder < 0 ? abs(x) : 0);
abs_x.merge(0.0, reminder >= 0);
__ESIMD_NS::simd<float, N> fmod = reminder + abs_x;
__ESIMD_NS::simd<float, N> fmod_abs = __ESIMD_NS::abs(fmod);
auto fmod_sign_mask = (y.template bit_cast_view<int32_t>()) & 0x80000000;
auto fmod_bits = (fmod_abs.template bit_cast_view<int32_t>()) | fmod_sign_mask;
return fmod_bits.template bit_cast_view<float>();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is definitely more correct that what it was before this PR. From other side, it is not close to that complex code attached by @akolesov-intel . We probably, can have this code for now. @akolesov-intel do you see one or some obvious cases where the proposed code would give wrong result comparing to std::fmod?
It's quite hard to tell without testing, many combinations of two arguments are possible. I would expect some possible problems on denormals and IEEE special inputs (inf, NaN). If I got some spare time I will try to substitute our version by this one and run our internal tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have ran our tests for this implementation (did I convert it properly to plain C++?)
float __imf_fmodf (float y, float x)
{
float abs_x = std::abs(x);
float abs_y = std::abs(y);
float reminder = abs_y - abs_x * std::trunc(abs_y / abs_x);
abs_x = (reminder >= 0)?0.0:abs_x;
float fmod = reminder + abs_x;
float fmod_abs = std::abs(fmod);
auto fmod_sign_mask = as_int(y) & 0x80000000;
auto fmod_bits = as_int(fmod_abs) | fmod_sign_mask;
return as_float(fmod_bits);
}
Test reported 12% of incorrect results from whole dataset. Mostly they are related to special, denormal or very small/big arguments. See attached file
. REF1 is for reference (correct) result there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem with the code provided by @akolesov-intel is that it is not really vectorizable if at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's have this new and more correct version for now.
Perhaps, we can add even more correct implementation a bit later. I'll create internal tracker for this and other math functions.
/verify with intel/llvm-test-suite#1045 |
This fix, brings sycl::ext::intel::experimental::esimd::fmod implementation in line with std::fmod implementation.
The std::fmod implementation details are from https://en.cppreference.com/w/cpp/numeric/math/fmod
Complementary test PR intel/llvm-test-suite#1045