Skip to content

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Jun 3, 2022

This fix, brings sycl::ext::intel::experimental::esimd::fmod implementation in line with std::fmod implementation.
The std::fmod implementation details are from https://en.cppreference.com/w/cpp/numeric/math/fmod
Complementary test PR intel/llvm-test-suite#1045

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation needs some correctness fixes.
Currently, for fmod(-5.1, 3.0) it returns -5.1

@fineg74
Copy link
Contributor Author

fineg74 commented Jun 14, 2022

The implementation needs some correctness fixes. Currently, for fmod(-5.1, 3.0) it returns -5.1

Fixed

@fineg74 fineg74 requested a review from v-klochkov June 14, 2022 23:15
reminder_sign_mask.merge(1.0f, 0.0f, reminder < 0);

fmod = reminder + abs_x * reminder_sign_mask;
return __ESIMD_NS::abs(fmod) * y_sign_mask;
Copy link
Contributor

@v-klochkov v-klochkov Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is better and more correct, but still not fully compatible with std::fmod.

std::fmod(-0.0f, 1.0f) would return -0.0f, while the implementation here returns "+0.0f".
The LIT test (intel/llvm-test-suite#1045) passes because it compares the expected and computed results using '==', returning true even though they are not bitwise equal.

@akolesov-intel can you please give some advice on how to implement it the most efficient way that would also follow std::fmod rules?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Slava, you are correct. Fully compliant fmod implementation is not that simple. See attached codes for single and double precision cases. As usual there are fast paths (__imf_fmodf, __imf_fmod) and slow ones for large and special arguments (__imf_internal_sfmod, __imf_internal_dfmod). These implementations are fully compliant to standard and return exact results
fmod.zip
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more logic to correctly propagate the sign. Also updated the test to add check for bit sign.

@fineg74 fineg74 requested a review from v-klochkov June 16, 2022 03:52
__ESIMD_NS::simd<float, N> x) {
__ESIMD_NS::simd<int, N> v_quot;
__ESIMD_NS::simd<float, N> fmod;
__ESIMD_NS::simd<float, N> abs_x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is definitely more correct that what it was before this PR.
From other side, it is not close to that complex code attached by @akolesov-intel .
We probably, can have this code for now. @akolesov-intel do you see one or some obvious cases where the proposed code would give wrong result comparing to std::fmod?

Also, if keep the current variant, then some minor inefficiencies can be optimized (reducing number of compares, replacing MULs with OR/AND, less vector consts like 1.0,-1.0):

  __ESIMD_NS::simd<float, N> abs_x = __ESIMD_NS::abs(x);
  __ESIMD_NS::simd<float, N> abs_y = __ESIMD_NS::abs(y);
  __ESIMD_NS::simd<float, N> reminder =
      abs_y - abs_x * __ESIMD_NS::trunc<float>(abs_y / abs_x);

  // After this line 'abs_x' means (reminder < 0 ? abs(x) : 0);
  abs_x.merge(0.0, reminder >= 0);
  __ESIMD_NS::simd<float, N> fmod = reminder + abs_x;

  __ESIMD_NS::simd<float, N> fmod_abs = __ESIMD_NS::abs(fmod);
  auto fmod_sign_mask = (y.template bit_cast_view<int32_t>()) & 0x80000000;
  auto fmod_bits = (fmod_abs.template bit_cast_view<int32_t>()) | fmod_sign_mask;

  return fmod_bits.template bit_cast_view<float>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is definitely more correct that what it was before this PR. From other side, it is not close to that complex code attached by @akolesov-intel . We probably, can have this code for now. @akolesov-intel do you see one or some obvious cases where the proposed code would give wrong result comparing to std::fmod?

It's quite hard to tell without testing, many combinations of two arguments are possible. I would expect some possible problems on denormals and IEEE special inputs (inf, NaN). If I got some spare time I will try to substitute our version by this one and run our internal tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have ran our tests for this implementation (did I convert it properly to plain C++?)

float __imf_fmodf (float y, float x)
{
float abs_x = std::abs(x);
float abs_y = std::abs(y);
float reminder = abs_y - abs_x * std::trunc(abs_y / abs_x);
abs_x = (reminder >= 0)?0.0:abs_x;
float fmod = reminder + abs_x;
float fmod_abs = std::abs(fmod);
auto fmod_sign_mask = as_int(y) & 0x80000000;
auto fmod_bits = as_int(fmod_abs) | fmod_sign_mask;
return as_float(fmod_bits);
}

Test reported 12% of incorrect results from whole dataset. Mostly they are related to special, denormal or very small/big arguments. See attached file
. REF1 is for reference (correct) result there.

ts_fmod_s_xa.zip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with the code provided by @akolesov-intel is that it is not really vectorizable if at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's have this new and more correct version for now.
Perhaps, we can add even more correct implementation a bit later. I'll create internal tracker for this and other math functions.

@v-klochkov
Copy link
Contributor

/verify with intel/llvm-test-suite#1045

@v-klochkov v-klochkov merged commit 7a076bd into intel:sycl Jun 22, 2022
@fineg74 fineg74 deleted the fmod branch July 14, 2022 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants