38 changes: 31 additions & 7 deletions sycl/include/sycl/ext/intel/experimental/esimd/math.hpp
@@ -1371,21 +1371,45 @@ template <> ESIMD_INLINE float atan2(float y, float x) {
 template <int N>
 ESIMD_INLINE __ESIMD_NS::simd<float, N> fmod(__ESIMD_NS::simd<float, N> y,
                                              __ESIMD_NS::simd<float, N> x) {
-  __ESIMD_NS::simd<int, N> v_quot;
   __ESIMD_NS::simd<float, N> fmod;
+  __ESIMD_NS::simd<float, N> abs_x;
Comment (Contributor):
This code is definitely more correct than what there was before this PR.
On the other hand, it is not close to the complex code attached by @akolesov-intel.
We can probably keep this code for now. @akolesov-intel, do you see one or more obvious cases where the proposed code would give a wrong result compared to std::fmod?

Also, if we keep the current variant, then some minor inefficiencies can be optimized away (reducing the number of compares, replacing MULs with OR/AND, fewer vector constants like 1.0/-1.0):

  __ESIMD_NS::simd<float, N> abs_x = __ESIMD_NS::abs(x);
  __ESIMD_NS::simd<float, N> abs_y = __ESIMD_NS::abs(y);
  __ESIMD_NS::simd<float, N> reminder =
      abs_y - abs_x * __ESIMD_NS::trunc<float>(abs_y / abs_x);

  // After this line 'abs_x' means (reminder < 0 ? abs(x) : 0);
  abs_x.merge(0.0, reminder >= 0);
  __ESIMD_NS::simd<float, N> fmod = reminder + abs_x;

  __ESIMD_NS::simd<float, N> fmod_abs = __ESIMD_NS::abs(fmod);
  auto fmod_sign_mask = (y.template bit_cast_view<int32_t>()) & 0x80000000;
  auto fmod_bits = (fmod_abs.template bit_cast_view<int32_t>()) | fmod_sign_mask;

  return fmod_bits.template bit_cast_view<float>();

Reply (Contributor):
It's quite hard to tell without testing; many combinations of the two arguments are possible. I would expect possible problems with denormals and IEEE special inputs (inf, NaN). If I get some spare time, I will try to substitute our version with this one and run our internal tests.

Reply (Contributor):

I have run our tests for this implementation (did I convert it properly to plain C++?)

float __imf_fmodf(float y, float x)
{
  float abs_x = std::abs(x);
  float abs_y = std::abs(y);
  float reminder = abs_y - abs_x * std::trunc(abs_y / abs_x);
  abs_x = (reminder >= 0) ? 0.0 : abs_x;
  float fmod = reminder + abs_x;
  float fmod_abs = std::abs(fmod);
  auto fmod_sign_mask = as_int(y) & 0x80000000;
  auto fmod_bits = as_int(fmod_abs) | fmod_sign_mask;
  return as_float(fmod_bits);
}

Test reported 12% of incorrect results over the whole dataset. Mostly they are related to special, denormal, or very small/big arguments. See the attached file ts_fmod_s_xa.zip; REF1 is the reference (correct) result there.

Reply (Contributor Author):
The main problem with the code provided by @akolesov-intel is that it is not really vectorizable, if at all.

Reply (Contributor):
Ok, let's have this new and more correct version for now.
Perhaps we can add an even more correct implementation a bit later. I'll create an internal tracker for this and other math functions.

+  __ESIMD_NS::simd<float, N> abs_y;
+  __ESIMD_NS::simd<float, N> reminder;
+  __ESIMD_NS::simd<float, N> reminder_sign_mask;
+  __ESIMD_NS::simd<float, N> result_sign_mask;
 
-  v_quot = convert<int>(y / x);
-  fmod = y - x * convert<float>(v_quot);
-  return fmod;
+  auto bits = y.template bit_cast_view<int32_t>();
+  result_sign_mask.merge(-1.0f, 1.0f, (bits & 0x80000000) != 0);
+  abs_x = __ESIMD_NS::abs(x);
+  abs_y = __ESIMD_NS::abs(y);
+  reminder = abs_y - abs_x * __ESIMD_NS::trunc<float>(abs_y / abs_x);
+  reminder_sign_mask.merge(1.0f, 0.0f, reminder < 0);
+
+  fmod = reminder + abs_x * reminder_sign_mask;
+
+  return __ESIMD_NS::abs(fmod) * result_sign_mask;
 }

 // For Scalar Input
 template <> ESIMD_INLINE float fmod(float y, float x) {
-  int v_quot;
   __ESIMD_NS::simd<float, 1> fmod;
+  __ESIMD_NS::simd<float, 1> abs_x;
+  __ESIMD_NS::simd<float, 1> abs_y;
+  __ESIMD_NS::simd<float, 1> reminder;
+  __ESIMD_NS::simd<float, 1> reminder_sign_mask;
+  __ESIMD_NS::simd<float, 1> result_sign_mask;
+  __ESIMD_NS::simd<float, 1> simd_y = y;
+
+  auto bits = simd_y.template bit_cast_view<int32_t>();
+  result_sign_mask.merge(-1.0f, 1.0f, (bits & 0x80000000) != 0);
+  abs_x = __ESIMD_NS::abs(x);
+  abs_y = __ESIMD_NS::abs(y);
+  reminder = abs_y - abs_x * __ESIMD_NS::trunc<float>(abs_y / abs_x);
+  reminder_sign_mask.merge(1.0f, 0.0f, reminder < 0);
+
+  fmod =
+      __ESIMD_NS::abs(reminder + abs_x * reminder_sign_mask) * result_sign_mask;
+
-  v_quot = (int)(y / x);
-  fmod = y - x * v_quot;
   return fmod[0];
 }
