Optimize bit count polyfills #2914
src/support/bits.cpp
@@ -42,7 +42,7 @@ template<> int PopCount<uint32_t>(uint32_t v) {
 }

 template<> int PopCount<uint64_t>(uint64_t v) {
-  return PopCount((uint32_t)v) + PopCount((uint32_t)(v >> 32));
+  return PopCount((uint32_t)v) + (v >> 32 ? PopCount((uint32_t)(v >> 32)) : 0);
What is the motivation for this change? This seems slightly harder to read than before.
The main motivation is consistency with CountLeadingZeroes for 64-bit values, plus a potential speedup when the high 32 bits of the 64-bit value are zero.
I think the consistency with CountLeadingZeroes is a non-goal here because they are fundamentally different optimizations. If we could rewrite CountLeadingZeroes to match PopCount, I think we would want to do that for improved readability.
If we're going to sacrifice some readability for performance, it would be good to see that the performance difference is measurable rather than hypothetical.
This PR LGTM other than this point, so it might be nice to split this out and land the rest.
Alright, I switched back to the previous implementation of PopCount<uint64_t>.
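To make the reviewer's distinction concrete, here is a minimal sketch. Every name in it (PopCount32, PopCount64, PopCount64Branchy, CountLeadingZeroes32, CountLeadingZeroes64) is hypothetical, and the 32-bit helpers are portable stand-ins, not the polyfills in src/support/bits.cpp. It also assumes the 64-bit CountLeadingZeroes is assembled from 32-bit halves, as the consistency argument implies. For PopCount, the zero check on the high half only skips work that the unconditional sum would get right anyway; for CountLeadingZeroes, the branch decides which half to inspect, so it cannot be dropped.

```cpp
#include <cstdint>

// Portable 32-bit population count (SWAR); a hypothetical stand-in for
// PopCount<uint32_t>, whose body this excerpt does not show.
int PopCount32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial counts
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial counts
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial counts
  return int((v * 0x01010101u) >> 24);              // sum bytes into top byte
}

// Variant kept in the PR: always count both halves.
int PopCount64(uint64_t v) {
  return PopCount32(uint32_t(v)) + PopCount32(uint32_t(v >> 32));
}

// Variant proposed and then reverted: skip the high half when it is zero.
// It returns the same result; the branch is only a possible shortcut.
int PopCount64Branchy(uint64_t v) {
  return PopCount32(uint32_t(v)) +
         (v >> 32 ? PopCount32(uint32_t(v >> 32)) : 0);
}

// Portable 32-bit count of leading zeroes (binary search); 32 for v == 0.
int CountLeadingZeroes32(uint32_t v) {
  if (v == 0) {
    return 32;
  }
  int n = 0;
  if (v <= 0x0000FFFFu) { n += 16; v <<= 16; }
  if (v <= 0x00FFFFFFu) { n += 8; v <<= 8; }
  if (v <= 0x0FFFFFFFu) { n += 4; v <<= 4; }
  if (v <= 0x3FFFFFFFu) { n += 2; v <<= 2; }
  if (v <= 0x7FFFFFFFu) { n += 1; }
  return n;
}

// 64-bit count of leading zeroes from halves: here the branch is required,
// because the answer depends on which half holds the highest set bit.
int CountLeadingZeroes64(uint64_t v) {
  uint32_t high = uint32_t(v >> 32);
  return high != 0 ? CountLeadingZeroes32(high)
                   : 32 + CountLeadingZeroes32(uint32_t(v));
}
```

Both PopCount variants agree on every input, so choosing between them is purely a matter of readability and, if measured, speed.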
template<typename T> bool IsPowerOf2(T v) {
  return v != 0 && (v & (v - 1)) == 0;
}
Nice 👍 I like that this lets us delete code.
popcnt(x) == 1 (which also covers the x != 0 case) is usually faster, but only when native popcnt support is available. Also, LLVM and GCC are smart enough to replace the pattern above with popcnt(x) == 1 when possible.
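The equivalence mentioned above can be checked with a small sketch. PopCountLoop and IsPowerOf2ViaPopCount are hypothetical helpers, and the loop (Kernighan's method) merely stands in for a native popcnt instruction; the sketch shows that the two forms compute the same predicate, not which one is faster on a given target.

```cpp
#include <cstdint>

// Bit trick from the diff: v & (v - 1) clears the lowest set bit, so the
// result is zero exactly when v has at most one bit set.
template<typename T> bool IsPowerOf2(T v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical portable population count: each iteration clears one set bit.
template<typename T> int PopCountLoop(T v) {
  int n = 0;
  for (; v != 0; v &= v - 1) {
    ++n;
  }
  return n;
}

// popcnt(x) == 1 needs no separate x != 0 test, because popcnt(0) == 0.
template<typename T> bool IsPowerOf2ViaPopCount(T v) {
  return PopCountLoop(v) == 1;
}
```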
@@ -54,8 +56,8 @@ template<> int PopCount<uint32_t>(uint32_t v) {
 }

 template<> int PopCount<uint64_t>(uint64_t v) {
-#if __has_builtin(__builtin_popcountll) || defined(__GNUC__)
-  return __builtin_popcountll(v);
+#if __has_builtin(__builtin_popcount) || defined(__GNUC__) || defined(_MSC_VER)
I used __has_builtin(__builtin_popcount) instead of __has_builtin(__builtin_popcountll) because with the longer name clang-format forces a line break and moves defined(_MSC_VER) onto a new line, which looks weird.
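A sketch of the complete preprocessor dispatch, under stated assumptions: this excerpt does not show how the _MSC_VER case obtains __builtin_popcountll, so the sketch instead gives MSVC a separate branch using the __popcnt64 intrinsic from <intrin.h>, which is available on x64 only. PopCountPortable and PopCountU64 are hypothetical names, and the fallback definition of __has_builtin is an assumption for compilers that lack that operator.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Compilers without __has_builtin would reject `#if __has_builtin(...)`,
// so define it to always answer "not available".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Portable 64-bit SWAR population count, used when no builtin applies.
inline int PopCountPortable(uint64_t v) {
  v = v - ((v >> 1) & 0x5555555555555555ULL);
  v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
  v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return int((v * 0x0101010101010101ULL) >> 56);
}

inline int PopCountU64(uint64_t v) {
  // Probing __builtin_popcount rather than __builtin_popcountll keeps the
  // condition short; GCC and Clang provide both builtins.
#if __has_builtin(__builtin_popcount) || defined(__GNUC__)
  return __builtin_popcountll(v);
#elif defined(_MSC_VER) && defined(_M_X64)
  // Emits POPCNT unconditionally, so the CPU must support that instruction.
  return int(__popcnt64(v));
#else
  return PopCountPortable(v);
#endif
}
```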
template<typename T, typename U> inline static T RotateLeft(T val, U count) {
  T mask = sizeof(T) * CHAR_BIT - 1;
  auto value = typename std::make_unsigned<T>::type(val);
The cast to unsigned is pretty important. Otherwise neither LLVM nor GCC can fold this into a single rol/rot op.
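A minimal sketch of the complete function, assuming the body continues with the common masked-shift rotate idiom; the excerpt stops after the cast, so the last two statements are an assumption, not the PR's code. With a signed T, >> on a negative value copies the sign bit on mainstream compilers, and << on a negative value is undefined before C++20, so the OR would no longer express a rotate. That is plausibly why the compilers cannot fold it into a single rotate instruction without the cast.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

template<typename T, typename U> inline static T RotateLeft(T val, U count) {
  T mask = sizeof(T) * CHAR_BIT - 1;
  // Shift the unsigned counterpart of T so >> is a logical shift.
  auto value = typename std::make_unsigned<T>::type(val);
  count &= mask; // reduce the rotate amount to [0, bit width)
  // (-count & mask) equals (width - count) % width, so a count of 0 never
  // shifts by the full bit width (which would be undefined behavior).
  return T((value << count) | (value >> (-count & mask)));
}
```

For example, RotateLeft(uint32_t(0x12345678), 8) yields 0x34567812.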
Looks good, thanks!