Optimize bit count polyfills #2914

Merged 11 commits on Jun 18, 2020

Conversation

MaxGraey
Contributor

No description provided.

@@ -42,7 +42,7 @@ template<> int PopCount<uint32_t>(uint32_t v) {
}

template<> int PopCount<uint64_t>(uint64_t v) {
-  return PopCount((uint32_t)v) + PopCount((uint32_t)(v >> 32));
+  return PopCount((uint32_t)v) + (v >> 32 ? PopCount((uint32_t)(v >> 32)) : 0);
Member

What is the motivation for this change? This seems slightly harder to read than before.

Contributor Author

The main motivation is consistency with the 64-bit CountLeadingZeroes, plus a potential speedup when the high 32 bits of the 64-bit value are zero.

Member

I think the consistency with CountLeadingZeroes is a non-goal here because they are fundamentally different optimizations. If we could rewrite CountLeadingZeroes to match PopCount, I think we would want to do that for improved readability.

If we're going to sacrifice some readability for performance, it would be good to see that the performance difference is measurable rather than hypothetical.

This PR LGTM other than this point, so it might be nice to split this out and land the rest.

Contributor Author

Alright, I switched back to the previous implementation of PopCount<uint64_t>.
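For readers following this thread, here is a minimal sketch of the two 64-bit variants that were discussed. The names `PopCount32`, `PopCount64Plain`, and `PopCount64Branchy` are hypothetical and exist only for this illustration, and the 32-bit helper is a generic SWAR bit count assumed as a stand-in, not necessarily the polyfill used in the PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit polyfill: a classic SWAR bit count.
inline int PopCount32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                  // 2-bit partial sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit partial sums
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
  return static_cast<int>((v * 0x01010101u) >> 24);  // add the four bytes
}

// Shape of the implementation that was kept: always count both halves.
inline int PopCount64Plain(uint64_t v) {
  return PopCount32(static_cast<uint32_t>(v)) +
         PopCount32(static_cast<uint32_t>(v >> 32));
}

// Shape of the proposed (and later reverted) change: skip the high half
// when it is zero, at the cost of a branch.
inline int PopCount64Branchy(uint64_t v) {
  const uint32_t high = static_cast<uint32_t>(v >> 32);
  return PopCount32(static_cast<uint32_t>(v)) + (high ? PopCount32(high) : 0);
}
```

Both functions return the same value for every input; they differ only in whether the second 32-bit count is skipped. Whether the branch pays off would need to be measured, as requested in the review.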

Comment on lines +68 to +70
template<typename T> bool IsPowerOf2(T v) {
  return v != 0 && (v & (v - 1)) == 0;
}
Member

Nice 👍 I like that this lets us delete code.

Contributor Author

Usually popcnt(x) == 1 (which, by the way, also covers the x != 0 case) is faster, but only if we have native popcnt support. Also, LLVM and GCC are smart enough to replace the pattern above with popcnt(x) == 1 where possible.
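The equivalence mentioned here can be checked with a small sketch. `IsPowerOf2` is copied from the diff; `IsPowerOf2ViaPopCount` is a hypothetical name introduced only for this comparison, and it uses `std::bitset::count` as a portable stand-in for a native popcnt instruction.

```cpp
#include <bitset>
#include <cstdint>

// The helper from the diff: exactly one bit set means v is nonzero and
// clearing its lowest set bit leaves nothing behind.
template<typename T> bool IsPowerOf2(T v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical alternative: a bit count of exactly 1. The zero case is
// rejected automatically because its bit count is 0.
inline bool IsPowerOf2ViaPopCount(uint32_t v) {
  return std::bitset<32>(v).count() == 1;
}
```

Both predicates agree on every 32-bit input. Which one compiles to faster code depends on the target and on whether the compiler recognizes the pattern, so this sketch makes no performance claim.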

@@ -54,8 +56,8 @@ template<> int PopCount<uint32_t>(uint32_t v) {
}

template<> int PopCount<uint64_t>(uint64_t v) {
-#if __has_builtin(__builtin_popcountll) || defined(__GNUC__)
-  return __builtin_popcountll(v);
+#if __has_builtin(__builtin_popcount) || defined(__GNUC__) || defined(_MSC_VER)
Contributor Author

I used __has_builtin(__builtin_popcount) instead of __has_builtin(__builtin_popcountll) because with the longer name clang-format forces a line break and carries defined(_MSC_VER) onto a new line, which looks weird.
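As a rough illustration of this kind of guard, the sketch below selects the compiler builtin when it is available and falls back to a portable loop otherwise. `PopCount64Sketch` is a hypothetical name, the fallback definition of `__has_builtin` is an assumption for compilers that lack it, and the MSVC-specific branch from the diff is deliberately left out.

```cpp
#include <cstdint>

// Assumption: compilers without __has_builtin get a definition that
// reports "no builtin", so the #if below still parses.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

inline int PopCount64Sketch(uint64_t v) {
#if __has_builtin(__builtin_popcountll) || defined(__GNUC__)
  // GCC and Clang: let the compiler pick the best instruction sequence.
  return __builtin_popcountll(v);
#else
  // Portable fallback: clear the lowest set bit until none remain.
  int count = 0;
  for (; v != 0; v &= v - 1) {
    ++count;
  }
  return count;
#endif
}
```

Either branch returns the same count, so the guard only affects which code the compiler emits.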


template<typename T, typename U> inline static T RotateLeft(T val, U count) {
T mask = sizeof(T) * CHAR_BIT - 1;
auto value = typename std::make_unsigned<T>::type(val);
Contributor Author

@MaxGraey MaxGraey Jun 17, 2020

The cast to unsigned here is pretty important. Otherwise neither LLVM nor GCC can fold this into a single rotate instruction (rol / ror).
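The diff excerpt shows only the first two lines of the helper, so the following is a sketch of how such a rotate is commonly completed, not the PR's actual body. `RotateLeftSketch` is a hypothetical name; the masked-shift expression is the usual idiom that compilers tend to recognize as a rotate, and it is assumed here rather than taken from the source.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

template<typename T, typename U> inline T RotateLeftSketch(T val, U count) {
  using Unsigned = typename std::make_unsigned<T>::type;
  // Bit width minus one, e.g. 31 for a 32-bit T.
  const Unsigned mask = static_cast<Unsigned>(sizeof(T) * CHAR_BIT - 1);
  // Work on the unsigned representation so that the right shift is a
  // logical shift and no sign bits leak into the result.
  const Unsigned value = static_cast<Unsigned>(val);
  const Unsigned shift = static_cast<Unsigned>(count) & mask;
  // Masking the negated shift keeps the right-shift amount in range,
  // including the shift == 0 case.
  const Unsigned back = static_cast<Unsigned>(Unsigned(0) - shift) & mask;
  return static_cast<T>((value << shift) | (value >> back));
}
```

For example, `RotateLeftSketch<uint32_t>(0x80000001u, 1)` moves the top bit around to bit 0 and yields `0x00000003u`. Rotation counts larger than the bit width wrap around because of the mask.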

@MaxGraey MaxGraey requested a review from tlively June 17, 2020 17:58
Member

@tlively tlively left a comment

Looks good, thanks!

@tlively tlively merged commit f6eb790 into WebAssembly:master Jun 18, 2020
@MaxGraey MaxGraey deleted the optimize-bit-helpers branch June 18, 2020 05:59
2 participants