[APInt] Implement average functions without sign/zero-extension. NFC. #85212
Removing the extension to FullWidth should make the average functions much more efficient in the 64-bit case, because 65-bit APInts use a separate allocation for their bits.
@llvm/pr-subscribers-llvm-support
Author: Jay Foad (jayfoad)
Full diff: https://github.com/llvm/llvm-project/pull/85212.diff
1 file affected:
diff --git a/llvm/lib/Support/APInt.cpp b/llvm/lib/Support/APInt.cpp
index 7053f3b87682f8..371d42b221ed62 100644
--- a/llvm/lib/Support/APInt.cpp
+++ b/llvm/lib/Support/APInt.cpp
@@ -3096,37 +3096,21 @@ void llvm::LoadIntFromMemory(APInt &IntVal, const uint8_t *Src,
}
APInt APIntOps::avgFloorS(const APInt &C1, const APInt &C2) {
- // Return floor((C1 + C2)/2)
- assert(C1.getBitWidth() == C2.getBitWidth() && "Unequal bitwidths");
- unsigned FullWidth = C1.getBitWidth() + 1;
- APInt C1Ext = C1.sext(FullWidth);
- APInt C2Ext = C2.sext(FullWidth);
- return (C1Ext + C2Ext).extractBits(C1.getBitWidth(), 1);
+ // Return floor((C1 + C2) / 2)
+ return (C1 & C2) + (C1 ^ C2).ashr(1);
}
APInt APIntOps::avgFloorU(const APInt &C1, const APInt &C2) {
- // Return floor((C1 + C2)/2)
- assert(C1.getBitWidth() == C2.getBitWidth() && "Unequal bitwidths");
- unsigned FullWidth = C1.getBitWidth() + 1;
- APInt C1Ext = C1.zext(FullWidth);
- APInt C2Ext = C2.zext(FullWidth);
- return (C1Ext + C2Ext).extractBits(C1.getBitWidth(), 1);
+ // Return floor((C1 + C2) / 2)
+ return (C1 & C2) + (C1 ^ C2).lshr(1);
}
APInt APIntOps::avgCeilS(const APInt &C1, const APInt &C2) {
- // Return ceil((C1 + C2)/2)
- assert(C1.getBitWidth() == C2.getBitWidth() && "Unequal bitwidths");
- unsigned FullWidth = C1.getBitWidth() + 1;
- APInt C1Ext = C1.sext(FullWidth);
- APInt C2Ext = C2.sext(FullWidth);
- return (C1Ext + C2Ext + 1).extractBits(C1.getBitWidth(), 1);
+ // Return ceil((C1 + C2) / 2)
+ return (C1 | C2) - (C1 ^ C2).ashr(1);
}
APInt APIntOps::avgCeilU(const APInt &C1, const APInt &C2) {
- // Return ceil((C1 + C2)/2)
- assert(C1.getBitWidth() == C2.getBitWidth() && "Unequal bitwidths");
- unsigned FullWidth = C1.getBitWidth() + 1;
- APInt C1Ext = C1.zext(FullWidth);
- APInt C2Ext = C2.zext(FullWidth);
- return (C1Ext + C2Ext + 1).extractBits(C1.getBitWidth(), 1);
+ // Return ceil((C1 + C2) / 2)
+ return (C1 | C2) - (C1 ^ C2).lshr(1);
}
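The new bodies rest on two bit identities: `a + b == 2 * (a & b) + (a ^ b)` and `a + b == 2 * (a | b) - (a ^ b)`, so halving the XOR term gives the floor or ceiling average without ever forming the full-width sum. As a rough sketch (not part of the patch), the same expressions can be checked exhaustively on 8-bit values. The helper names below (`avgFloorU8`, `checkAllPairs8`, and so on) are hypothetical stand-ins, not LLVM APIs, and the signed versions assume `>>` on a negative value is an arithmetic shift, which C++20 guarantees and earlier standards leave implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-ins for the APIntOps average functions.
// The operands are promoted to int, so nothing here can overflow; the
// final cast just narrows a value that already fits in 8 bits.

uint8_t avgFloorU8(uint8_t A, uint8_t B) {
  // Zero-extended operands: >> acts as a logical shift (lshr).
  return static_cast<uint8_t>((A & B) + ((A ^ B) >> 1));
}

uint8_t avgCeilU8(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((A | B) - ((A ^ B) >> 1));
}

int8_t avgFloorS8(int8_t A, int8_t B) {
  // Sign-extended operands: >> is assumed to be an arithmetic shift (ashr).
  return static_cast<int8_t>((A & B) + ((A ^ B) >> 1));
}

int8_t avgCeilS8(int8_t A, int8_t B) {
  return static_cast<int8_t>((A | B) - ((A ^ B) >> 1));
}

// Reference results computed from the exact sum in a wider type.
int floorHalf(int Sum) { return (Sum - (Sum % 2 != 0 ? 1 : 0)) / 2; }
int ceilHalf(int Sum) { return (Sum + (Sum % 2 != 0 ? 1 : 0)) / 2; }

// Compare the bit-trick versions against the reference for every pair.
void checkAllPairs8() {
  for (int I = 0; I < 256; ++I) {
    for (int J = 0; J < 256; ++J) {
      uint8_t UA = static_cast<uint8_t>(I);
      uint8_t UB = static_cast<uint8_t>(J);
      assert(avgFloorU8(UA, UB) == floorHalf(I + J));
      assert(avgCeilU8(UA, UB) == ceilHalf(I + J));

      int8_t SA = static_cast<int8_t>(I - 128);
      int8_t SB = static_cast<int8_t>(J - 128);
      assert(avgFloorS8(SA, SB) == floorHalf(SA + SB));
      assert(avgCeilS8(SA, SB) == ceilHalf(SA + SB));
    }
  }
}
```

Calling `checkAllPairs8()` walks all 65,536 input pairs for each variant. The shift kind matters: for the unsigned inputs `0x80` and `0`, the ceiling average is `0x40`, whereas an arithmetic shift of the 8-bit XOR would yield `0xC0`.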
Only people with commit access can be tagged as a reviewer (and can accept PRs), but comments on the patch are still encouraged.
Can you update the title to be more descriptive? "Avoid extension by using overflow-free variants, something something"?
Nice
LGTM - cheers