[instcombine, sha] Add Instruction Fold for Masked Merge #7145
assigned to @LebedevRI |
Thanks for the nice testcase, optimization implemented in r108136. |
And reverted in r108141 … I hope to fix this later today. |
Thanks for working on this Benjamin! |
Benjamin: I haven't tracked down the root issue here, but I strongly suspect that this instcombine is just exposing a bug elsewhere in the compiler. An interesting aspect of this is that (on the clang build) the xform triggers either exclusively or almost so for cases where C/D are constants. I'm seeing tons of stuff like this:

XFORM: %6 = and i8 %5, -2 ; [#uses=1]

This is because m_Not is matching on constants, which is dubious. This should certainly be turned off for constants, because the and/or sequence is better for most targets. However, it looks like we're missing the inverse transformation:

$ cat t.c
int test2(int A, int B) {

We should turn the latter into the former in the case of a constant. I will continue investigating the latent bug; many of these are triggering on SRoA-created i128's, i256's, etc., and we really want to keep these and/or rather than turn them into xor. However, I want to find and fix the latent bug. |
Actually, for armv5, we generate better code for the xor form, but I think I'd consider that a deficiency in the backend. |
patch v2 |
https://godbolt.org/g/KafbjK

This looks pretty straightforward, so I could take a look. |
I don't know the history, but I wonder if it would be better to treat this as 2 problems rather than 2 specific folds as in https://reviews.llvm.org/rL108141 . First, reduce the final xor (and also 'add') to 'or' when the operands have no common bits set:

Name: xor_to_or
Name: add_to_or
https://rise4fun.com/Alive/u6f

Inverted operands to 'and' ops might be worth adding to llvm::haveNoCommonBitsSet(). |
Another consideration: does the backend recognize the sequential xor sequence if a target has 'andn' or a bit-select instruction? Eg, x86 with BMI: the 'not' isn't a separate instruction there, so it should always be better than the alternative (better throughput than the sequential xor+and+xor because the 'and' and 'andn' can execute in parallel). AArch64 has 'bsl'.

So I think you have to make sure that the IR optimization doesn't cause a regression for this:

define <4 x i32> @v(<4 x i32> %x, <4 x i32> %y, <4 x i32> %mask) { |
So in other words, the backend (what specifically?) should be able to unfold: I think that |
Eh, meant to post that into llvm/llvm-bugzilla-archive#37104 |
So this actually turns out to be more complicated. I see at least 5 steps:
#1 and #2 are prerequisites for #5. |
We already do add->or and xor->or, IIRC. |
Based on the testcase output, no. |
I know we do it in some cases, but maybe not your testcase, specifically. |
Correct - we have haveNoCommonBitsSet() called from visitAdd, but not visitXor. But it doesn't check for the case of a not'ed mask. SimplifyDemandedBits will also transform to 'or', but it can't handle the cases here AFAICT because we don't know the mask bits are constants, just that they're opposite. |
Committed tests for now: https://reviews.llvm.org/rL330003 |
Backend is all done. Some further polishing will be needed - when the mask is constant, |
Committed in r333106! |
And what would you think, reverted in r333631 :) |
Roman, this is (partially) reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8619. The oss-fuzz bug was 'fixed' when you reverted your masked merge patch (D46814/rL333106) at rL333631, but if you intend to re-apply with an updated patch, then please ensure you add a fix for this:

@g = extern_weak global i32
; Function Attrs: nounwind
define void @function() #0 {
entry:
%A14 = alloca i32
%L4 = load i32, i32* %A14
%A9 = alloca i32
%C26 = icmp ne i32 0, %L4
%or4 = or i32 or (i32 zext (i1 icmp eq (i32* @g, i32* null) to i32), i32 1), 65536
%B22 = srem i32 %or4, %or4
%B26 = or i32 %B22, %B22
%C5 = icmp sle i32 %L4, 0
%C9 = icmp eq i32 %B26, 0
br label %BB
BB: ; preds = %BB, %entry
%C8 = icmp sge i1 %C9, %C5
%C6 = icmp uge i1 %C9, %C26
br i1 undef, label %BB, label %BB2
BB2: ; preds = %BB
store i1 %C8, i1* undef
%L13 = load i1, i1* undef
%B44 = mul i1 %L13, %C6
store i1 %B44, i1* undef
store i32* null, i32** undef
ret void
}
attributes #0 = { nounwind } |
So no, I'll be surprised if it lands again. |
This is unfortunate, but I think you're right. We can't do this without freeze. For reference, the failing case was based on something like this:

unsigned f(unsigned mask, unsigned x, unsigned y) {
return (mask & x) | (~mask & y);
}
void g(unsigned x) {
unsigned y;
unsigned z = f(-1, x, y);
// use z
}

And apparently that's legal C code... But as you know, there's still activity/hope for freeze. :) |
Extended Description
When experimenting with SHA, I was pleased to notice that LLVM will fold the "Maj" function (x & y) ^ (x & z) ^ (y & z) down to ((z ^ y) & x) ^ (y & z).
The "Ch" function, though, doesn't fold. (x & y) | (~x & z) should become ((y ^ z) & x) ^ z, as mentioned on http://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge (as should (x & y) ^ (~x & z), the version used in the SHA standard).
(If you're wondering at the similarity, Maj(x,y,z) is equivalent to Ch(x, y|z, y&z). Using that implementation with the optimized Ch again gives optimal code from LLVM as it knows to fold (y|z)^(y&z) down to y^z.)
LLVM IR from Bitter Melon:
From the following C source: