Redundant branches with ctlz and cttz #47467
We expand the intrinsics in -codegenprepare, and I'm not sure where we would solve this. machine-cse seems like the most likely candidate, but it would require tracking eflags state across basic blocks. Not sure if we do that:

```
TEST64rr %5, %5, implicit-def $eflags
JCC_1 %bb.2, 4, implicit $eflags
```

IR going into SDAG:

```llvm
define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) unnamed_addr {
start:
  %cmpz = icmp eq i64 %x, 0
  br i1 %cmpz, label %cond.end, label %cond.false

cond.false:                                       ; preds = %start
  %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 true)
  br label %cond.end

cond.end:                                         ; preds = %start, %cond.false
  %ctz = phi i64 [ 64, %start ], [ %0, %cond.false ]
  %1 = trunc i64 %ctz to i32
  %cmpz3 = icmp eq i64 %x, 0
  br i1 %cmpz3, label %cond.end2, label %cond.false1

cond.false1:                                      ; preds = %cond.end
  %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 true)
  br label %cond.end2

cond.end2:                                        ; preds = %cond.end, %cond.false1
  %ctz4 = phi i64 [ 64, %cond.end ], [ %2, %cond.false1 ]
  %3 = trunc i64 %ctz4 to i32
  %_2 = add nuw nsw i32 %1, %3
  %4 = icmp ugt i32 %_2, 10
  ret i1 %4
}
```
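The duplicated guard is easier to see in scalar form. A minimal Rust sketch of what the expanded IR computes (function body reconstructed from the IR above, not the original source):

```rust
// Scalar model of the IR above: each guarded intrinsic expansion
// branches on the same `x == 0` predicate, so the second test is
// redundant once the first has been resolved.
fn can_represent_as_f64(x: u64) -> bool {
    // ctlz expansion: guarded because ctlz is undefined for zero
    // input when called with `is_zero_poison = true`.
    let lz = if x == 0 { 64 } else { x.leading_zeros() };
    // cttz expansion: guarded by the exact same comparison.
    let tz = if x == 0 { 64 } else { x.trailing_zeros() };
    lz + tz > 10
}
```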
Note that this should not be an issue when compiling for more recent x86:
After #102885 the branches are now avoided on any x86 target with cmov:

```asm
playground::can_represent_as_f64::h8c9d47bab619cb5f: # @playground::can_represent_as_f64::h8c9d47bab619cb5f
        bsrq    %rdi, %rax
        movl    $127, %ecx
        cmovneq %rax, %rcx
        xorl    $63, %ecx
        bsfq    %rdi, %rax
        movl    $64, %edx
        cmovneq %rax, %rdx
        addl    %ecx, %edx
        cmpl    $11, %edx
        setae   %al
        retq
```
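A quick sanity check of the bsr/cmov/xor trick used above: BSR yields the index of the highest set bit, and for an index `i` in `0..=63`, `i ^ 63 == 63 - i`, which turns a bit index into a leading-zero count; the `$127` default folds to 64 after the xor, matching the defined-for-zero result. A hedged Rust model (helper name hypothetical):

```rust
// Model of `movl $127, %ecx; bsrq %rdi, %rax; cmovneq %rax, %rcx;
// xorl $63, %ecx`. For nonzero x the bsr result (index of the
// highest set bit) equals 63 - leading_zeros.
fn clz_via_bsr(x: u64) -> u32 {
    let bsr_or_default = if x == 0 { 127 } else { 63 - x.leading_zeros() };
    bsr_or_default ^ 63 // 127 ^ 63 == 64, the required result for x == 0
}
```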
After #123623 x86_64 codegen will simplify to:

```asm
        movl    $127, %eax
        bsrq    %rdi, %rax
        xorl    $63, %eax
        movl    $64, %ecx
        rep     bsfq %rdi, %rcx
        addl    %eax, %ecx
        cmpl    $11, %ecx
        setae   %al
        retq
```
Resolving this: on any target with CMOV the code is now branchless, and on x86_64 (or any target with LZCNT/TZCNT) the codegen is close to optimal.
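For context on the predicate itself: judging by the function name and the comparison constant, it checks that the set bits of `x` span at most 53 positions (64 - 11), i.e. that the value fits in an f64's 53-bit significand and round-trips exactly. A hedged one-line Rust equivalent of this interpretation:

```rust
// A u64 round-trips through f64 exactly iff its set bits span at
// most 53 positions, i.e. leading + trailing zeros total at least 11.
fn can_represent_as_f64(x: u64) -> bool {
    x.leading_zeros() + x.trailing_zeros() >= 11
}
```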
Extended Description
Rust code:
LLVM IR:
Assembly:
Instead of performing the comparison twice, the code should immediately branch to LBB0_4.