LLVM 15 regression: fpext half to fp128 #56911

Closed
andrewrk opened this issue Aug 4, 2022 · 7 comments
Labels
backend:X86, question

Comments

@andrewrk
Member

andrewrk commented Aug 4, 2022

release/15.x branch, commit 0214d98

Based on the following downstream test case:

test "cast f16 to f128" {
    var x: f16 = 1234.0;
    try expect(@as(f128, 1234.0) == x);
}

LLVM IR reduction. Expected main to return 0, however it returns 1.

source_filename = "test"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i8 @main() unnamed_addr #1 {
Entry:
  %result = alloca i8, align 1
  %x = alloca half, align 2
  store half 0xH64D2, ptr %x, align 2
  %0 = load half, ptr %x, align 2
  %1 = fpext half %0 to fp128
  %2 = call i32 @__eqtf2(fp128 0xL00000000000000004009348000000000, fp128 %1)
  %3 = icmp eq i32 %2, 0
  br i1 %3, label %Then, label %Else

Then:                                             ; preds = %Entry
  store i8 0, ptr %result, align 1
  %4 = load i8, ptr %result, align 1
  ret i8 %4

Else:                                             ; preds = %Entry
  store i8 1, ptr %result, align 1
  %5 = load i8, ptr %result, align 1
  ret i8 %5
}


define i32 @__eqtf2(fp128 %0, fp128 %1) #1 {
Entry:
  %result = alloca i32, align 4
  %a = alloca fp128, align 16
  %b = alloca fp128, align 16
  store fp128 %0, ptr %a, align 16
  store fp128 %1, ptr %b, align 16
  %2 = load fp128, ptr %a, align 16
  %3 = load fp128, ptr %b, align 16
  %4 = call i32 @__cmptf2(fp128 %2, fp128 %3)
  store i32 %4, ptr %result, align 4
  %5 = load i32, ptr %result, align 4
  ret i32 %5
}

define i32 @__cmptf2(fp128 %0, fp128 %1) #1 {
Entry:
  %result = alloca i32, align 4
  %2 = alloca fp128, align 16
  %3 = alloca fp128, align 16
  %a = alloca fp128, align 16
  %b = alloca fp128, align 16
  store fp128 %0, ptr %a, align 16
  store fp128 %1, ptr %b, align 16
  %4 = load fp128, ptr %a, align 16
  store fp128 %4, ptr %2, align 16
  %5 = load fp128, ptr %b, align 16
  store fp128 %5, ptr %3, align 16
  %6 = call fastcc i32 @compiler_rt.comparef.cmpf2.73(fp128 %4, fp128 %5)
  store i32 %6, ptr %result, align 4
  %7 = load i32, ptr %result, align 4
  ret i32 %7
}

define internal fastcc i32 @compiler_rt.comparef.cmpf2.73(fp128 %0, fp128 %1) unnamed_addr #3 {
Entry:
  %result = alloca i32, align 4
  %aInt = alloca i128, align 16
  %bInt = alloca i128, align 16
  %aAbs = alloca i128, align 16
  %bAbs = alloca i128, align 16
  %a = alloca fp128, align 16
  %b = alloca fp128, align 16
  store fp128 %0, ptr %a, align 16
  store fp128 %1, ptr %b, align 16
  %2 = load fp128, ptr %a, align 16
  store fp128 %2, ptr %aInt, align 16
  %3 = load fp128, ptr %b, align 16
  store fp128 %3, ptr %bInt, align 16
  %4 = load i128, ptr %aInt, align 16
  %5 = and i128 %4, 170141183460469231731687303715884105727
  store i128 %5, ptr %aAbs, align 16
  %6 = load i128, ptr %bInt, align 16
  %7 = and i128 %6, 170141183460469231731687303715884105727
  store i128 %7, ptr %bAbs, align 16
  %8 = load i128, ptr %aAbs, align 16
  %9 = icmp ugt i128 %8, 170135991163610696904058773219554885632
  br i1 %9, label %BoolOrTrue, label %BoolOrFalse

BoolOrFalse:                                      ; preds = %Entry
  %10 = load i128, ptr %bAbs, align 16
  %11 = icmp ugt i128 %10, 170135991163610696904058773219554885632
  br label %BoolOrTrue

BoolOrTrue:                                       ; preds = %BoolOrFalse, %Entry
  %12 = phi i1 [ %9, %Entry ], [ %11, %BoolOrFalse ]
  br i1 %12, label %Then, label %Else

Then:                                             ; preds = %BoolOrTrue
  store i32 1, ptr %result, align 4
  %13 = load i32, ptr %result, align 4
  ret i32 %13

Else:                                             ; preds = %BoolOrTrue
  br label %EndIf

EndIf:                                            ; preds = %Else
  %14 = load i128, ptr %aAbs, align 16
  %15 = load i128, ptr %bAbs, align 16
  %16 = or i128 %14, %15
  %17 = icmp eq i128 %16, 0
  br i1 %17, label %Then1, label %Else2

Then1:                                            ; preds = %EndIf
  store i32 0, ptr %result, align 4
  %18 = load i32, ptr %result, align 4
  ret i32 %18

Else2:                                            ; preds = %EndIf
  br label %EndIf3

EndIf3:                                           ; preds = %Else2
  %19 = load i128, ptr %aInt, align 16
  %20 = load i128, ptr %bInt, align 16
  %21 = and i128 %19, %20
  %22 = icmp sge i128 %21, 0
  br i1 %22, label %Then4, label %Else9

Then4:                                            ; preds = %EndIf3
  %23 = load i128, ptr %aInt, align 16
  %24 = load i128, ptr %bInt, align 16
  %25 = icmp slt i128 %23, %24
  br i1 %25, label %Then5, label %Else6

Then5:                                            ; preds = %Then4
  store i32 -1, ptr %result, align 4
  %26 = load i32, ptr %result, align 4
  ret i32 %26

Else6:                                            ; preds = %Then4
  %27 = load i128, ptr %aInt, align 16
  %28 = load i128, ptr %bInt, align 16
  %29 = icmp eq i128 %27, %28
  br i1 %29, label %Then7, label %Else8

Then7:                                            ; preds = %Else6
  store i32 0, ptr %result, align 4
  %30 = load i32, ptr %result, align 4
  ret i32 %30

Else8:                                            ; preds = %Else6
  store i32 1, ptr %result, align 4
  %31 = load i32, ptr %result, align 4
  ret i32 %31

Else9:                                            ; preds = %EndIf3
  %32 = load i128, ptr %aInt, align 16
  %33 = load i128, ptr %bInt, align 16
  %34 = icmp sgt i128 %32, %33
  br i1 %34, label %Then10, label %Else11

Then10:                                           ; preds = %Else9
  store i32 -1, ptr %result, align 4
  %35 = load i32, ptr %result, align 4
  ret i32 %35

Else11:                                           ; preds = %Else9
  %36 = load i128, ptr %aInt, align 16
  %37 = load i128, ptr %bInt, align 16
  %38 = icmp eq i128 %36, %37
  br i1 %38, label %Then12, label %Else13

Then12:                                           ; preds = %Else11
  store i32 0, ptr %result, align 4
  %39 = load i32, ptr %result, align 4
  ret i32 %39

Else13:                                           ; preds = %Else11
  store i32 1, ptr %result, align 4
  %40 = load i32, ptr %result, align 4
  ret i32 %40
}

attributes #1 = { nobuiltin nounwind "frame-pointer"="all" "probe-stack"="__zig_probe_stack" "target-cpu"="skylake" "target-features"="-16bit-mode,-32bit-mode,-3dnow,-3dnowa,+64bit,+adx,+aes,-amx-bf16,-amx-int8,-amx-tile,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxvnni,+bmi,+bmi2,-branchfusion,-cldemote,+clflushopt,-clwb,-clzero,+cmov,+crc32,+cx16,+cx8,-enqcmd,+ermsb,+f16c,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,+false-deps-popcnt,-false-deps-range,-fast-11bytenop,+fast-15bytenop,-fast-7bytenop,-fast-bextr,+fast-gather,-fast-hops,-fast-lzcnt,-fast-movbe,+fast-scalar-fsqrt,-fast-scalar-shift-masks,+fast-shld-rotate,+fast-variable-crosslane-shuffle,+fast-variable-perlane-shuffle,+fast-vector-fsqrt,-fast-vector-shift-masks,+fma,-fma4,+fsgsbase,-fsrm,+fxsr,-gfni,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,+idivq-to-divl,+invpcid,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,+lzcnt,+macrofusion,+mmx,+movbe,-movdir64b,-movdiri,-mwaitx,+nopl,-pad-short-functions,+pclmul,-pconfig,-pku,+popcnt,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefetchwt1,+prfchw,-ptwrite,-rdpid,-rdpru,+rdrnd,+rdseed,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,+sahf,-sbb-dep-breaking,-serialize,-seses,+sgx,-sha,-shstk,+slow-3ops-lea,-slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-shld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-soft-float,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,-sse-unaligned-mem,+ssse3,-tagged-globals,-tbm,-tsxldtrk,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,-vaes,-vpclmulqdq,+vzeroupper,-waitpkg,-wbnoinvd,-widekl,+x87,-xop,+xsave,+xsavec,+xsaveopt,+xsaves" }
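
As a sanity check on the reduction above, the two constants in main can be decoded by hand. The following Python sketch is not part of the original report, and its helper names are purely illustrative. It decodes 0xH64D2 as an IEEE binary16 value and the 0xL literal as binary128, assuming LLVM's 0xL notation lists the low 64 bits first. Both come out as 1234, so __eqtf2 should report equality whenever the fpext is lowered correctly.

```python
import struct
from fractions import Fraction

def decode_half(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def decode_fp128(lo: int, hi: int) -> Fraction:
    """Decode a normal, finite binary128 value from its two 64-bit halves."""
    sign = -1 if hi >> 63 else 1
    exponent = (hi >> 48) & 0x7FFF                 # 15-bit biased exponent
    fraction = ((hi & 0xFFFF_FFFF_FFFF) << 64) | lo  # 112-bit fraction field
    if exponent in (0, 0x7FFF):
        raise ValueError("this sketch only handles normal finite values")
    significand = 1 + Fraction(fraction, 1 << 112)
    return sign * significand * Fraction(2) ** (exponent - 16383)

# Assumption: the 0xL literal is written as <low 64 bits><high 64 bits>.
literal = "00000000000000004009348000000000"
lo, hi = int(literal[:16], 16), int(literal[16:], 16)

print(decode_half(0x64D2))    # value stored to %x
print(decode_fp128(lo, hi))   # left-hand operand passed to __eqtf2
```

Exact rational arithmetic is used for the binary128 side because a Python float cannot hold a 112-bit fraction in general, even though this particular value would fit.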
andrewrk added the bug, backend:X86, and regression labels Aug 4, 2022
andrewrk added this to the LLVM 15.0.0 Release milestone Aug 4, 2022
@llvmbot
Member

llvmbot commented Aug 4, 2022

@llvm/issue-subscribers-bug

@llvmbot
Member

llvmbot commented Aug 4, 2022

@llvm/issue-subscribers-backend-x86

@tstellar
Collaborator

tstellar commented Aug 4, 2022

@phoebewang

tstellar moved this to Needs Triage in LLVM Release Status Aug 4, 2022
tstellar moved this from Needs Triage to Needs Fix in LLVM Release Status Aug 4, 2022
@mstorsjo
Member

mstorsjo commented Aug 4, 2022

I tried to reproduce the issue, but if I run the attached LLVM IR through llc (either just llc zig.ll -filetype=obj -o zig.o, or with -O2 or -O1 added) and link it into an executable (gcc zig.o -o zig), it returns 0 when I execute it. (This is with llc built from the commit you referenced.) If I compile it with llc -O0, linking fails because __extendhftf2 is undefined.

@nikic
Contributor

nikic commented Aug 4, 2022

I believe the ABI of __extendhftf2 has changed in LLVM 15 (see also #56854). If you provide your own implementation of the builtins, you might want to check that you're using the correct ABI.
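
For anyone maintaining their own builtins, it may help to separate the two concerns: the widening itself is plain bit manipulation, while the ABI only governs how the half argument reaches the function. The Python sketch below shows what a half to fp128 widening is expected to compute on raw bit patterns. It is not compiler-rt's or Zig's implementation, and the function name is hypothetical. The working assumption, not confirmed in this thread, is that on x86-64 LLVM 15 hands the half to the builtin in an SSE register where earlier releases passed it as a 16-bit integer; a builtin that reads the argument from the wrong place would widen unrelated bits, and the comparison in the reduction would then fail.

```python
def extend_half_to_fp128(h: int) -> tuple:
    """Widen binary16 bits to binary128 bits, returned as (lo, hi) 64-bit halves."""
    sign = (h >> 15) & 1
    exponent = (h >> 10) & 0x1F     # 5-bit exponent, bias 15
    fraction = h & 0x3FF            # 10-bit fraction

    if exponent == 0x1F:            # infinity or NaN: all-ones exponent
        new_exp = 0x7FFF
    elif exponent != 0:             # normal: rebias from 15 to 16383
        new_exp = exponent - 15 + 16383
    elif fraction != 0:             # subnormal half becomes a normal fp128
        shift = 11 - fraction.bit_length()
        fraction = (fraction << shift) & 0x3FF   # drop the now-implicit leading 1
        new_exp = 1 - 15 + 16383 - shift
    else:                           # signed zero
        new_exp = 0

    # The 10 fraction bits land at the top of the 112-bit fraction field,
    # which puts them at bit 38 of the high 64-bit half.
    hi = (sign << 63) | (new_exp << 48) | (fraction << 38)
    return 0, hi

lo, hi = extend_half_to_fp128(0x64D2)
print(f"0xL{lo:016X}{hi:016X}")     # low half first, matching LLVM's 0xL form
```

For the input 0x64D2 from the reduction, the result matches the fp128 constant that main passes to __eqtf2, so a correct widening makes the two operands bit-identical.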

@phoebewang
Contributor

phoebewang commented Aug 4, 2022

Thanks all! I guess the reason is the one @nikic pointed out. @mstorsjo, when compiling with -O1 or -O2, the compiler optimizes out the conversion. You can check it with GCC 12.0, which provides __extendhftf2. It returns 0 on my local machine.

@andrewrk
Member Author

andrewrk commented Aug 4, 2022

Thank you for the clues, all. Fixed downstream in ziglang/zig@169ad1a.

andrewrk closed this as completed Aug 4, 2022
nikic moved this from Needs Fix to Done in LLVM Release Status Aug 4, 2022
EugeneZelenko added the question label and removed the bug and regression labels Aug 4, 2022