use case: a builtin that does the equivalent of x86.avx2.phsub.w on a SIMD vector #16387

e4m2 · 2023-07-12T16:05:12Z

Zig Version

0.11.0-dev.3978+711b4e93e

Steps to Reproduce and Observed Behavior

Create a.zig, declaring and calling an LLVM x86 intrinsic. The choice of intrinsic is arbitrary; many (presumably all?) others reproduce the problem just as well:

const V16U16 = @Vector(16, u16);
// Declare the LLVM intrinsic as an extern function; it requires AVX2.
extern fn @"llvm.x86.avx2.phsub.w"(V16U16, V16U16) V16U16;

export fn f(x: V16U16, y: V16U16) V16U16 {
    return @"llvm.x86.avx2.phsub.w"(x, y);
}

Run zig build-lib a.zig -O ReleaseSmall -mcpu=haswell. ReleaseSmall is again arbitrary and only used to get cleaner output. The CPU only matters insofar as it includes the required target feature, AVX2 in this case.
Output: LLVM Emit Object... LLVM ERROR: Cannot select: intrinsic %llvm.x86.avx2.phsub.w. No binary is produced.

Adding --verbose-llvm-(ir|bc)=a.(ll|bc) to the build command still fails with the same LLVM error, but it produces a valid IR file that llc compiles without any problem. The IR file also has the correct target feature set, so the problem seems to be somewhere on the Zig side. Maybe Zig is failing to pass the (correct) target features somewhere?

Expected Behavior

Should compile using the LLVM backend with an appropriate target CPU.

I am aware this falls under #2291 and will one day stop working on purpose. But there is no resolution to that issue yet, so IMO this should work in the meantime; if not, feel free to close this issue. (To be fair, inline assembly gets you some of the way there, but in my experience it leaves a lot to be desired in terms of codegen.)


andrewrk · 2023-07-22T21:00:28Z

Thanks for the report. As you suspected I'm going to close it as a duplicate of #2291.

However, let's see if we can get your use case solved. What does this instruction do? Surely this is not merely vector subtraction?
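For reference: vphsubw, the instruction behind llvm.x86.avx2.phsub.w, is indeed not plain vector subtraction. It subtracts adjacent element pairs horizontally (even index minus odd index), and the 256-bit form works within each 128-bit lane: each lane of the result holds four differences taken from the first operand followed by four taken from the second, using wrapping (non-saturating) 16-bit arithmetic. Below is a minimal portable sketch of those semantics built from @shuffle and wrapping subtraction. The function name phsubw and the mask constants are hypothetical, the u16 element type follows the reproducer above (the hardware instruction is specified on signed 16-bit integers, which yields identical bit patterns under wrapping arithmetic), and whether LLVM lowers this pattern back to a single vphsubw is not guaranteed.

```zig
const std = @import("std");

const V16U16 = @Vector(16, u16);

// Hypothetical shuffle masks. Non-negative entries select that element of
// the first operand; a negative entry ~i selects element i of the second.
// Per 128-bit lane: four pairs from `a`, then four pairs from `b`.
const even_mask = @Vector(16, i32){
    0, 2, 4, 6, ~@as(i32, 0), ~@as(i32, 2), ~@as(i32, 4), ~@as(i32, 6),
    8, 10, 12, 14, ~@as(i32, 8), ~@as(i32, 10), ~@as(i32, 12), ~@as(i32, 14),
};
const odd_mask = @Vector(16, i32){
    1, 3, 5, 7, ~@as(i32, 1), ~@as(i32, 3), ~@as(i32, 5), ~@as(i32, 7),
    9, 11, 13, 15, ~@as(i32, 9), ~@as(i32, 11), ~@as(i32, 13), ~@as(i32, 15),
};

/// Portable sketch of the lane-wise horizontal subtract performed by vphsubw.
fn phsubw(a: V16U16, b: V16U16) V16U16 {
    // Gather the left (even) and right (odd) element of every pair.
    const even = @shuffle(u16, a, b, even_mask);
    const odd = @shuffle(u16, a, b, odd_mask);
    // Wrapping subtraction matches the non-saturating hardware behavior.
    return even -% odd;
}

test "phsubw usage" {
    const a = V16U16{ 10, 1, 20, 2, 30, 3, 40, 4, 50, 5, 60, 6, 70, 7, 80, 8 };
    const b = V16U16{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
    const got: [16]u16 = phsubw(a, b);
    // 1 - 2 wraps to 0xFFFF (-1 as i16) in the `b` halves of each lane.
    const want = [16]u16{ 9, 18, 27, 36, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 45, 54, 63, 72, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF };
    try std.testing.expectEqualSlices(u16, &want, &got);
}
```

Run it with zig test. Splitting the work into two shuffles plus one vector subtraction keeps the sketch target-independent, which is what a builtin for this use case would need to provide.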

e4m2 · 2023-07-23T07:46:37Z

The use case isn't this intrinsic specifically; I actually picked a random one from Intel's Intrinsics Guide just to showcase the more general issue. Apologies if this wasn't clear from the original issue description.

The general reproducer seems to be any intrinsic that requires an additional CPU feature. Such an intrinsic fails instruction selection regardless of the global CPU feature set. Two things follow from this that may or may not be relevant/interesting:

Intrinsics that need no extra feature work fine, for example llvm.x86.rdtsc(p) or llvm.x86.sse2.lfence. The latter presumably works because every x86-64 CPU has SSE2; anything more "adventurous" fails again.
Compiling the verbose bitcode/IR separately without the required target features fails with the same error that Zig produces regardless of what the CPU feature set contains (again, almost as if Zig were forgetting to set the target features correctly somehow, but this is pure uneducated guesswork on my part, since I have no knowledge of Zig internals to go on).

As for my specific use case:
I had been somewhat dissatisfied with the status quo codegen for AES implemented with inline assembly and wanted to see whether the intrinsics could help (here they are for completeness). Obviously that can't really be tested right now, so it is possible I'm wrong.

If a choice has to be made between a general WONTFIX LLVM backend issue and adding builtins willy-nilly on a case-by-case basis, then I would much rather concede my specific use case for now in favor of a more general, systematic effort to bring such functionality to Zig in the future.

e4m2 · 2024-03-09T10:45:33Z

Whatever the issue was here, I can't reproduce it anymore on 0.12.0-dev.3182+f3227598e.

Remaining issues:

Compiling with the wrong CPU features runs into some weird errors (e.g. LLVM ERROR: Do not know how to split the result of this operator!) that don't really tell you what the actual issue is.
There isn't an extern calling convention that accurately conveys intrinsics, so you end up working around it with other callconvs and hoping (well, checking) that you get the right declaration in LLVM.

That being said, I'm going to open additional issues for these. It's not worth it to fix these just to make another "bug" work slightly better.

e4m2 added the "bug" label (Observed behavior contradicts documented or intended behavior) Jul 12, 2023

andrewrk added the "backend-llvm" label (The LLVM backend outputs an LLVM IR Module.) and removed the "bug" label Jul 22, 2023

andrewrk added the "use case" label (Describes a real use case that is difficult or impossible, but does not propose a solution.) and removed the "backend-llvm" label Jul 22, 2023

andrewrk added this to the 0.12.0 milestone Jul 22, 2023

andrewrk changed the title from "LLVM intrinsic instruction selection fails" to "use case: a builtin that does the equivalent of x86.avx2.phsub.w on a SIMD vector" Jul 22, 2023

e4m2 closed this as completed Mar 9, 2024

Vexu modified the milestones: 0.13.0, 0.12.0 Mar 16, 2024
