
use case: a builtin that does the equivalent of x86.avx2.phsub.w on a SIMD vector #16387

Closed
e4m2 opened this issue Jul 12, 2023 · 3 comments
Labels
use case Describes a real use case that is difficult or impossible, but does not propose a solution.

@e4m2
Contributor

e4m2 commented Jul 12, 2023

Zig Version

0.11.0-dev.3978+711b4e93e

Steps to Reproduce and Observed Behavior

  1. Create a.zig, importing and using an LLVM x86 intrinsic. The choice is arbitrary; many (presumably any?) other intrinsics reproduce the problem as well:
const V16U16 = @Vector(16, u16);
extern fn @"llvm.x86.avx2.phsub.w"(V16U16, V16U16) V16U16;

export fn f(x: V16U16, y: V16U16) V16U16 {
    return @"llvm.x86.avx2.phsub.w"(x, y);
}
  2. zig build-lib a.zig -O ReleaseSmall -mcpu=haswell. ReleaseSmall is again arbitrary and only used to get nicer output. The CPU only matters insofar as the specific target feature is included, AVX2 in this case.
  3. Output: LLVM Emit Object... LLVM ERROR: Cannot select: intrinsic %llvm.x86.avx2.phsub.w. No binary is produced.

Adding --verbose-llvm-(ir|bc)=a.(ll|bc) to the build command still fails with the same LLVM error, but produces a valid IR file that compiles with llc without problems. The IR file also has the correct target feature set, so the problem seems to be somewhere on the Zig side. Maybe Zig is failing to pass the (correct) target features somewhere?

Expected Behavior

Should compile using the LLVM backend with an appropriate target CPU.


I am aware this is #2291 and one day won't work on purpose. But there is no resolution to that issue yet, so this should work in the meantime IMO - if not, feel free to close this issue (and to be fair, inline assembly gets you some of the way there, but it leaves a lot to be desired in terms of codegen in my experience).

@e4m2 e4m2 added the bug Observed behavior contradicts documented or intended behavior label Jul 12, 2023
@andrewrk andrewrk added backend-llvm The LLVM backend outputs an LLVM IR Module. and removed bug Observed behavior contradicts documented or intended behavior labels Jul 22, 2023
@andrewrk
Member

Thanks for the report. As you suspected, I'm going to close it as a duplicate of #2291.

However, let's see if we can get your use case solved. What does this instruction do? Surely this is not merely vector subtraction?
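For reference, here is a rough sketch of what the instruction computes, written as portable Zig. It is not merely vector subtraction: within each 128-bit lane, adjacent pairs of 16-bit elements are subtracted (even index minus odd index, wrapping), with the differences from the first operand filling the lower half of the lane and those from the second operand filling the upper half. The function name phsubW is made up for illustration, and this is only a model of the semantics as described in Intel's Intrinsics Guide, not a claim about how a builtin would be implemented.

```zig
const V16U16 = @Vector(16, u16);

/// Hypothetical helper: scalar model of the 256-bit horizontal subtract
/// (vphsubw). Not an existing Zig builtin.
fn phsubW(a: V16U16, b: V16U16) V16U16 {
    const xs: [16]u16 = a;
    const ys: [16]u16 = b;
    var out: [16]u16 = undefined;
    // Each 128-bit lane (8 x u16) is processed independently.
    for (0..2) |lane| {
        const base = lane * 8;
        for (0..4) |i| {
            // Lower half of the lane: pairwise differences from `a`.
            out[base + i] = xs[base + 2 * i] -% xs[base + 2 * i + 1];
            // Upper half of the lane: pairwise differences from `b`.
            out[base + 4 + i] = ys[base + 2 * i] -% ys[base + 2 * i + 1];
        }
    }
    return out;
}
```

Whether the LLVM backend turns a loop like this into a single vphsubw is not something this sketch guarantees; it only pins down the expected result.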

@andrewrk andrewrk added use case Describes a real use case that is difficult or impossible, but does not propose a solution. and removed backend-llvm The LLVM backend outputs an LLVM IR Module. labels Jul 22, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Jul 22, 2023
@andrewrk andrewrk changed the title LLVM intrinsic instruction selection fails use case: a builtin that does the equivalent of x86.avx2.phsub.w on a SIMD vector Jul 22, 2023
@e4m2
Contributor Author

e4m2 commented Jul 23, 2023

The use case isn't this intrinsic specifically. I picked a random one from Intel's Intrinsics Guide just to showcase the more general issue; apologies if this wasn't clear from the original issue description.

The general reproducer seems to be: any intrinsic requiring an additional CPU feature. It won't get selected correctly, regardless of the global CPU feature set. Two things follow from this that may or may not be relevant/interesting:

  1. Some intrinsics, for example llvm.x86.rdtsc(p) or llvm.x86.sse2.lfence, work fine. The latter presumably works because every x86-64 CPU has SSE2; anything more "adventurous" fails again.
  2. Compiling the verbose bitcode/IR separately without the required target features fails with the same error that Zig produces, and Zig produces it regardless of what is present in the CPU feature set. (Again, it is almost as if Zig is forgetting to set the target features correctly somehow, but this is pure uneducated guesswork on my part, since I have no knowledge of Zig internals to go off of.)

As for my specific use case:
I had been somewhat dissatisfied with the status quo codegen of AES using inline assembly and wanted to see if the intrinsics could help (here they are for completeness). Obviously that can't really be tested right now, so it is possible I'm wrong.

If a choice is to be made between some general WONTFIX LLVM backend issue and adding builtins willy-nilly on a case by case basis, then I would much rather concede my specific use case for now in favor of some more general, systematic effort to bring such functionality to Zig in the future.

@e4m2
Contributor Author

e4m2 commented Mar 9, 2024

Whatever the issue was here, I can't reproduce it anymore on 0.12.0-dev.3182+f3227598e.

Remaining issues:

  • Compiling with the wrong CPU features runs into some weird errors (e.g. LLVM ERROR: Do not know how to split the result of this operator!) that don't really tell you what the actual issue is.
  • There isn't an extern calling convention to accurately convey intrinsics, so you end up having to work around it using other callconvs and checking that you get the right declaration in LLVM.

That being said, I'm not going to open additional issues for these. It's not worth it to fix these just to make another "bug" work slightly better.

@e4m2 e4m2 closed this as completed Mar 9, 2024
@Vexu Vexu modified the milestones: 0.13.0, 0.12.0 Mar 16, 2024

3 participants