-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
replace @mulWithOverflow(u64, x, radix, &x) with @mul(u64, a, b) that returns a double width type #1350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Please elaborate:
I'll re-open when you provide these things. |
Sorry I was on my phone |
Thanks! No worries, just wanted to throw the ball back in your court |
can you add the optimization tag, because this would provide access to these instructions. |
I also added upstream, because arguably this should exist in LLVM. https://llvm.org/docs/LangRef.html#llvm-smul-with-overflow-intrinsics |
the sparc origins of llvm is showing. reported upstream: https://bugs.llvm.org/show_bug.cgi?id=38475 |
If you look at the upstream hug it looks like we do not need upstream work because value range propagation uses the 32*32->64 instructions. |
I see, thanks for the follow-up. Here is my counter proposal:
No possibility of failure, no pointers, it just has a double-wide return type. |
I think this may be better as a userland function. The language intrinsic should probably stay mapped directly to the LLVM intrinsic. in std.math: fn wideMul(comptime T: type, x: T, y: T) @IntType(T.is_signed, T.bit_count * 2) {
const ResultInt = @IntType(T.is_signed, T.bit_count * 2);
return ResultInt(x) * ResultInt(y);
} This is what Zig would generate anyway for this intrinsic, so making it a compiler primitive would not be an optimization. Feel free to make additional arguments for this intrinsic. |
`@mul(comptime T: type, a: T, b: T) @inttype(t.is_signed, T.bit_count * 2)
some architectures, such as mips[1] and sh4[2] do not have overflowing multiplication. Instead they return a multiplication result that is twice as big as the input types. This is how multiplication is done in hardware, the result. For 128X128-,>256 we will need a new u256 type and upstream work (also for 256X256->512[3] and 512X512->1024[4])
[1] https://stackoverflow.com/questions/16050338/mips-integer-multiplication-and-division
[2] http://www.shared-ptr.com/sh_insns.
[3] https://stackoverflow.com/questions/34234407/is-there-hardware-support-for-128bit-integers-in-modern-processors/50776753#50776753 The x86/x64, however, is a superscalar CPU, and the registers you know of are merely the architectural registers. Behind the scenes, there are a lot more registers that help optimize the CPU pipeline to perform out of order instructions using multiple ALUs. While the x64 may not be a 128-bit CPU, SSE/SSE2 introduced native 128-bit math, AVX introduced 256-bit native integer math, and AVX2 introduced 512-bit integer math. When returning from functions you will return the value in the 128-bit XMM0 SSE/SSE2 register, 256-bit AVX results in YMM0, and 512-bit AVX2 results in ZMM0; these, however, are add-ons to the x86/x64, not the primary architecture and support is entirely compiler and release platform (such as Python) dependent.
[4] https://en.wikipedia.org/wiki/AVX-512
The text was updated successfully, but these errors were encountered: