i64x2.mul: useful or not? #88
Comments
i8.mul, i16.mul, i32.mul, i64.mul, i32x4.mul, i16x8.mul and i8x16.mul all require a larger type to hold the result and are all therefore defined to return the lower half of the result. I don't understand why these semantics are, in your opinion, problematic for i64x2.mul but not for any of the other mul opcodes.
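To make the "lower half of the result" semantics concrete, here is a minimal Python sketch that models lanes as unsigned integers. The helper names `i64_mul` and `i64x2_mul` are made up for illustration and are not part of any real API; the model only mirrors the truncating behavior described above.

```python
MASK64 = (1 << 64) - 1  # keeps the low 64 bits of an arbitrary-precision int


def i64_mul(a, b):
    """Hypothetical model of a truncating 64-bit multiply (low half only)."""
    return (a * b) & MASK64


def i64x2_mul(x, y):
    """Hypothetical model of a lane-wise multiply on two 2-lane vectors."""
    return tuple(i64_mul(a, b) for a, b in zip(x, y))


# Lane 0 fits in 64 bits; lane 1 overflows and keeps only the low half.
result = i64x2_mul((3, 1 << 63), (5, 2))
```

In this model the second lane computes 2^64, whose low 64 bits are all zero, which is the same wrap-around that a scalar truncating multiply would show.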
That's very true about i8x16.mul and the rest. I completely missed those because I was looking at arm64 mul, and they didn't have it defined for i64x2 :)
I think this issue is more about polling whether this instruction is practically useful in applications. We've previously had this conversation for i8x16Mul as well, about whether the resulting overflow handling is worth it (in #28). Asking whether there are practical applications for this behavior seems like a reasonable question.
If you multiply two numbers that are too large, the result won't fit in a register of the same size. That's an issue that all integer multiplies have. SIMD multiplies aren't special, they have the same issue that the scalar multiplies have.
When users do care about the top half of the result, they just switch to a wider integer type that produces a wider result.
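A small sketch of that "switch to a wider type" workaround, using unsigned 8-bit inputs widened to 16 bits. The function names are hypothetical and the integers are plain Python ints masked to the stated widths.

```python
def u8_mul_low(a, b):
    """Truncating 8-bit multiply: only the low 8 bits of the product survive."""
    return (a * b) & 0xFF


def u8_mul_widened(a, b):
    """Widen both operands to 16 bits first, then multiply at 16 bits.

    The product of two unsigned 8-bit values always fits in 16 bits,
    so nothing is lost here.
    """
    wide_a = a & 0xFF
    wide_b = b & 0xFF
    return (wide_a * wide_b) & 0xFFFF


low = u8_mul_low(200, 100)       # low half only
full = u8_mul_widened(200, 100)  # full product
high = full >> 8                 # the top half that the narrow multiply dropped
```

The same pattern applies at any width where a wider type exists; for 64-bit lanes there is no wider lane type to switch to, which is what makes the question above specific to i64x2.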
Yes that's exactly how you do it - for scalars, you would just convert two i8s into two i16s and then do a single i16.mul. Some architectures do have instructions for multiplying two integers and storing a result that's twice as wide, e.g., the x86 BMI2 ISA has a
I read this as asking whether integer multiplies that lose the upper bits are generally useful in practice. In the discussion we had about i8 multiplies, the main argument was that an i8 is ridiculously easy to overflow when multiplying (but then again this argument applies to i8.mul as well). I don't think this argument holds for i64.
Yes, that is the question. I've come across them in benchmarks, but don't have a good sense for how widely they are used in practice.
It's not the same argument, but I was holding this as a precedent of them not always being used in practice, not necessarily for the same reasons.
It is unclear whether you are talking about integer multiplies that discard the upper bits or about multiplies that return both the lower and upper bits of the result.
We have been prioritizing use cases and the possibility of efficient implementations as key criteria for defining instructions and their semantics. In the case of multiplies (scalar or SIMD), there are obvious workarounds when the higher-order bits are necessary for the application, so a change will make sense only if we have convincing use cases that can gain realistic benefits. I tried unsuccessfully to find some real use cases. I think this is the concern @dtig mentioned above, i.e. the justification for adding multiplies that return the lower and upper bits for > 8-bit lanes. It would be great if anyone could point to workloads that would help us here. Scalar WASM offers only i32.mul and i64.mul. Going back to @ngzhian's question, the argument for having i64x2.mul is arguably identical to the one for i64.mul, IMO.
Yes, integer multiplication while keeping only the low half is useful.
Scientific computing
I use those to implement fast integer matrix multiplication.
Image processing
Images are stored as integers, and lots of image processing (like blurring, edge detection, ...) involves convolution or cross-correlation, which requires multiplication.
Cryptography
To implement portable big int matrix multiplication, the easiest is to use the Karatsuba algorithm, and after all multiplications only the low half is kept.
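As a rough illustration of the integer matrix multiplication use case, the sketch below builds a matrix product from truncating 64-bit multiplies and adds only. It is a scalar Python model with hypothetical helper names, not the commenter's actual implementation; the point is that when every entry of the true result fits in 64 bits, the low-half multiply gives the exact answer.

```python
MASK64 = (1 << 64) - 1  # keeps the low 64 bits of an arbitrary-precision int


def wrapping_mul(a, b):
    """Truncating 64-bit multiply (low half only)."""
    return (a * b) & MASK64


def wrapping_add(a, b):
    """Truncating 64-bit add."""
    return (a + b) & MASK64


def matmul_wrapping(A, B):
    """Matrix product using only wrapping 64-bit operations."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for t in range(inner):
                acc = wrapping_add(acc, wrapping_mul(A[i][t], B[t][j]))
            C[i][j] = acc
    return C


product = matmul_wrapping([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

If the entries were large enough to overflow, the result would instead be the true product reduced modulo 2^64 in each entry.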
Thanks for your reply, closing this issue as the question has been answered.
I was prototyping some i64x2 instructions, and was just going down the list of existing i32x4 instructions, and encountered mul.
Multiplying two 64-bit numbers requires 128 bits to hold the result, so an i64x2.mul truncates each result to fit inside its 64-bit lane. I can't think of interesting use cases for this behavior. Thus, I'm polling via this issue to see if an i64x2.mul even makes sense.