This repository was archived by the owner on Dec 22, 2021. It is now read-only.
#118 was started in response to another issue concerning the performance of loading and deinterleaving interleaved data. The code in question is a port of some OpenCV SSE code to WASM SIMD: it loads 16 interleaved RGB pixels and deinterleaves them into 16 Rs, 16 Gs, and 16 Bs.
This is an instance of a common pattern, and the optimal way to do it is very ISA-specific: the OpenCV code they were porting has not just SSE2 and Neon variants, but also SSSE3 and SSE4.1 variants! The corresponding Neon code maps directly to Neon's 3-way interleaved load instruction: ld3.
Given that this kind of interleaved load is very common, and the optimal code for it is very ISA and ISA-extension specific, I think we should consider adding instructions to WASM SIMD to allow runtimes to generate code that is optimized for the specific target that the code is running on.
These instructions are direct translations of the ARM Neon interleaved load instructions:
AxB.load_interleaved_C loads B*C interleaved A elements from contiguous memory at the given address, and deinterleaves them into C AxB vectors. Pseudo-code:
template<typename A, int B, int C>
void load_interleaved(const A mem[B*C], A result[C][B]) {
    for(int i = 0; i < B; ++i) {
        for(int j = 0; j < C; ++j) {
            result[j][i] = mem[i * C + j];
        }
    }
}
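To make the semantics concrete for the RGB case that motivated this, here is a minimal scalar sketch in C++. It assumes the 3-way, 16-lane, 8-bit variant (which the AxB.load_interleaved_C naming scheme would presumably spell i8x16.load_interleaved_3); `RgbPlanes` and `deinterleave_rgb16` are hypothetical names used only for illustration, and a real runtime would lower this to ld3 or a shuffle sequence rather than a scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference for the proposed AxB.load_interleaved_C semantics:
// reads B*C contiguous elements and returns C vectors of B lanes each.
template <typename A, std::size_t B, std::size_t C>
std::array<std::array<A, B>, C> load_interleaved(const A* mem) {
    std::array<std::array<A, B>, C> result{};
    for (std::size_t i = 0; i < B; ++i) {
        for (std::size_t j = 0; j < C; ++j) {
            result[j][i] = mem[i * C + j];
        }
    }
    return result;
}

// Hypothetical helper (not part of the proposal): one plane per channel.
struct RgbPlanes {
    std::array<std::uint8_t, 16> r, g, b;
};

// Splits 16 packed RGB pixels (48 bytes, R0 G0 B0 R1 G1 B1 ...) into planes.
RgbPlanes deinterleave_rgb16(const std::uint8_t* pixels) {
    assert(pixels != nullptr);
    auto planes = load_interleaved<std::uint8_t, 16, 3>(pixels);
    return RgbPlanes{planes[0], planes[1], planes[2]};
}
```

Calling `deinterleave_rgb16` on a 48-byte buffer puts bytes 0, 3, 6, ... into `r`, bytes 1, 4, 7, ... into `g`, and bytes 2, 5, 8, ... into `b`.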
The complementary store_interleaved instructions are probably worthwhile as well, but I'd like to see what folks think of the load instructions first.
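For reference, a store could plausibly be defined as the exact inverse of the load. The sketch below assumes that reading; the signature and the name `store_interleaved` as a C++ function are illustrative only, since the store semantics are not spelled out here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed semantics: interleave C vectors of B lanes each into B*C
// contiguous elements, so that mem[i * C + j] == vectors[j][i].
template <typename A, std::size_t B, std::size_t C>
void store_interleaved(const std::array<std::array<A, B>, C>& vectors, A* mem) {
    assert(mem != nullptr);
    for (std::size_t i = 0; i < B; ++i) {
        for (std::size_t j = 0; j < C; ++j) {
            mem[i * C + j] = vectors[j][i];
        }
    }
}
```

Under this definition, storing the result of a load back to the same address leaves memory unchanged.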
Interesting idea! These instructions all have multivalue types, so this has a soft dependency on the multivalue proposal. I don't think we want MVP SIMD to depend on multivalue, so I'm going to tag this as post-MVP.
I agree that we don't want to add a dependency on multivalue. However, I think it would be pretty easy to add instructions with multiple results to the SIMD spec/reference interpreter without pulling in anything from the multivalue repo.