-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix incorrect SIMD temp allocation for Vector256 with AVX2 disabled #58820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The NI_Vector256_GetElement intrinsic, in some situations, requires a stack temporary. With AVX2 disabled, this temporary was getting allocated as a TYP_SIMD16 instead of a TYP_SIMD32, leading to overwriting the local variable. Add a type argument to the temp variable allocation, and allocate the temp as the largest sized type required by any use. Fixes dotnet#58295
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThe NI_Vector256_GetElement intrinsic, in some situations, requires Add a type argument to the temp variable allocation, and allocate the Fixes #58295
|
Windows x64 diffs:
Detail diffs
Linux x64:
Detail diffs
These diffs are due to allocating a SIMD16 instead of a SIMD32, so the frame shrinks. We don't have any SPMI collections where AVX2 is disabled. |
@tannergooding @dotnet/jit-contrib PTAL @tannergooding We could alternately go with your fix proposed in #58295 (comment), although this one has the minor benefit of shrinking the temp when the larger temp is not needed. Thoughts? |
I think this fix is fine. I thought we were using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@tannergooding With your suggested change (included here), there are a few SPMI arm64 asm diffs similar to the x64 ones noted above: TYP_SIMD16 temps shrunk to TYP_SIMD8, thus shrinking frame sizes. coreclr_tests.pmi.windows.arm64.checked.mch:
Detail diffs
libraries.crossgen2.windows.arm64.checked.mch:
Detail diffs
|
Should this also add an outerloop leg that will catch this kind of failure in the future? |
I'm testing a PR (#58822) to add JitStress=1 and JitStress=2 to the "runtime-coreclr jitstress-isas-x86" pipeline. We can discuss it over there. |
/backport to release/6.0 |
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1215944929 |
The NI_Vector256_GetElement intrinsic, in some situations, requires
a stack temporary. With AVX2 disabled, this temporary was getting
allocated as a TYP_SIMD16 instead of a TYP_SIMD32, leading to overwriting
the local variable.
Add a type argument to the temp variable allocation, and allocate the
temp as the largest sized type required by any use.
Fixes #58295