BruceForstall
Contributor

The NI_Vector256_GetElement intrinsic, in some situations, requires
a stack temporary. With AVX2 disabled, this temporary was getting
allocated as a TYP_SIMD16 instead of a TYP_SIMD32, leading to overwriting
the local variable.

Add a type argument to the temp variable allocation, and allocate the
temp as the largest sized type required by any use.

Fixes #58295
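
As a rough illustration of the "largest type required by any use" rule described above, here is a minimal, self-contained sketch. It is not the JIT's actual code: `SimdType`, `sizeOf`, and `SharedSimdTemp` are hypothetical names made up for this example, and the only assumption is that one stack temp is shared by every use in a method.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the JIT's SIMD type tags; only the sizes matter.
enum class SimdType { Simd8, Simd16, Simd32 };

inline std::size_t sizeOf(SimdType t)
{
    switch (t)
    {
        case SimdType::Simd8:  return 8;
        case SimdType::Simd16: return 16;
        case SimdType::Simd32: return 32;
    }
    return 0;
}

// Models a single per-method stack temp shared by every use that needs one.
class SharedSimdTemp
{
public:
    // Request the temp for a use of the given type. The first request
    // creates it; later requests can only widen it, so it ends up typed
    // as the largest type any use asked for.
    void request(SimdType needed)
    {
        if (!m_allocated || sizeOf(m_type) < sizeOf(needed))
        {
            m_allocated = true;
            m_type      = needed;
        }
    }

    // Bytes reserved on the frame for the temp (0 if never requested).
    std::size_t frameBytes() const
    {
        return m_allocated ? sizeOf(m_type) : 0;
    }

private:
    bool     m_allocated = false;
    SimdType m_type      = SimdType::Simd8;
};
```

With this shape, a method that only ever asks for a 16-byte temp reserves 16 bytes, while a method with any 32-byte use reserves 32 bytes, so a 32-byte store can never land in a 16-byte slot.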

@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Sep 8, 2021
@ghost

ghost commented Sep 8, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author: BruceForstall
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

@BruceForstall
Contributor Author

Windows x64 diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 222
Total bytes of diff: 190
Total bytes of delta: -32 (-14.41% of base)
Total relative delta: -0.29
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
         -16 : 207372.dasm (-14.29% of base)
         -16 : 207371.dasm (-14.55% of base)

2 total files with Code Size differences (2 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -16 (-14.29% of base) : 207372.dasm - GitHub_22815.Program:test128(byte):bool
         -16 (-14.55% of base) : 207371.dasm - GitHub_22815.Program:test128(ubyte):bool

Top method improvements (percentages):
         -16 (-14.55% of base) : 207371.dasm - GitHub_22815.Program:test128(ubyte):bool
         -16 (-14.29% of base) : 207372.dasm - GitHub_22815.Program:test128(byte):bool

2 total methods with Code Size differences (2 improved, 0 regressed), 0 unchanged.


Linux x64:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 978
Total bytes of diff: 931
Total bytes of delta: -47 (-4.81% of base)
Total relative delta: -0.33
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
         -16 : 210097.dasm (-14.29% of base)
         -16 : 210096.dasm (-14.29% of base)
         -10 : 198748.dasm (-2.72% of base)
          -5 : 195198.dasm (-1.30% of base)

4 total files with Code Size differences (4 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -16 (-14.29% of base) : 210097.dasm - GitHub_22815.Program:test128(byte):bool
         -16 (-14.29% of base) : 210096.dasm - GitHub_22815.Program:test128(ubyte):bool
         -10 (-2.72% of base) : 198748.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector128`1[Byte],int):this
          -5 (-1.30% of base) : 195198.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector128`1[Byte],int):this

Top method improvements (percentages):
         -16 (-14.29% of base) : 210097.dasm - GitHub_22815.Program:test128(byte):bool
         -16 (-14.29% of base) : 210096.dasm - GitHub_22815.Program:test128(ubyte):bool
         -10 (-2.72% of base) : 198748.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector128`1[Byte],int):this
          -5 (-1.30% of base) : 195198.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector128`1[Byte],int):this

4 total methods with Code Size differences (4 improved, 0 regressed), 0 unchanged.


These diffs are due to allocating a SIMD16 instead of a SIMD32, so the frame shrinks. We don't have any SPMI collections where AVX2 is disabled.
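
The summary figures in the reports above can be checked by hand. The sketch below is not the diff tool's code; `percentOfBase` and `relativeDelta` are hypothetical helpers written for this check. It assumes that "% of base" is the byte delta divided by the base size, and that "Total relative delta" is the sum of the per-method percentages expressed as a fraction, which is consistent with the numbers shown but is an inference from them.

```cpp
#include <cmath>
#include <vector>

// Percentage change from base to diff, rounded to two decimals.
double percentOfBase(int baseBytes, int diffBytes)
{
    double pct = 100.0 * (diffBytes - baseBytes) / baseBytes;
    return std::round(pct * 100.0) / 100.0;
}

// Sum of per-method percentage deltas, expressed as a fraction and
// rounded to two decimals.
double relativeDelta(const std::vector<double>& methodPercents)
{
    double sum = 0.0;
    for (double p : methodPercents)
    {
        sum += p;
    }
    return std::round(sum) / 100.0;
}
```

For the Windows x64 report, `percentOfBase(222, 190)` gives -14.41 and `relativeDelta({-14.29, -14.55})` gives -0.29, matching the summary lines.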

@BruceForstall
Contributor Author

@tannergooding @dotnet/jit-contrib PTAL

@tannergooding We could alternatively go with the fix you proposed in #58295 (comment), although this one has the minor benefit of shrinking the temp when the larger temp is not needed. Thoughts?

@tannergooding
Member

> Thoughts?

I think this fix is fine. I had thought we were using getSIMDInitTempVarNum in more places, which would have made a change like yours more complex to get working.

@tannergooding tannergooding left a comment

LGTM.

@BruceForstall
Contributor Author

@tannergooding With your suggested change (included here), there are a few SPMI arm64 asm diffs similar to the x64 ones noted above: TYP_SIMD16 temps shrunk to TYP_SIMD8, thus shrinking frame sizes.

coreclr_tests.pmi.windows.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 2184
Total bytes of diff: 2176
Total bytes of delta: -8 (-0.37% of base)
Total relative delta: -0.02
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file regressions (bytes):
           4 : 196147.dasm (0.54% of base)

Top file improvements (bytes):
          -4 : 196012.dasm (-0.56% of base)
          -4 : 196094.dasm (-1.12% of base)
          -4 : 196182.dasm (-1.05% of base)

4 total files with Code Size differences (3 improved, 1 regressed), 0 unchanged.

Top method regressions (bytes):
           4 ( 0.54% of base) : 196147.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this

Top method improvements (bytes):
          -4 (-0.56% of base) : 196012.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this
          -4 (-1.12% of base) : 196094.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this
          -4 (-1.05% of base) : 196182.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this

Top method regressions (percentages):
           4 ( 0.54% of base) : 196147.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this

Top method improvements (percentages):
          -4 (-1.12% of base) : 196094.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this
          -4 (-1.05% of base) : 196182.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this
          -4 (-0.56% of base) : 196012.dasm - HVATests`1[Byte][System.Byte]:checkValues(System.String,System.Runtime.Intrinsics.Vector64`1[Byte],int):this

4 total methods with Code Size differences (3 improved, 1 regressed), 0 unchanged.


libraries.crossgen2.windows.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 1124
Total bytes of diff: 1124
Total bytes of delta: 0 (0.00% of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 3 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 3 unchanged.


@tannergooding
Member

Should this also add an outerloop leg that will catch this kind of failure in the future?

@BruceForstall
Contributor Author

> Should this also add an outerloop leg that will catch this kind of failure in the future?

I'm testing a PR (#58822) to add JitStress=1 and JitStress=2 to the "runtime-coreclr jitstress-isas-x86" pipeline. We can discuss it over there.

@BruceForstall BruceForstall merged commit 73b29b5 into dotnet:main Sep 9, 2021
@BruceForstall
Contributor Author

/backport to release/6.0

@github-actions
Contributor

github-actions bot commented Sep 9, 2021

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1215944929

@ghost ghost locked as resolved and limited conversation to collaborators Oct 9, 2021
@BruceForstall BruceForstall deleted the Fix58295 branch December 28, 2022 01:13
Development

Successfully merging this pull request may close these issues.

Vector256<Byte>.WithElement(1235410822): RunBasicScenario failed to throw ArgumentOutOfRangeException.