-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add cast when replacing promoted struct with field in lowering #58589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We may need to sign/zero-extend when we do this replacement very late. Normally this cast would be inserted in morph. Fix dotnet#58373
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWe may need to sign/zero-extend when we do this replacement very late. cc @dotnet/jit-contrib @SingleAccretion Fix #58373
|
Only a small handful of diffs: benchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
|
The typical diff is that we insert an extra (usually un-)necessary sign/zero extension. (unnecessary) --- "a/D:\\dev\\dotnet\\spmi\\asm.libraries.pmi.windows.x64.checked.3\\base\\105054.dasm"
+++ "b/D:\\dev\\dotnet\\spmi\\asm.libraries.pmi.windows.x64.checked.3\\diff\\105054.dasm"
@@ -1,36 +1,37 @@
; Assembly listing for method Microsoft.VisualBasic.FileSystem:TAB():Microsoft.VisualBasic.TabInfo
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
;* V00 loc0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T00] ( 2, 2 ) short -> rax single-def V00.Column(offs=0x00) P-INDEP "field V00.Column (fldOffset=0x0)"
;
; Lcl frame size = 0
G_M14166_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
;; bbWeight=1 PerfScore 0.00
G_M14166_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
mov eax, -1
- ;; bbWeight=1 PerfScore 0.25
+ movsx rax, ax
+ ;; bbWeight=1 PerfScore 0.50
G_M14166_IG03: ; , epilog, nogc, extend
ret
;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 6, prolog size 0, PerfScore 1.85, instruction count 2, allocated bytes for code 6 (MethodHash=c26dc8a9) for method Microsoft.VisualBasic.FileSystem:TAB():Microsoft.VisualBasic.TabInfo
+; Total bytes of code 10, prolog size 0, PerfScore 2.50, instruction count 3, allocated bytes for code 10 (MethodHash=c26dc8a9) for method Microsoft.VisualBasic.FileSystem:TAB():Microsoft.VisualBasic.TabInfo (necessary) --- "a/D:\\dev\\dotnet\\spmi\\asm.libraries.pmi.windows.x64.checked.3\\base\\192312.dasm"
+++ "b/D:\\dev\\dotnet\\spmi\\asm.libraries.pmi.windows.x64.checked.3\\diff\\192312.dasm"
@@ -1,36 +1,36 @@
; Assembly listing for method System.Formats.Cbor.HalfHelpers:HalfToInt16Bits(System.Half):short
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
;* V00 arg0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T00] ( 2, 2 ) ushort -> rcx single-def V00._value(offs=0x00) P-INDEP "field V00._value (fldOffset=0x0)"
;
; Lcl frame size = 0
G_M52880_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
;; bbWeight=1 PerfScore 0.00
G_M52880_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
- mov eax, ecx
+ movzx rax, cx
;; bbWeight=1 PerfScore 0.25
G_M52880_IG03: ; , epilog, nogc, extend
ret
;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 3, prolog size 0, PerfScore 1.55, instruction count 2, allocated bytes for code 3 (MethodHash=57f6316f) for method System.Formats.Cbor.HalfHelpers:HalfToInt16Bits(System.Half):short
+; Total bytes of code 4, prolog size 0, PerfScore 1.65, instruction count 2, allocated bytes for code 4 (MethodHash=57f6316f) for method System.Formats.Cbor.HalfHelpers:HalfToInt16Bits(System.Half):short |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I feared the regressions would be a lot worse, nice to see being wrong on that count. It is rather unfortunate we cannot insert this normalization in the front-end, but it is how it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you mentioned a better fix would be to fix the first transformation that produces incorrect results:
Morphing BB01 of 'Runtime_58373:HalfToInt16Bits(int,int,int,int,int,int,System.Half):short'
fgMorphTree BB01, STMT00000 (before)
[000003] ---XG------- * RETURN int
[000002] *--XG------- \--* IND short
[000001] ------------ \--* ADDR long
[000000] -------N---- \--* LCL_VAR struct<System.Half, 2>(P) V06 arg6
\--* ushort V06._value (offs=0x00) -> V08 tmp1
fgMorphTree BB01, STMT00000 (after)
[000003] ----G+------ * RETURN int
[000000] -----+-N---- \--* LCL_VAR struct<System.Half, 2>(P) V06 arg6
\--* ushort V06._value (offs=0x00) -> V08 tmp1
here we are losing cast from struct<2>
to int
(for structs the upper bytes are not required to be zeroed, so the cast is necessary).
However, if inserting the cast exposes other issues I agree with the current solution, just mark the places that should be changed back in main after backport.
@@ -0,0 +1,29 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please tell me which complus are needed to repro it on x64 windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have made a blunder somewhere, this test does not repro it for me either. It seems on x64 we normalize when we move the arg to the outgoing arg area/into arg register. So we need a little bit more. This exposes it on x64:
using System;
using System.Runtime.CompilerServices;
public unsafe class Runtime_58373
{
public static int Main()
{
short halfValue = HalfToInt16Bits(MakeHalf());
int x = halfValue;
short val2 = HalfToInt16Bits(*(Half*)&x);
return halfValue == val2 ? 100 : -1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Half MakeHalf()
{
return (Half)(-1.0f);
}
[MethodImpl(MethodImplOptions.NoInlining)]
static short HalfToInt16Bits(Half h)
{
return *(short*)&h;
}
}
This one exposes it on x86 without unsafe except for the reinterp:
using System;
using System.Runtime.CompilerServices;
public unsafe class Runtime_58373
{
public static int Main()
{
// Use up a lot of registers
int a = GetVal();
int b = GetVal();
int c = GetVal();
int d = GetVal();
int e = GetVal();
int f = GetVal();
int g = GetVal();
int h = GetVal();
int i = GetVal();
short val1 = HalfToInt16Bits(MakeHalf());
Half half = MakeHalf();
MakeHalf(); // This will spill lower 16 bits of 'half' to memory
short val2 = HalfToInt16Bits(half); // This will pass 32 bits as arg with upper 16 bits undefined
return val1 == val2 ? 100 + a + b + c + d + e + f + g + h + i : -1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static int GetVal()
{
return 0;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Half MakeHalf()
{
return default;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static short HalfToInt16Bits(Half h)
{
return *(short*)&h;
}
}
I will change the test to use these cases.
Right, the reason we don't insert this cast is because FWIW the other fix has more regressions than this one (but still very few). Although it seems nicer. |
Thanks @jakobbotsch. Please backport it to 6.0 once it is fixed. |
@sandreenko Do you think it would be better to fix the missing cast instead and implement the lowering support for |
I think your approach is right because it touches only 1 method in 1 file for backporting, so LGTM.
If I understand correctly to support it in lowering you will need to delete the case/assert for it and runtime/src/coreclr/jit/lower.cpp Lines 3378 to 3386 in 2e93ebb
|
Thanks, that's great to know. Then I will submit a separate PR doing that for .NET 7 after this one. |
Hmm, looks like one of the tests still fails on ARM32, need to look into that before this is merged. |
The problem is the do-not-enregister case is also wrong, it changes it to a ret-typed N001 ( 1, 1) [000000] -------N----- t0 = LCL_FLD int V00 arg0 [+0]
* ushort V00._value (offs=0x00) -> V02 tmp1
/--* t0 int
N002 ( 2, 2) [000003] ----G-------- * RETURN int $100 We hit this path because we only do dependent promotion for parameters on arm32 and we don't replace dependently-promoted locals with their fields. If I understand correctly we should retype the LCL_FLD node to the native return type, not to the type of the ret node, to ensure we load the proper size. |
Retyping it to the type of the ret node is not right when reinterpreting small structs as a type that needs to be normalized.
I added a fix for the not enregistered case (and fixed some outdated comments). No diffs on x64 and the only ARM32 diff is the fixed codegen in |
are you talking about the total diffs from this PR? Do you think it meets 6.0 bar? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for digging into this.
It was the extra diffs, the total ones are a bit bigger due to the cast in the promoted case. I think this meets the bar since we are hitting the miscompilation in our own code in |
cc @dotnet/jit-contrib, I need a MS org approval. |
/backport to release/6.0 |
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1221703276 |
We may need to sign/zero-extend when we do this replacement very late.
Normally this cast would be inserted in morph.
cc @dotnet/jit-contrib @SingleAccretion
Fix #58373