Fix lazy machine state unwinding for MSVC epilogues on x86 #45906

hoyosjs · 2020-12-10T20:04:11Z

MSVC changed its prologue and epilogue helpers. In particular, _EH_prolog3_catch_GS_align/_EH_epilog3_GS_align had a mov esp, ebx instruction that the lazy unwinding didn't handle. This helper ends up being called by three FCalls in all of coreclr:

DebugDebugger::Log
StubHelpers::ValidateObject
COMDelegate::BindToMethodName

_EH_epilog3_align has no direct calls from an FCall. All go through the GS variant first.
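To make the missing case concrete, here is a minimal sketch of an epilogue simulator that recognizes `mov esp, ebx` alongside the more familiar `mov esp, ebp`. It is not the code in gmsx86.cpp: `ToyState`, `ToyStack`, and `simulateEpilogue` are hypothetical names, and the byte sequence in the usage note is only loosely modeled on an aligned-frame epilogue, not copied from the MSVC helper. The opcode bytes themselves are the standard x86 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register snapshot; not the runtime's lazy machine state type.
struct ToyState {
    uint32_t esp = 0, ebp = 0, ebx = 0;
    uint32_t retAddr = 0;
    bool ok = true;  // false once an instruction is not understood
};

// Toy stack memory: a base address plus consecutive 32-bit slots.
struct ToyStack {
    uint32_t base;
    std::vector<uint32_t> slots;
    bool read(uint32_t addr, uint32_t& out) const {
        if (addr < base || (addr - base) % 4 != 0) return false;
        size_t i = (addr - base) / 4;
        if (i >= slots.size()) return false;
        out = slots[i];
        return true;
    }
};

// Reads the slot at esp into dst, then moves esp up by one slot.
static bool pop(const ToyStack& stk, ToyState& st, uint32_t& dst) {
    if (!stk.read(st.esp, dst)) return false;
    st.esp += 4;
    return true;
}

// Walks the code bytes until RET, updating the simulated registers.
ToyState simulateEpilogue(const std::vector<uint8_t>& code,
                          const ToyStack& stk, ToyState st) {
    assert(stk.base % 4 == 0);
    size_t ip = 0;
    while (ip < code.size()) {
        uint8_t op = code[ip];
        if ((op == 0x8B || op == 0x89) && ip + 1 < code.size()) {
            uint8_t modrm = code[ip + 1];
            // 8B /r is MOV r32, r/m32; 89 /r is MOV r/m32, r32.
            bool espFromEbp = (op == 0x8B && modrm == 0xE5) || (op == 0x89 && modrm == 0xEC);
            bool espFromEbx = (op == 0x8B && modrm == 0xE3) || (op == 0x89 && modrm == 0xDC);
            if (espFromEbp)      st.esp = st.ebp;  // mov esp, ebp
            else if (espFromEbx) st.esp = st.ebx;  // mov esp, ebx
            else { st.ok = false; return st; }
            ip += 2;
        } else if (op == 0x5D) {                   // pop ebp
            if (!pop(stk, st, st.ebp)) { st.ok = false; return st; }
            ip += 1;
        } else if (op == 0x5B) {                   // pop ebx
            if (!pop(stk, st, st.ebx)) { st.ok = false; return st; }
            ip += 1;
        } else if (op == 0xC3) {                   // ret
            if (!pop(stk, st, st.retAddr)) st.ok = false;
            return st;
        } else {
            st.ok = false;                         // unknown instruction: give up
            return st;
        }
    }
    st.ok = false;                                 // ran off the end without RET
    return st;
}
```

Feeding it the illustrative bytes `8B E5 5D 8B E3 5B C3` (mov esp, ebp; pop ebp; mov esp, ebx; pop ebx; ret) restores ebp, then ebx, then the return address. A simulator that lacks the `mov esp, ebx` branch would stop at the fourth byte and report failure instead.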

As for the other reported reliability issue in ThreadNative::GetThreadDeserializationTracker, that one uses another helper, _EH_prolog3_catch/_EH_prolog3_catch_GS. I hand-checked the helper, and there's nothing I can see that the unwinder wouldn't gracefully interpret. It's also unlikely these stubs would cause an issue, given the number of FCalls they support (roughly half the CLR's FCalls use that variant).

jkotas · 2020-12-10T20:39:46Z

src/coreclr/vm/i386/gmsx86.cpp

Is "MOV ESP, EBP" correct?

hoyosjs · 2020-12-10

No. I'll update the comment when I finish this patch. This still doesn't solve the whole issue, but I got roped into some other issues the past couple of weeks. Thanks for noting this.

jkotas · 2020-12-22T07:46:18Z

I have hit this crash in #46244. I am fixing it by converting the FCall w/ HMF to QCall. Identifying all FCalls with this problem and converting them to QCalls is one potential way to deal with this problem.

janvorli · 2021-02-08T11:42:14Z

@hoyosjs is this change finished or does it require more work?

hoyosjs · 2021-02-08T11:48:04Z

This hadn't fixed the debugger logging issue and I went on vacation. Jan fixed this in master by turning the FCall into a QCall. Did you find another case where this hit?

jkotas · 2021-02-08T11:59:35Z

Yes, there are several other FCalls that have this problem. jkotas@40af820 that I have shared with you earlier is a hack to identify them.

hoyosjs · 2021-02-11T11:39:35Z

I tested this again. Though the fix makes sense, and the first unwind and a manual walkthrough yield the same ebp/esp/return address, somehow later walks end up bad, which made me wonder if it could have anything to do with tiering. Disabling tiering stops the issue from reproducing (even in the absence of the fix; I'm not sure what it changes in terms of places where this could happen). I'm going to port the debugger fix first, as this seems to be taking long.

jkotas · 2021-02-11T15:20:56Z

Some manifestations of this bug will be masked when a method higher on the stack saves more non-volatile registers or uses a different sequence to save them. That is likely why you no longer see it crash with tiering disabled.

You should check the values of all non-volatile registers after the unwind, not just ebp/esp/return address.
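A check along those lines could look like the following sketch. It compares every non-volatile x86 register, plus esp and the return address, between a reference state and the unwound state. `UnwoundRegs` and `mismatchedRegs` are hypothetical names for illustration and do not exist in the runtime.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of what an x86 unwind is expected to restore.
struct UnwoundRegs {
    uint32_t edi, esi, ebx, ebp;  // non-volatile (callee-saved) registers
    uint32_t esp;
    uint32_t retAddr;
};

// Returns the names of all fields that differ, so a wrong edi/esi/ebx is
// reported even when ebp, esp, and the return address happen to match.
std::vector<std::string> mismatchedRegs(const UnwoundRegs& expected,
                                        const UnwoundRegs& actual) {
    std::vector<std::string> bad;
    auto check = [&bad](const char* name, uint32_t e, uint32_t a) {
        if (e != a) bad.push_back(name);
    };
    check("edi", expected.edi, actual.edi);
    check("esi", expected.esi, actual.esi);
    check("ebx", expected.ebx, actual.ebx);
    check("ebp", expected.ebp, actual.ebp);
    check("esp", expected.esp, actual.esp);
    check("retAddr", expected.retAddr, actual.retAddr);
    return bad;
}
```

An empty result means the unwound state matches the reference in full. Any returned name points at a register the unwinder restored incorrectly.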

hoyosjs · 2021-02-11T21:15:43Z

The repro I was using on top of 5.0 was going from main directly to the FCall:

using System.Diagnostics;

class Program
{
    public const int NUM_ITEMS = 5_000_000;

    static void Main(string[] args)
    {
        for (int i = 0; i < NUM_ITEMS; i++)
        {
            string a = new string("Hello");
            Debugger.Log(0, null, a);
        }
    }
}

And on the first pass the registers in the lazy state (so edi, esi, ebx) also matched.

hoyosjs · 2021-02-13T01:35:10Z

Actually, this might have been correct all along. Having the native debugger attached with some breakpoint set might have been what tripped the FCall. I let this run for a while under VS and saw no crash, whereas before it crashed in under a second.

jkotas

Looks good to me. I have double checked it under debugger and I have convinced myself that this works fine.

I think this change is low-risk enough to be ported to servicing instead of the QCall changes.

hoyosjs added this to the 6.0.0 milestone Dec 10, 2020

hoyosjs added the area-VM-coreclr label Dec 10, 2020

hoyosjs self-assigned this Dec 10, 2020

hoyosjs linked an issue Dec 10, 2020 that may be closed by this pull request

A different and seemingly random exception occurs System.AccessViolationException/System.ExecutionEngineException/System.NullReferenceException/System.ExecutionEngineException #44519

Closed

jkotas reviewed Dec 10, 2020

jkotas mentioned this pull request Dec 12, 2020

Fix FC_NO_TAILCALL with newer compilers #45999

Merged

jkotas mentioned this pull request Jan 6, 2021

Delete reflection blocking on GetThreadDeserializationTracker #46607

Merged

hoyosjs marked this pull request as ready for review January 6, 2021 01:26

Add intepretation of MOV ESP, EBX to unwindLazyState

c67bf99

hoyosjs force-pushed the juhoyosa/fix-stackwalk-x86 branch from 0e978df to c67bf99 Compare January 6, 2021 02:41

hoyosjs closed this Feb 11, 2021

hoyosjs reopened this Feb 11, 2021

hoyosjs closed this Feb 13, 2021

hoyosjs reopened this Feb 13, 2021

runfoapp bot mentioned this pull request Feb 15, 2021

Test Failure : System.Net.Security.Tests.SslStreamNetworkStreamTest.SslStream_ClientCertificate_SendsChain #48091

Closed

jkotas approved these changes Feb 17, 2021

jkotas merged commit 9c5d363 into dotnet:master Feb 17, 2021

jkotas mentioned this pull request Feb 18, 2021

Turn Debugger.Log, Debugger.Launch, and Delegate.BindToMethodName to QCalls #48211

Closed

ghost locked as resolved and limited conversation to collaborators Mar 19, 2021

hoyosjs deleted the juhoyosa/fix-stackwalk-x86 branch September 10, 2021 07:50
