davidfowl
Member

@davidfowl davidfowl commented Mar 13, 2021

  • SslStream was holding onto a 4K byte[] after the handshake was complete. This was because the ArrayBuffer struct doesn't clear the local buffer field in dispose. This changes that.

Before:
image

After:
image

Blocked on #49945
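
A minimal sketch of the kind of change described above. `HandshakeBuffer` and `ConnectionState` are hypothetical stand-ins, not the real `System.Net.ArrayBuffer` or `SslStream`; the sketch assumes only that a struct wraps a `byte[]` field and lives as a field of a long-lived owner. It shows `Dispose` clearing the field so the owner stops keeping the array reachable.

```csharp
using System;

// Hypothetical, simplified stand-in for a struct that wraps a byte[];
// this is NOT the real System.Net.ArrayBuffer.
public struct HandshakeBuffer : IDisposable
{
    private byte[] _bytes;

    public HandshakeBuffer(int initialSize)
    {
        _bytes = new byte[initialSize];
    }

    // True while the struct still references its backing array.
    public bool HoldsArray => _bytes != null;

    public void Dispose()
    {
        // Dropping the reference lets the GC reclaim the array even though
        // the struct itself lives on as a field of a long-lived owner.
        _bytes = null;
    }
}

// Hypothetical owner, standing in for a long-lived object such as a stream.
public sealed class ConnectionState
{
    private HandshakeBuffer _handshakeBuffer = new HandshakeBuffer(4096);

    public bool HoldsHandshakeArray => _handshakeBuffer.HoldsArray;

    // Dispose is called on the field itself: disposing a copy of the struct
    // would leave the field's reference untouched.
    public void CompleteHandshake() => _handshakeBuffer.Dispose();
}
```

In this sketch, any later access to the array after `CompleteHandshake()` would hit a null field, which is how a change like this can surface callers that use the buffer after disposal.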

@ghost

ghost commented Mar 13, 2021

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost

ghost commented Mar 13, 2021

Tagging subscribers to this area: @dotnet/ncl, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: davidfowl
Assignees: -
Labels:

area-System.Net.Security

Milestone: -

Member

@wfurt wfurt left a comment

LGTM

@davidfowl
Member Author

Are these tests generally broken?

@stephentoub
Member

stephentoub commented Mar 13, 2021

The mono ones are #49569.

The coreclr osx ones I haven't seen. Can you open an issue for that?

CI has been very unhappy the last few days. I suspect as a result things were merged that shouldn't have been, compounding the problem.

@davidfowl
Member Author

Looks like a timezone thing.

image

@davidfowl
Member Author

this fix actually broke a test:

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Net.ArrayBuffer.get_AvailableLength() in /_/src/libraries/Common/src/System/Net/ArrayBuffer.cs:line 58
   at System.Net.ArrayBuffer.EnsureAvailableSpace(Int32 byteCount) in /_/src/libraries/Common/src/System/Net/ArrayBuffer.cs:line 86
   at System.Net.SafeDeleteSslContext.WriteToConnection(Void* connection, Byte* data, Void** dataLength) in /_/src/libraries/System.Net.Security/src/System/Net/Security/Pal.OSX/SafeDeleteSslContext.cs:line 164
./RunTests.sh: line 162: 23722 Abort trap: 6           (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Net.Security.Tests.runtimeconfig.json --depsfile System.Net.Security.Tests.deps.json xunit.console.dll System.Net.Security.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE

@davidfowl
Member Author

This fix actually broke a test: it uncovered a use-after-free bug on OSX.

Looks like something is calling WriteToConnection after Dispose, @wfurt?

@wfurt
Member

wfurt commented Mar 13, 2021

This is similar to #43686. We really should throw an ObjectDisposedException in this case. HttpClient with H2 is particularly prone to writing after Dispose. I'll check if I can reproduce it locally with your change, @davidfowl. I could not reproduce #43686, so it eventually got closed without resolution.

I still feel the change makes sense. We will just need to improve error checking inside SslStream (or hold off like we do for the _nestedRead).
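
The error checking suggested above could look roughly like the following sketch. `GuardedBuffer` and its members are hypothetical and are not taken from SslStream or ArrayBuffer; the sketch only shows a write path failing with `ObjectDisposedException` instead of a `NullReferenceException` once the buffer has been released.

```csharp
using System;

// Hypothetical type for illustration; not part of System.Net.Security.
public sealed class GuardedBuffer : IDisposable
{
    private byte[] _bytes;
    private int _count;

    public GuardedBuffer(int size)
    {
        _bytes = new byte[size];
    }

    // Number of bytes written so far.
    public int Count => _count;

    public void Write(ReadOnlySpan<byte> data)
    {
        byte[] bytes = _bytes;
        if (bytes == null)
        {
            // A clear, catchable failure instead of a NullReferenceException.
            throw new ObjectDisposedException(nameof(GuardedBuffer));
        }
        if (data.Length > bytes.Length - _count)
        {
            throw new InvalidOperationException("Not enough space in the buffer.");
        }
        data.CopyTo(bytes.AsSpan(_count));
        _count += data.Length;
    }

    public void Dispose()
    {
        // Release the array and reset the write position.
        _bytes = null;
        _count = 0;
    }
}
```

The null check is not a synchronization mechanism: a `Dispose` racing with `Write` on another thread would still need separate coordination.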

@davidfowl
Member Author

Yes, the change does make sense. I was looking into the SSL APIs on OSX, but I think that should be fixed separately. I'm gonna merge this; can you open an issue for the write after dispose?

@stephentoub
Member

stephentoub commented Mar 13, 2021

I'm gonna merge this

If merging this is going to tank CI further, please don't.

@davidfowl
Member Author

OK, I pushed a new change that tries to keep the old behavior (writing into the ArrayBuffer after dispose), which should allow some slack for fixing the real use-after-free issue. We can revert that part of the change when we figure out why it's happening on OSX.

@stephentoub
Member

This change isn't critical to get in asap. Let's just prioritize finding and fixing the problem and hold on this until we do.

@davidfowl
Member Author

I'm all for fixing the bug; I don't want to hold this change hostage though. While it's not critical to get in ASAP, my only fear is that the bug will get punted and also the fix. At a bare minimum, if the fix for the real issue isn't trivial or the fix is hard to verify (because reproducing is hard), we should take this improvement, but I'll let @wfurt decide on what's best.

Good news is that there's a dump captured as part of the crash...

@stephentoub
Member

stephentoub commented Mar 13, 2021

I don't want to hold this change hostage though

This change is causing a crash. While it should be valid, because of another issue it's not. We should fix the other issue before trying to push this in. And I do not want to complicate the code trying to mask the other issue. As I said, let's prioritize finding and fixing the other one, then this change will be valid and the original simple change can be merged.

@davidfowl
Member Author

We generally agree, I just want to make sure we have some trip wires and backup plans in place (the other similar issue did get closed). But I'll be optimistic 😄

@wfurt
Member

wfurt commented Mar 14, 2021

I think the real issue lives in the macOS PAL. I can reproduce it locally and I'll take a look next week.
I'm not sure we should pollute ArrayBuffer to compensate for that. I liked the first simple change better.
cc: @geoffkizer as the original author of ArrayBuffer

@davidfowl
Member Author

Let me know if you want me to revert to the original change, or if you want to take over this change in general and pair it with the OSX fix.

@jkotas
Member

jkotas commented Mar 14, 2021

The workaround in the ArrayBuffer does not look 100% reliable due to race conditions. It will just make the crash show up less often. Does that sound right? If that is the case, we should definitely wait for the proper fix.

@davidfowl
Member Author

I'm not defending the workaround for the crash; it's as unreliable as it was before. The ArrayBuffer isn't thread safe with or without this change.

@geoffkizer
Contributor

As others have said, let's not merge a temporary workaround here. Let's fix the issue in the OSX PAL or wherever and then merge the proper fix.

my only fear is that the bug will get punted and also the fix.

You've identified a nontrivial perf issue here, and the fix is straightforward. The odds of this getting punted are very low.

@davidfowl
Member Author

Sounds fine to me

@geoffkizer
Contributor

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl davidfowl removed the NO-MERGE label ("The PR is not ready for merge yet (see discussion for detailed reasons)") Mar 29, 2021
@davidfowl
Member Author

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@davidfowl
Member Author

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl
Member Author

/azp run all

@azure-pipelines

No pipelines are associated with this pull request.

@davidfowl
Member Author

/azp list

@davidfowl
Member Author

/azp run runtime-dev-innerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl
Member Author

/azp run runtime-staging

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl
Member Author

How do I re-run these mono tests? 😄

@davidfowl
Member Author

/azp run runtime-libraries-mono outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidfowl davidfowl closed this Mar 29, 2021
@davidfowl davidfowl reopened this Mar 29, 2021
@davidfowl davidfowl closed this Mar 29, 2021
@davidfowl davidfowl reopened this Mar 29, 2021
- SslStream was holding onto a 4K byte[] after the handshake was complete. This was because the ArrayBuffer struct doesn't clear the local buffer field in dispose. This changes that.
@davidfowl davidfowl force-pushed the davidfowl/handshake-buffer branch from 2ce6a3c to c1a7d45 Compare March 29, 2021 01:17
@davidfowl
Member Author

Force push never fails...

@davidfowl davidfowl merged commit 102d1e8 into main Mar 29, 2021
@davidfowl davidfowl deleted the davidfowl/handshake-buffer branch March 29, 2021 14:16
@davidfowl davidfowl added this to the 6.0.0 milestone Apr 2, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 3, 2021