Updated Streams to override Memory and Span overloads #2333
Conversation
- Also plumbed Memory/Span through Kestrel over ArraySegment.
- Throw synchronously from the HttpRequestStream instead of async in some cases.
{
    foreach (var memory in readableBuffer)
    {
        // REVIEW: This *could* be slower if 2 things are true
This might be a breaking change that I don't mind reverting. We're calling a different version of WriteAsync here. On .NET Core 2.1 it will delegate to the correct WriteAsync call, but I haven't thought through all of the cases.
@stephentoub Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?
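For reference, the delegation mentioned above can be sketched roughly like this. This is a simplified illustration of what the .NET Core 2.1 base behavior amounts to, not the actual framework source, and the helper name is made up:

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch of how a ReadOnlyMemory<byte> write can be bridged back to the
// classic array-based WriteAsync virtual; the real framework code differs in detail.
public static class WriteBridgeSketch
{
    public static async Task WriteMemoryViaArrayAsync(Stream stream, ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken)
    {
        if (MemoryMarshal.TryGetArray(buffer, out ArraySegment<byte> segment))
        {
            // The memory wraps an array: call the existing array-based overload directly,
            // so streams that only override that overload keep working.
            await stream.WriteAsync(segment.Array, segment.Offset, segment.Count, cancellationToken);
        }
        else
        {
            // Otherwise copy into a rented array and write that.
            byte[] rented = ArrayPool<byte>.Shared.Rent(buffer.Length);
            try
            {
                buffer.CopyTo(rented);
                await stream.WriteAsync(rented, 0, buffer.Length, cancellationToken);
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(rented);
            }
        }
    }
}
```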
Could this foreach be an extension for ReadOnlySequence<byte>?

Task WriteAsync(this Stream, in ReadOnlySequence<byte>, CancellationToken)
Task WriteAsync(this PipeWriter, in ReadOnlySequence<byte>, CancellationToken)
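A minimal sketch of what the Stream-targeting extension could look like (the class name is hypothetical, the `in` modifier is dropped because async methods can't take `in` parameters, and it assumes the .NET Core 2.1 Memory-based WriteAsync overload):

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class SequenceStreamExtensions
{
    // Hypothetical extension: writes every segment of the sequence to the stream,
    // using the Memory-based WriteAsync overload so no array copy is required.
    public static async Task WriteAsync(this Stream stream, ReadOnlySequence<byte> buffer, CancellationToken cancellationToken = default)
    {
        foreach (ReadOnlyMemory<byte> memory in buffer)
        {
            await stream.WriteAsync(memory, cancellationToken);
        }
    }
}
```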
> Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?
For the most part we've decided it's ok to start calling the {ReadOnly}Memory<byte> overloads and have switched to doing so in a variety of places where it provides additional value, e.g. where we can better respect pinning, where we can avoid a task allocation for a synchronously completing ReadAsync, etc. In general, all of the reading methods on Stream are meant to provide the same core semantics, and similarly for the writing methods, so in theory there shouldn't be a real behavioral difference when we switch from calling one to calling the other. And as you mention, the base virtual implementation on Stream does the right thing.
There is one case where we've taken extra precautions to support this. Consider a library with a type that derives from a Stream-derived type like MemoryStream, for example a LimitMemoryStream that sets a bound on how much can be written to the Stream. It does this by overriding all of the virtual methods on the type. Now we introduce a new virtual method on Stream and override that on MemoryStream, but this derived LimitMemoryStream doesn't yet override it (and maybe can't, if it's defined in a netstandard2.0 implementation). When we then switch to calling this new method on MemoryStream, we potentially bypass the extra functionality that was provided by the derived type. As such, in cases like this where there's a notable concern, we've added a type check to validate that this is in fact a concrete MemoryStream rather than a derived type, and if it is derived, we instead delegate to the base implementation that'll defer to the existing virtual method, e.g.
https://github.com/dotnet/coreclr/blob/9ff75d7600aa59dcec3edc16332794cc7007590b/src/mscorlib/shared/System/IO/MemoryStream.cs#L686-L693
This has the obvious downside that such a derived type doesn't get any of the perf benefits the new overload provides, but it limits the compatibility concern.
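A rough illustration of that guard pattern (the type name is hypothetical and the signatures follow .NET Core 2.1; see the coreclr link above for the real code):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: a stream type guards its new Memory-based override so
// that further-derived types which only override the array-based virtuals still get
// their extra logic (e.g. a LimitMemoryStream's write limit) applied.
public class ExampleMemoryStream : MemoryStream
{
    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default)
    {
        if (GetType() != typeof(ExampleMemoryStream))
        {
            // Derived type: defer to the base implementation, which ultimately funnels
            // the call through the existing array-based WriteAsync virtual.
            return base.WriteAsync(buffer, cancellationToken);
        }

        // Concrete ExampleMemoryStream: safe to take the optimized path directly.
        Write(buffer.Span);
        return default;
    }
}
```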
We've also gone through and overridden the new Memory<byte>-based overloads on all of the relevant streams in coreclr/corefx. Of course, if we've missed any of note, please open an issue so it can be corrected asap :)
Woah, just saw this:
No reason... Re-running
await stream.WriteAsync(buffer.First);
#else
var array = buffer.First.GetArray();
await stream.WriteAsync(array.Array, array.Offset, array.Count);
Given that we know ReadOnlyMemory<byte>.GetArray() succeeds, what's the benefit of calling Stream.WriteAsync(ReadOnlyMemory<byte>)?
If the Stream implementation we're calling into doesn't call TryGetArray and wants to pin the memory, then it's free here (avoids the GCHandle overhead). It also avoids the conversion from Memory<T> -> T[] (in theory) in a bunch of places. So, for example, when writing through SslStream we can avoid the extra pinning required to call into SChannel and OpenSSL.
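For illustration, here is roughly what the two pinning paths look like (a sketch assuming .NET Core 2.1 APIs; method names are made up):

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

class PinningSketch
{
    // Memory-based path: Pin() returns a MemoryHandle; for memory backed by a
    // custom MemoryManager (e.g. native or pooled memory) this can avoid a GCHandle.
    static unsafe void UseMemory(ReadOnlyMemory<byte> memory)
    {
        using (MemoryHandle handle = memory.Pin())
        {
            byte* p = (byte*)handle.Pointer;
            // ... hand 'p' to native code such as the SChannel/OpenSSL interop ...
        }
    }

    // Array-based path: the managed array must be pinned with a GCHandle
    // (or 'fixed') before its address can be passed to native code.
    static unsafe void UseArray(byte[] array)
    {
        GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        try
        {
            byte* p = (byte*)handle.AddrOfPinnedObject();
            // ... hand 'p' to native code ...
        }
        finally
        {
            handle.Free();
        }
    }
}
```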
Have any benchmarks?
Schannel actually pays for a second bounds check on array input. E.g., internally it just uses Memory for everything (not arrays), so if you pass in an array it does all the usual bounds checks on arrays, then it just makes a Memory out of it, which does the bounds checks in the constructor again. If you pass in a Memory, all of the initial bounds checks are gone because it's already a valid Memory... that is just one place I know you will save some time.
No, I haven't done any benchmarking on this branch. What exactly are you looking for? Just want to know if we regress perf?
I ran a plaintext benchmark on our perf hardware and there was no throughput difference (it was noise). This is what I expect, but I'm also looking at fixing the --source flag; it's broken today. Let me know if there's anything else before I merge. @pakrym will need to rebase 😄
If we verify an SSL benchmark doesn't regress
It actually looks faster (I ran 3 times each); this was one of the runs. (Before/after benchmark output not reproduced here.)
Safe to say nothing has regressed 😄 (famous last words)...
What does Max CPU of 99% vs 530% mean?
@sebastienros? I believe it's to do with the number of cores on the machine.
I think the 530% is fine. I suspect the question was more about why it's 98% in one test vs 530% in the other. I would hazard a guess and say that, because it's sampled, probably with a long sampling interval and a short test, it becomes basically a random number unless processor use is locked for a long period.
Sorry, I thought it was from another PR... In this case we do process the CPU manually based on this code: https://github.com/aspnet/benchmarks/blob/dev/src/BenchmarksServer/Startup.cs#L348-L352 Maybe the delay between measurements was too small during the whole benchmark. We do take the max value of all the samples during the test; I'll look into it.
#2192