This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Conversation

davidfowl (Member) opened pull request #2192:

  • Also plumbed Memory/Span through Kestrel over ArraySegment.
  • Throw synchronously from the HttpRequestStream instead of async in some cases.
davidfowl requested review from halter73 and pakrym on February 18, 2018 08:51
{
foreach (var memory in readableBuffer)
{
// REVIEW: This *could* be slower if 2 things are true
davidfowl (Member, Author):
This might be a breaking change that I don't mind reverting. We're calling a different version of WriteAsync here. On .NET Core 2.1 it will delegate to the correct WriteAsync call, but I haven't thought through all of the cases.

@stephentoub Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?
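
For context, this is roughly what the fallback being relied on looks like; a sketch based on my understanding of the .NET Core 2.1 behavior, not the actual coreclr source. When the memory is array-backed, the new overload delegates to the classic array-based WriteAsync, so a Stream that only overrides the old overload still runs its own code; otherwise the data is copied into a pooled array first.

using System;
using System.Buffers;
using System.IO;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

internal static class WriteAsyncFallbackSketch
{
    // Approximation only: delegate to the classic overload when possible,
    // otherwise copy into a pooled array first.
    public static async Task WriteMemoryAsync(Stream stream, ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        if (MemoryMarshal.TryGetArray(buffer, out ArraySegment<byte> segment))
        {
            await stream.WriteAsync(segment.Array, segment.Offset, segment.Count, ct);
            return;
        }

        byte[] rented = ArrayPool<byte>.Shared.Rent(buffer.Length);
        try
        {
            buffer.Span.CopyTo(rented);
            await stream.WriteAsync(rented, 0, buffer.Length, ct);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}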

benaadams (Contributor) commented Feb 18, 2018:
Could this foreach be an extension method for ReadOnlySequence<byte>?

Task WriteAsync(this Stream, in ReadOnlySequence<byte>, CancellationToken)
Task WriteAsync(this PipeWriter, in ReadOnlySequence<byte>, CancellationToken)
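
A minimal sketch of what the Stream flavor of that extension could look like, assuming the netcoreapp2.1 WriteAsync(ReadOnlyMemory<byte>, CancellationToken) overload is available; the class name is illustrative and the in modifier is dropped because async methods can't take in parameters:

using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class ReadOnlySequenceStreamExtensions
{
    public static async Task WriteAsync(this Stream stream, ReadOnlySequence<byte> buffer, CancellationToken cancellationToken = default)
    {
        // Fast path: a single-segment sequence is just one memory block.
        if (buffer.IsSingleSegment)
        {
            await stream.WriteAsync(buffer.First, cancellationToken);
            return;
        }

        // Otherwise write each segment in order.
        foreach (ReadOnlyMemory<byte> memory in buffer)
        {
            await stream.WriteAsync(memory, cancellationToken);
        }
    }
}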

stephentoub (Contributor):
Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?

For the most part we've decided it's ok to start calling the {ReadOnly}Memory<byte> overloads and have switched to doing so in a variety of places where it provides additional value, e.g. where we can better respect pinning, where we can avoid a task allocation for a synchronously completing ReadAsync, etc. In general all of the reading methods on Stream are meant to provide the same core semantics, and similarly for the writing methods, so in theory there shouldn't be a real behavioral difference when we switch from calling one to calling the other. And as you mention the base virtual implementation on Stream does the right thing.

There is one case where we've taken extra precautions to support this. Consider a library with a type that derives from a Stream-derived type like MemoryStream, for example a LimitMemoryStream that sets a bound on how much can be written to the Stream. It does this by overriding all of the virtual methods on the type. Now we introduce a new virtual method on Stream and override it on MemoryStream, but this derived LimitMemoryStream doesn't yet override it (and maybe can't, if it's defined in a netstandard2.0 implementation). When we then switch to calling this new method on MemoryStream, we potentially bypass the extra functionality that was provided by the derived type. As such, in cases like this where there's a notable concern, we've added a type check to validate that this is in fact a concrete MemoryStream rather than a derived type, and if it is derived, we instead delegate to the base implementation that'll defer to the existing virtual method, e.g.
https://github.com/dotnet/coreclr/blob/9ff75d7600aa59dcec3edc16332794cc7007590b/src/mscorlib/shared/System/IO/MemoryStream.cs#L686-L693
The obvious downside is that such a derived type doesn't get any of the perf benefits the new overload provides, but it limits the compatibility concern.
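
A minimal sketch of that pattern; the type names here are hypothetical, and the linked MemoryStream source is the real implementation:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Only take the new Memory-based fast path when the instance is exactly the type that
// declared the override; a derived type falls back to the base implementation, which
// routes through the existing array-based virtuals it may have overridden.
public class FastPathStream : MemoryStream
{
    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default)
    {
        if (GetType() != typeof(FastPathStream))
        {
            // e.g. a hypothetical LimitFastPathStream that only overrides
            // WriteAsync(byte[], int, int, CancellationToken) keeps its behavior.
            return base.WriteAsync(buffer, cancellationToken);
        }

        // Fast path: write the span directly, no array conversion needed
        // (cancellation handling omitted for brevity).
        Write(buffer.Span);
        return default;
    }
}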

We've also gone through and overridden the new Memory<byte>-based overloads on all of the relevant streams in coreclr/corefx. Of course if we've missed any of note, please open an issue so it can be corrected asap :)

davidfowl (Member, Author):
Woah, just saw this:

The active test run was aborted. Reason:

No reason... Re-running

await stream.WriteAsync(buffer.First);
#else
var array = buffer.First.GetArray();
await stream.WriteAsync(array.Array, array.Offset, array.Count);
A reviewer (Member) commented:

Given that we know ReadOnlyMemory<byte>.GetArray() succeeds, what's the benefit of calling Stream.WriteAsync(ReadOnlyMemory<byte>)?

davidfowl (Member, Author) commented Feb 20, 2018:

If the Stream implementation we're calling into doesn't call TryGetArray and wants to pin the memory, then pinning is free here (it avoids the GCHandle overhead). It also avoids the conversion from Memory<T> to T[] (in theory) in a bunch of places. So, for example, when writing through SslStream we can avoid the extra pinning required to call into SChannel and OpenSSL.
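
A small sketch of the pinning point, with a hypothetical native call standing in for the SChannel/OpenSSL interop: when the ReadOnlyMemory<byte> is backed by an already-pinned pool buffer (as Kestrel's pooled buffers are), Pin() hands back the existing pointer instead of allocating a fresh GCHandle per write, which an array parameter would force.

using System;
using System.Buffers;

internal static class PinningSketch
{
    public static unsafe void WriteToNative(ReadOnlyMemory<byte> buffer)
    {
        // For pool-owned, pre-pinned memory this does not create a new GCHandle;
        // for plain array-backed memory it pins just like GCHandle would.
        using (MemoryHandle handle = buffer.Pin())
        {
            byte* p = (byte*)handle.Pointer;
            // NativeWrite(p, buffer.Length);  // hypothetical interop call
        }
    }
}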

halter73 (Member):
Have any benchmarks?

Drawaes (Contributor) commented Feb 21, 2018:

Schannel actually pays for a second bounds check on array input.

E.g. internally it just uses Memory for everything (not arrays), so if you pass in an array it does

if (offset + length > array.Length)

and all the other bounds checks on arrays, and then it makes a Memory out of it, which repeats those bounds checks in the constructor. If you pass in a Memory, all of the initial bounds checks are gone because it's already a valid Memory... that is just one place I know you will save some time.
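
A rough illustration of that duplication; this is not the actual SslStream code, and WriteCoreAsync is a stand-in for the internal write path:

using System;
using System.Threading;
using System.Threading.Tasks;

internal class BoundsCheckSketch
{
    // Array overload: validates offset/count, then the Memory constructor validates them again.
    public Task WriteAsync(byte[] array, int offset, int count, CancellationToken ct)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if ((uint)offset > (uint)array.Length) throw new ArgumentOutOfRangeException(nameof(offset));
        if ((uint)count > (uint)(array.Length - offset)) throw new ArgumentOutOfRangeException(nameof(count));

        return WriteAsync(new ReadOnlyMemory<byte>(array, offset, count), ct);
    }

    // Memory overload: the ReadOnlyMemory<byte> is already a validated (buffer, start, length)
    // triple, so no per-call bounds checks are needed before handing it to the write path.
    public Task WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        return WriteCoreAsync(buffer, ct);
    }

    private Task WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        // Stand-in for the real internal write (SChannel/OpenSSL interop in SslStream's case).
        return Task.CompletedTask;
    }
}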

davidfowl (Member, Author):
No, I haven't done any benchmarking on this branch. What exactly are you looking for? Do you just want to know whether we regress perf?

davidfowl (Member, Author) commented Feb 21, 2018:

I ran a plaintext benchmark on our perf hardware and there was no throughput difference (it was noise). This is what I expected, but I'm also looking at fixing the --source flag; it's broken today. Let me know if there's anything else before I merge. @pakrym will need to rebase 😄

halter73 (Member):
If we verify an SSL benchmark doesn't regress :shipit:

davidfowl (Member, Author):
It actually looks faster (I ran each 3 times); this was one of the runs.

Before

[11:54:57.002] RequestsPerSecond:           604882
[11:54:57.002] Latency on load (ms):        5.9
[11:54:57.002] Max CPU (%):                 99
[11:54:57.002] WorkingSet (MB):             523
[11:54:57.002] Startup Main (ms):           500
[11:54:57.002] First Request (ms):          381
[11:54:57.002] Latency (ms):                0.9
[11:54:57.002] Socket Errors:               0
[11:54:57.002] Bad Responses:               0

After

[11:56:49.809] RequestsPerSecond:           608491
[11:56:49.810] Latency on load (ms):        5.9
[11:56:49.810] Max CPU (%):                 530
[11:56:49.810] WorkingSet (MB):             499
[11:56:49.810] Startup Main (ms):           498
[11:56:49.810] First Request (ms):          387.8
[11:56:49.810] Latency (ms):                0.9
[11:56:49.810] Socket Errors:               0
[11:56:49.810] Bad Responses:               0

Safe to say nothing has regressed 😄 (famous last words)...

davidfowl merged commit 3fc69dc into dev on Feb 21, 2018
stephentoub (Contributor):
What does Max CPU of 99% vs 530% mean?

davidfowl (Member, Author):
@sebastienros? I believe it has to do with the number of cores on the machine.

Drawaes (Contributor) commented Feb 21, 2018:

I think the 530% is fine. I suspect the question was more why it is 99% in one test vs 530% in the other. I would hazard a guess that because it's sampled, probably with a long sampling interval over a short test, it becomes basically a random number unless processor use is pegged for a long period.

sebastienros (Member) commented Feb 21, 2018:

In this case it's Docker providing the CPU in %, and it's 100 * cores. So it's either a glitch in how Docker reported it, or a parsing issue. We do divide by the number of cores to normalize to 100%. The easiest thing to do right now is to take a new measurement to get the real value.

Sorry, I thought it was from another PR... In this case we do process the CPU manually based on this code:

https://github.com/aspnet/benchmarks/blob/dev/src/BenchmarksServer/Startup.cs#L348-L352

Maybe the delay between measurements was too small during the benchmark. We take the max value of all the samples during the test; I'll look into it.
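
For the arithmetic involved, a minimal sketch (assumed, not the linked BenchmarksServer code): Docker reports CPU so that 100% equals one fully-used core, so dividing by the core count maps the raw value back onto a single 0-100 scale.

internal static class CpuNormalizationSketch
{
    // e.g. a raw 530% reading on a hypothetical 6-core box normalizes to roughly 88%.
    public static double Normalize(double dockerCpuPercent, int coreCount)
        => dockerCpuPercent / coreCount;
}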

davidfowl deleted the davidfowl/streams-and-memory branch on March 1, 2018 07:56