This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Conversation

davidfowl (Member) opened pull request #2192:

  • Also plumbed Memory/Span through Kestrel over ArraySegment.
  • Throw synchronously from the HttpRequestStream instead of async in some cases.
davidfowl requested review from halter73 and pakrym on February 18, 2018 08:51
{
foreach (var memory in readableBuffer)
{
// REVIEW: This *could* be slower if 2 things are true
davidfowl (Member, Author):
This might be a breaking change that I don't mind reverting. We're calling a different version of WriteAsync here. On .NET Core 2.1 it will delegate to the correct WriteAsync call, but I haven't thought through all of the cases.

@stephentoub Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?
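
For context, this is roughly what the fallback being relied on looks like; a sketch based on my understanding of the .NET Core 2.1 behavior, not the actual coreclr source. When the memory is array-backed, the new overload delegates to the classic array-based WriteAsync, so a Stream that only overrides the old overload still runs its own code; otherwise the data is copied into a pooled array first.

using System;
using System.Buffers;
using System.IO;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

internal static class WriteAsyncFallbackSketch
{
    // Approximation only: delegate to the classic overload when possible,
    // otherwise copy into a pooled array first.
    public static async Task WriteMemoryAsync(Stream stream, ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        if (MemoryMarshal.TryGetArray(buffer, out ArraySegment<byte> segment))
        {
            await stream.WriteAsync(segment.Array, segment.Offset, segment.Count, ct);
            return;
        }

        byte[] rented = ArrayPool<byte>.Shared.Rent(buffer.Length);
        try
        {
            buffer.Span.CopyTo(rented);
            await stream.WriteAsync(rented, 0, buffer.Length, ct);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}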

benaadams (Contributor) commented Feb 18, 2018:
Could this foreach be an extension method for ReadOnlySequence<byte>?

Task WriteAsync(this Stream, in ReadOnlySequence<byte>, CancellationToken)
Task WriteAsync(this PipeWriter, in ReadOnlySequence<byte>, CancellationToken)
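
A minimal sketch of what the Stream flavor of that extension could look like, assuming the netcoreapp2.1 WriteAsync(ReadOnlyMemory<byte>, CancellationToken) overload is available; the class name is illustrative and the in modifier is dropped because async methods can't take in parameters:

using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class ReadOnlySequenceStreamExtensions
{
    public static async Task WriteAsync(this Stream stream, ReadOnlySequence<byte> buffer, CancellationToken cancellationToken = default)
    {
        // Fast path: a single-segment sequence is just one memory block.
        if (buffer.IsSingleSegment)
        {
            await stream.WriteAsync(buffer.First, cancellationToken);
            return;
        }

        // Otherwise write each segment in order.
        foreach (ReadOnlyMemory<byte> memory in buffer)
        {
            await stream.WriteAsync(memory, cancellationToken);
        }
    }
}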

stephentoub (Contributor):
Do you have any thoughts about the fact that CopyToAsync has to basically decide what overload of WriteAsync to call on the other Stream?

For the most part we've decided it's ok to start calling the {ReadOnly}Memory<byte> overloads and have switched to doing so in a variety of places where it provides additional value, e.g. where we can better respect pinning, where we can avoid a task allocation for a synchronously completing ReadAsync, etc. In general all of the reading methods on Stream are meant to provide the same core semantics, and similarly for the writing methods, so in theory there shouldn't be a real behavioral difference when we switch from calling one to calling the other. And as you mention the base virtual implementation on Stream does the right thing.

There is one case where we've taken extra precautions to support this. Consider a library with a type that derives from a Stream-derived type like MemoryStream, for example a LimitMemoryStream that sets a bound on how much can be written to the Stream. It does this by overriding all of the virtual methods on the type. Now we introduce a new virtual method on Stream and override it on MemoryStream, but this derived LimitMemoryStream doesn't yet override it (and maybe can't, if it's defined in a netstandard2.0 implementation). When we then switch to calling this new method on MemoryStream, we potentially bypass the extra functionality that was provided by the derived type. As such, in cases like this where there's a notable concern, we've added a type check to validate that this is in fact a concrete MemoryStream rather than a derived type, and if it is derived, we instead delegate to the base implementation that'll defer to the existing virtual method, e.g.
https://github.com/dotnet/coreclr/blob/9ff75d7600aa59dcec3edc16332794cc7007590b/src/mscorlib/shared/System/IO/MemoryStream.cs#L686-L693
The obvious downside is that such a derived type doesn't get any of the perf benefits the new overload provides, but it limits the compatibility concern.
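
A minimal sketch of that pattern; the type names here are hypothetical, and the linked MemoryStream source is the real implementation:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Only take the new Memory-based fast path when the instance is exactly the type that
// declared the override; a derived type falls back to the base implementation, which
// routes through the existing array-based virtuals it may have overridden.
public class FastPathStream : MemoryStream
{
    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default)
    {
        if (GetType() != typeof(FastPathStream))
        {
            // e.g. a hypothetical LimitFastPathStream that only overrides
            // WriteAsync(byte[], int, int, CancellationToken) keeps its behavior.
            return base.WriteAsync(buffer, cancellationToken);
        }

        // Fast path: write the span directly, no array conversion needed
        // (cancellation handling omitted for brevity).
        Write(buffer.Span);
        return default;
    }
}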

We've also gone through and overridden the new Memory<byte>-based overloads on all of the relevant streams in coreclr/corefx. Of course if we've missed any of note, please open an issue so it can be corrected asap :)

davidfowl (Member, Author):
Woah, just saw this:

The active test run was aborted. Reason:

No reason... Re-running

await stream.WriteAsync(buffer.First);
#else
var array = buffer.First.GetArray();
await stream.WriteAsync(array.Array, array.Offset, array.Count);
A reviewer (Member) commented:

Given that we know ReadOnlyMemory<byte>.GetArray() succeeds, what's the benefit of calling Stream.WriteAsync(ReadOnlyMemory<byte>)?

davidfowl (Member, Author) commented Feb 20, 2018:

If the Stream implementation we're calling into doesn't call TryGetArray and wants to pin the memory, then pinning is free here (it avoids the GCHandle overhead). It also avoids the conversion from Memory<T> to T[] (in theory) in a bunch of places. So, for example, when writing through SslStream we can avoid the extra pinning required to call into SChannel and OpenSSL.
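
A small sketch of the pinning point, with a hypothetical native call standing in for the SChannel/OpenSSL interop: when the ReadOnlyMemory<byte> is backed by an already-pinned pool buffer (as Kestrel's pooled buffers are), Pin() hands back the existing pointer instead of allocating a fresh GCHandle per write, which an array parameter would force.

using System;
using System.Buffers;

internal static class PinningSketch
{
    public static unsafe void WriteToNative(ReadOnlyMemory<byte> buffer)
    {
        // For pool-owned, pre-pinned memory this does not create a new GCHandle;
        // for plain array-backed memory it pins just like GCHandle would.
        using (MemoryHandle handle = buffer.Pin())
        {
            byte* p = (byte*)handle.Pointer;
            // NativeWrite(p, buffer.Length);  // hypothetical interop call
        }
    }
}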

halter73 (Member):
Have any benchmarks?

Drawaes (Contributor) commented Feb 21, 2018:

Schannel actually pays for a second bounds check on array input.

E.g. internally it just uses Memory for everything (not arrays), so if you pass in an array it does

if (offset + length > array.Length)

and all the other bounds checks on arrays, and then it makes a Memory out of it, which repeats those bounds checks in the constructor. If you pass in a Memory, all of the initial bounds checks are gone because it's already a valid Memory... that is just one place I know you will save some time.
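
A rough illustration of that duplication; this is not the actual SslStream code, and WriteCoreAsync is a stand-in for the internal write path:

using System;
using System.Threading;
using System.Threading.Tasks;

internal class BoundsCheckSketch
{
    // Array overload: validates offset/count, then the Memory constructor validates them again.
    public Task WriteAsync(byte[] array, int offset, int count, CancellationToken ct)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if ((uint)offset > (uint)array.Length) throw new ArgumentOutOfRangeException(nameof(offset));
        if ((uint)count > (uint)(array.Length - offset)) throw new ArgumentOutOfRangeException(nameof(count));

        return WriteAsync(new ReadOnlyMemory<byte>(array, offset, count), ct);
    }

    // Memory overload: the ReadOnlyMemory<byte> is already a validated (buffer, start, length)
    // triple, so no per-call bounds checks are needed before handing it to the write path.
    public Task WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        return WriteCoreAsync(buffer, ct);
    }

    private Task WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct)
    {
        // Stand-in for the real internal write (SChannel/OpenSSL interop in SslStream's case).
        return Task.CompletedTask;
    }
}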

davidfowl (Member, Author):
No, I haven't done any benchmarking on this branch. What exactly are you looking for? Do you just want to know whether we regress perf?

davidfowl (Member, Author) commented Feb 21, 2018:

I ran a plaintext benchmark on our perf hardware and there was no throughput difference (it was noise). This is what I expected, but I'm also looking at fixing the --source flag; it's broken today. Let me know if there's anything else before I merge. @pakrym will need to rebase 😄

halter73 (Member):
If we verify an SSL benchmark doesn't regress :shipit:

davidfowl (Member, Author):
It actually looks faster (I ran each 3 times); this was one of the runs.

Before

[11:54:57.002] RequestsPerSecond:           604882
[11:54:57.002] Latency on load (ms):        5.9
[11:54:57.002] Max CPU (%):                 99
[11:54:57.002] WorkingSet (MB):             523
[11:54:57.002] Startup Main (ms):           500
[11:54:57.002] First Request (ms):          381
[11:54:57.002] Latency (ms):                0.9
[11:54:57.002] Socket Errors:               0
[11:54:57.002] Bad Responses:               0

After

[11:56:49.809] RequestsPerSecond:           608491
[11:56:49.810] Latency on load (ms):        5.9
[11:56:49.810] Max CPU (%):                 530
[11:56:49.810] WorkingSet (MB):             499
[11:56:49.810] Startup Main (ms):           498
[11:56:49.810] First Request (ms):          387.8
[11:56:49.810] Latency (ms):                0.9
[11:56:49.810] Socket Errors:               0
[11:56:49.810] Bad Responses:               0

Safe to say nothing has regressed 😄 (famous last words)...

davidfowl merged commit 3fc69dc into dev on Feb 21, 2018
stephentoub (Contributor):
What does Max CPU of 99% vs 530% mean?

davidfowl (Member, Author):
@sebastienros? I believe it has to do with the number of cores on the machine.

Drawaes (Contributor) commented Feb 21, 2018:

I think the 530% is fine. I suspect the question was more why it is 99% in one test vs 530% in the other. I would hazard a guess that because it's sampled, probably with a long sampling interval over a short test, it becomes basically a random number unless processor use is pegged for a long period.

sebastienros (Member) commented Feb 21, 2018:

In this case it's Docker providing the CPU in %, and it's 100 * cores. So it's either a glitch in how Docker reported it, or a parsing issue. We do divide by the number of cores to normalize to 100%. The easiest thing to do right now is to take a new measurement to get the real value.

Sorry, I thought it was from another PR... In this case we do process the CPU manually based on this code:

https://github.com/aspnet/benchmarks/blob/dev/src/BenchmarksServer/Startup.cs#L348-L352

Maybe the delay between measurements was too small during the benchmark. We take the max value of all the samples during the test; I'll look into it.
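
For the arithmetic involved, a minimal sketch (assumed, not the linked BenchmarksServer code): Docker reports CPU so that 100% equals one fully-used core, so dividing by the core count maps the raw value back onto a single 0-100 scale.

internal static class CpuNormalizationSketch
{
    // e.g. a raw 530% reading on a hypothetical 6-core box normalizes to roughly 88%.
    public static double Normalize(double dockerCpuPercent, int coreCount)
        => dockerCpuPercent / coreCount;
}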

davidfowl deleted the davidfowl/streams-and-memory branch on March 1, 2018 07:56