Skip to content

pipeline doc - #14414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Oct 12, 2019
Merged

pipeline doc - #14414

merged 44 commits into from
Oct 12, 2019

Conversation

Rick-Anderson
Copy link
Contributor

@Rick-Anderson Rick-Anderson commented Sep 13, 2019

Internal review URLs

Todo

  • Hook up TOC
  • add metadata
  • UE Pass

Fixes #7516

Copy link
Contributor

@mairaw mairaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick comments to get started

@Rick-Anderson
Copy link
Contributor Author

@mairaw I haven't started on the buffer doc yet. I've made some changes to pipeline.

@Rick-Anderson
Copy link
Contributor Author

Rick-Anderson commented Sep 13, 2019

@BillWagner see Pipe basic usage

Would you be able to create a project with the code in this PR (from this sample down). I'm having trouble getting it to work. I can then do snippet references and customer can have code they can download/test/modify.

image

If you're too busy I could ask @KPixel to migrate the code.

@davidfowl
Copy link
Member

I'll own fixing the code samples, that's not a problem.

@davidfowl
Copy link
Member

cc @pakrym @jkotalik @halter73

@Rick-Anderson
Copy link
Contributor Author

@davidfowl really fantastic info I'm anxious to get published. If @pakrym | @jkotalik | @halter73 could put the code in a repo that will speed up importing it.

No hurry as I'm doing this after my ASP.NET Core 3.0 migration work.

}
```

The above method searches each segment for a specific byte. If we need to keep track of each segment's `SequencePosition` then [`ReadOnlySequence.TryGet`](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.readonlysequence-1.tryget?view=netcore-3.0) would be more appropriate. Let's change the above code to return a `SequencePosition` instead of an integer. This has the added benefit of allowing the caller to avoid a second scan to get the data at a specific index.
Copy link
Contributor

@idg10 idg10 Sep 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a little while to work out what this meant:

"allowing the caller to avoid a second scan"

There are two kinds of scans going on here: the Span<byte>.IndexOf calls, and then the scan through the segments in the sequence. My mental model was that the bulk of the scan-like work here is the IndexOf because that's going to touch every byte until it finds what it's looking for. (Also, so far in my work with ReadOnlySequence<T>, which I've been using in conjunction with pipelines, the majority have been single-segment, with multi-segment examples only appearing as you approach the end of a buffer. I've found my app gets better throughput with largish buffer sizes, ensuring that most of the time I'm in the single-segment case, so I naturally tend to think of that one as the hot path that's most perf-critical.)

So maybe qualify this as "...a second scan through the sequence's segments to get..."?


* Is called to get memory from the underlying writer.
* `PipeWriter.Advance(int)` is called to tell the `PipeWriter` how much data was written to the buffer.
* `PipeWriter.FlushAsync` is called to make the data available to the `PipeReader`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reader, a question I have at this point is: why are these separate operations?

In your example, the call to Advance is followed immediately (as long as an exception didn't occur) by a call to FlushAsync. This suggests that an alternative design was possible, in which these were rolled into a single method. But presumably there's a reason you didn't do that—there must be some scenario in which you might want to call Advance more than once in between calls to FlushAsync? But what is that scenario? As it stands, I don't understand this aspect of the design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggests that an alternative design was possible, in which these were rolled into a single method. But presumably there's a reason you didn't do that—there must be some scenario in which you might want to call Advance more than once in between calls to FlushAsync? But what is that scenario? As it stands, I don't understand this aspect of the design.

Buffering. You can GetMemory()/GetSpan and Advance(..) over and over then when you have buffered up the appropriate message/amount you call FlushAsync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I don't think this is very discoverable right now, so I'm wondering if it would be helpful to add a short description of when this might be something you'd want to do. I think in the app where I'm using this I don't need to, but it's hard to be certain from the docs—my read loop reads fairly large blocks of data off disk and writes them into the pipe, and I flush every time. But with the socket example you've given I don't actually know how to judge whether the "flush on every iteration" approach the code you've shown would be right in any particular scenario. If I happen to know that I'm receiving from a source that sends data in small dribs and drabs (not a hypothetical question for me, by the way) should I be waiting until I've "buffered up the appropriate message/amount"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not useful when bytes are being transferred from one source to the other. In that scenario the writer has no idea what the bytes are and can't make sense of how much should be buffered before flushing.

There are cases like serialization that take in an IBufferWriter<byte> or when you're directly writing via the PipeWriter where you can choose to write an "entire message" not pieces of a message.

Here's an example:

A simple length prefixed message, you could imagine writing code like this:

async Task WriteMessage(PipeWriter writer, byte[] payload)
{
    writer.Write(GetLength(payload));
    writer.Advance(4); // Write 4 bytes
    writer.Write(payload);
    writer.Advance(payload.Length);
    await writer.FlushAsync();
}

@Rick-Anderson Rick-Anderson changed the title Fowlers pipeline doc DRAFT: pipeline doc - PLEASE HOLD COMMENTS UNTIL IT'S READY FOR REVIEW Sep 16, 2019

The preceding method requests a buffer of at least 5 bytes from the `IBufferWriter<byte>` using `GetSpan(5)` then writes bytes for the ASCII string "Hello" to the returned `Span<byte>`. It then calls `Advance(written)` to indicate how many bytes were written to the buffer.

This method of writing uses the `Memory<T>`/`Span<T>` buffer provided by the `IBufferWriter<T>`, but you can also use the [`Write`](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.buffersextensions.write?view=netstandard-2.1) extension method to copy an existing buffer to the `IBufferWriter<T>`. `Write` does the work of calling `GetSpan`/`Advance` as appropriate, so there's no need to call `Advance` after writing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLEASE HOLD COMMENTS UNTIL IT'S READY FOR REVIEW

😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I stopped as soon as I saw that. Oops!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP. This is taking a back seat to the ASP.NET Core 3.0 release so I'll be slow doing updates. At some point I'll need one of you to get the bulk of the pipe code into a working project on a Git repo. That way I can migrate the code snippets to this doc and not have embedded snippets.

@halter73
Copy link
Member

Is there any mention of completing Pipes with an exception? Given the overall depth this doc, that seems like a pretty big omission to me.

@davidfowl
Copy link
Member

I’ll add it in the future. I want to get this in since it’s the first big conceptual doc. But you’re right. That is missing.

All of the changes that don’t have suggested edits will happen later.


* The entire message (end of line) might not be received in a single call to `ReadAsync`.
* It's ignoring the result of `stream.ReadAsync`. `stream.ReadAsync` returns how much data was read.
* It doesn't handle the case where multiple lines are read in a single `ReadAsync` call.
Copy link
Contributor

@jkotalik jkotalik Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also consider mentioning that the method is always allocating a byte array each time we read data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also consider mentioning that the method is always allocating a byte array each time we read data.

Added

  • It allocates a byte array with each read.

* Returns an incomplete `ValueTask<FlushResult>` when the amount of data in the `Pipe` crosses `PauseWriterThreshold`.
* Completes `ValueTask<FlushResult>` when it becomes lower than `ResumeWriterThreshold`.

Two values are used to prevent rapid cycling, which can occur if one value is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd give an example of the rapid cycling if the same value is used for the threshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotalik please suggest verbiage.


## Streams

When reading or writing stream data, you typically read data using a de-serializer and write data using a serializer. Most of these read and write stream APIs have a `Stream` parameter. To make it easier to integrate with these existing APIs, `PipeReader` and `PipeWriter` expose an <xref:System.IO.Pipelines.PipeReader.AsStream%2A>. <xref:System.IO.Pipelines.PipeWriter.AsStream%2A> returns a `Stream` implementation around the `PipeReader` or `PipeWriter`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mention that there are PipeReader.Create and PipeWriter.Create methods to convert a stream into a PipeReader/PipeWriter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on me. I need to expand on the Stream interop with examples. I'll do this in a different PR

Rick Anderson and others added 2 commits October 10, 2019 11:32
halter73 suggestions

Co-Authored-By: Stephen Halter <[email protected]>
@Rick-Anderson
Copy link
Contributor Author

@davidfowl should I format the code/comments to minimize the horizontal scroll bars on a tablet? You can simulate a tablet on bigger displays by making the widest window without a left side and right side TOC.
For example, Read a single message

image

@davidfowl
Copy link
Member

Yes we can wrap the comments and should align the code by the parameters

Copy link
Member

@davidfowl davidfowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mairaw
Copy link
Contributor

mairaw commented Oct 11, 2019

Feel free to squash and merge this when you're ready @Rick-Anderson. I imagine you can do the comment wrapping separately since it's a nice improvement but not blocking right?

@Rick-Anderson Rick-Anderson merged commit a607439 into master Oct 12, 2019
@Rick-Anderson Rick-Anderson deleted the ra/buffers/pipelines branch October 12, 2019 00:05
@davidfowl
Copy link
Member

Where's the real URL?

@Rick-Anderson
Copy link
Contributor Author

@mairaw @BillWagner when do you merge master into live? Let me know so I can tweat this before @davidfowl does.

@mairaw
Copy link
Contributor

mairaw commented Oct 12, 2019

Usually by the end of my day shift 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New doc: Pipelines
8 participants