Skip to content

propose adding BufferedTee to the std.io #19032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 22, 2024

Conversation

ianic
Copy link
Contributor

@ianic ianic commented Feb 21, 2024

While analyzing #18967 I came to the idea of this kind of stream, Ian named it BufferedTee. The main idea is to allow consumer to put back to the stream some bytes it has already read, allowing it to lookahead few bytes and letter change it's mind and not consume them. Other consumers of the same stream should not be affected by this. BufferedTee is holding other consumers some number of bytes behind 'main' consumer allowing it to put back that number of bytes.

I feel that except helping with this particular case there can be other cases using lookahead approach which can found this type of stream useful. That is the reason behind this PR.

BufferedTee provides reader interface to the consumer. Data read by consumer is also written to the output. Output is hold lookahead_size bytes behind consumer. Allowing consumer to put back some bytes to be read again. On flush all consumed bytes are flushed to the output.

  input   ->   tee   ->   consumer
                |
             output

input - underlying unbuffered reader
output - writer, receives data read by consumer
consumer - uses provided reader interface

If lookahead_size is zero output always has same bytes as consumer.

BufferedTee provides reader interface to the consumer. Data read by consumer
is also written to the output. Output is hold lookahead_size bytes behind
consumer. Allowing consumer to put back some bytes to be read again. On flush
all consumed bytes are flushed to the output.

      input   ->   tee   ->   consumer
                    |
                 output

input - underlying unbuffered reader
output - writer, receives data read by consumer
consumer - uses provided reader interface

If lookahead_size is zero output always has same bytes as consumer.
@squeek502 squeek502 changed the title propose adding BufferedTea to the std.io propose adding BufferedTee to the std.io Feb 21, 2024
@nektro
Copy link
Contributor

nektro commented Feb 21, 2024

rather than read and putBack I usually do peek and read to achieve this pattern.

@ianic
Copy link
Contributor Author

ianic commented Feb 21, 2024

rather than read and putBack I usually do peek and read to achieve this pattern.

In this case we have readers in the chain: checksum and decompressor. If the decompressor overshoots whether by using read or peek checksum will be wrong. Peek will also pull data through checksum.

Copy link
Contributor

@ianprime0509 ianprime0509 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very nice, thanks again for proposing it. Would you consider adding the git.zig changes you made in #18967 (comment) to this PR as a separate commit? Having those changes also included would give a concrete example of where this is useful.

Comment on lines 1 to 15
//! BufferedTee provides reader interface to the consumer. Data read by consumer
//! is also written to the output. Output is hold lookahead_size bytes behind
//! consumer. Allowing consumer to put back some bytes to be read again. On flush
//! all consumed bytes are flushed to the output.
//!
//! input -> tee -> consumer
//! |
//! output
//!
//! input - underlying unbuffered reader
//! output - writer, receives data read by consumer
//! consumer - uses provided reader interface
//!
//! If lookahead_size is zero output always has same bytes as consumer.
//!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace of this file isn't accessible to users since BufferedTee and bufferedTee are exposed directly under std.io via pub const BufferedTee = @import("io/buffered_tee.zig").BufferedTee; in io.zig, so this doc comment won't show up in the generated documentation. If you move this right above pub fn BufferedTee and change it to a regular doc comment (///), then it'll show up as the documentation for that type.

Comment on lines 133 to 138
return BufferedTee(
buffer_size,
lookahead_size,
@TypeOf(input),
@TypeOf(output),
){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return BufferedTee(
buffer_size,
lookahead_size,
@TypeOf(input),
@TypeOf(output),
){
return .{

Just makes it a little more concise, since the full type has already been written out as the return type.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to land when @ianic and @ianprime0509 consider it ready

@ianic
Copy link
Contributor Author

ianic commented Feb 22, 2024

Would you consider adding the git.zig changes you made in #18967 (comment) to this PR as a separate commit

Is it possible to import std.zig from src/Package/Fetch/git.zig?
Changing const std = @import("std"); to const std = @import("../../../lib/std/std.zig"); will not work: outside module path.

If not I suggest to merge this and I'll make follow up PR for changes in git.zig.

@ianic ianic requested a review from ianprime0509 February 22, 2024 11:52
@ianprime0509
Copy link
Contributor

Is it possible to import std.zig from src/Package/Fetch/git.zig?

Yes, that's the same as @import("std") as long as you pass -Dno-lib (for zig build) or -DZIG_NO_LIB=ON (for cmake) when building Zig from source. That will prevent the lib directory from being copied when you build the compiler, and as a result it'll use the lib directory from your Zig source tree instead. If you've already built Zig from source, you can just delete the lib directory from your build to have the same effect.

Copy link
Contributor

@ianprime0509 ianprime0509 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! There are a couple very minor things that can be changed in git.zig, but other than that, this looks great to me.

}

/// Performs the first pass over the packfile data for index construction.
// Performs the first pass over the packfile data for index construction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Performs the first pass over the packfile data for index construction.
/// Performs the first pass over the packfile data for index construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uf, sorry.


switch (entry_header) {
.commit, .tree, .blob, .tag => |object| {
inline .commit, .tree, .blob, .tag => |object, tag| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switch arm doesn't need to be inline, as long as @tagName(tag) below is replaced with @tagName(entry_header) (this was a minor thing I fixed in my workaround PR which wasn't directly related to the workaround).

Copy link
Contributor

@ianprime0509 ianprime0509 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks great.

@andrewrk andrewrk enabled auto-merge February 22, 2024 19:55
@andrewrk andrewrk merged commit 8802ec5 into ziglang:master Feb 22, 2024
ianic added a commit to ianic/zig that referenced this pull request Mar 20, 2024
Introduced in  ziglang#19032 as a fix for ziglang#18967.
Not needed any more after ziglang#19253.
andrewrk pushed a commit that referenced this pull request Mar 21, 2024
Introduced in  #19032 as a fix for #18967.
Not needed any more after #19253.
@ianic ianic deleted the add_buffered_tee branch April 3, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants