propose adding BufferedTee to the std.io #19032

ianic · 2024-02-21T19:28:54Z

While analyzing #18967 I came up with the idea for this kind of stream; Ian named it BufferedTee. The main idea is to let a consumer put back into the stream some bytes it has already read, so it can look ahead a few bytes, later change its mind, and not consume them. Other consumers of the same stream should not be affected by this. BufferedTee holds the other consumers some number of bytes behind the 'main' consumer, which allows the main consumer to put back that number of bytes.

Besides helping with this particular case, I feel there may be other cases that use a lookahead approach and could find this type of stream useful. That is the reason behind this PR.

BufferedTee provides a reader interface to the consumer. Data read by the consumer is also written to the output. The output is held lookahead_size bytes behind the consumer, which allows the consumer to put back some bytes to be read again. On flush, all consumed bytes are flushed to the output.
  input   ->   tee   ->   consumer
                |
             output
input - underlying unbuffered reader
output - writer, receives data read by consumer
consumer - uses provided reader interface

If lookahead_size is zero, the output always has the same bytes as the consumer.
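The behavior described above can be modeled with a minimal sketch. This is not the code from the PR: `TeeSketch`, its field names, and the use of plain slices in place of a generic reader and writer are all assumptions made for illustration, and the caller is assumed to supply an `output` slice at least as long as `input`.

```zig
const std = @import("std");

/// Hypothetical, simplified model of the BufferedTee idea.
/// Slices stand in for the real reader (input) and writer (output).
fn TeeSketch(comptime lookahead_size: usize) type {
    return struct {
        input: []const u8,
        output: []u8, // assumed to be at least input.len bytes long
        rp: usize = 0, // consumer position within input
        wp: usize = 0, // number of bytes already forwarded to output

        const Self = @This();

        /// Hands bytes to the consumer, then forwards everything except
        /// the last lookahead_size consumed bytes to the output.
        pub fn read(self: *Self, dest: []u8) usize {
            const n = @min(dest.len, self.input.len - self.rp);
            @memcpy(dest[0..n], self.input[self.rp..][0..n]);
            self.rp += n;
            self.forward(lookahead_size);
            return n;
        }

        /// Un-reads n bytes. Only bytes not yet forwarded can be put back.
        pub fn putBack(self: *Self, n: usize) void {
            std.debug.assert(n <= self.rp - self.wp);
            self.rp -= n;
        }

        /// Forwards all consumed bytes to the output.
        pub fn flush(self: *Self) void {
            self.forward(0);
        }

        fn forward(self: *Self, hold: usize) void {
            if (self.rp < hold) return;
            const target = self.rp - hold;
            if (target <= self.wp) return;
            @memcpy(self.output[self.wp..target], self.input[self.wp..target]);
            self.wp = target;
        }
    };
}
```

With `lookahead_size` set to 2, reading five bytes forwards only the first three to the output, so the consumer can still put back up to two bytes before a flush.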

nektro · 2024-02-21T20:52:52Z

rather than read and putBack I usually do peek and read to achieve this pattern.

ianic · 2024-02-21T21:22:47Z

> rather than read and putBack I usually do peek and read to achieve this pattern.

In this case we have two readers in the chain: a checksum and a decompressor. If the decompressor overshoots, whether by using read or peek, the checksum will be wrong. Peek will also pull data through the checksum.
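The problem with a chained checksum can be illustrated with a small sketch. `SummingSource` and `sumOf` are hypothetical names, and a plain byte sum stands in for whatever checksum the real chain uses. The point is only that every byte pulled through the checksumming layer is counted, whether or not the downstream reader needed it.

```zig
const std = @import("std");

/// Hypothetical stand-in for a checksumming reader: every byte that is
/// pulled through it gets added to `sum`, needed or not.
const SummingSource = struct {
    data: []const u8,
    pos: usize = 0,
    sum: u32 = 0,

    fn read(self: *SummingSource, dest: []u8) usize {
        const n = @min(dest.len, self.data.len - self.pos);
        @memcpy(dest[0..n], self.data[self.pos..][0..n]);
        for (dest[0..n]) |b| self.sum += b;
        self.pos += n;
        return n;
    }
};

/// Reference checksum over exactly the bytes that belong to one object.
fn sumOf(bytes: []const u8) u32 {
    var s: u32 = 0;
    for (bytes) |b| s += b;
    return s;
}
```

If an object occupies three bytes but the downstream reader pulls a four-byte chunk, `sum` covers four bytes and no longer matches `sumOf` over the object alone.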

ianprime0509

I think this is very nice, thanks again for proposing it. Would you consider adding the git.zig changes you made in #18967 (comment) to this PR as a separate commit? Having those changes also included would give a concrete example of where this is useful.

ianprime0509 · 2024-02-22T01:30:54Z

lib/std/io/buffered_tee.zig

+//! BufferedTee provides reader interface to the consumer. Data read by consumer
+//! is also written to the output. Output is hold lookahead_size bytes behind
+//! consumer. Allowing consumer to put back some bytes to be read again. On flush
+//! all consumed bytes are flushed to the output.
+//!
+//!       input   ->   tee   ->   consumer
+//!                     |
+//!                  output
+//!
+//! input - underlying unbuffered reader
+//! output - writer, receives data read by consumer
+//! consumer - uses provided reader interface
+//!
+//! If lookahead_size is zero output always has same bytes as consumer.
+//!


The namespace of this file isn't accessible to users since BufferedTee and bufferedTee are exposed directly under std.io via pub const BufferedTee = @import("io/buffered_tee.zig").BufferedTee; in io.zig, so this doc comment won't show up in the generated documentation. If you move this right above pub fn BufferedTee and change it to a regular doc comment (///), then it'll show up as the documentation for that type.

ianprime0509 · 2024-02-22T01:32:31Z

lib/std/io/buffered_tee.zig

+    return BufferedTee(
+        buffer_size,
+        lookahead_size,
+        @TypeOf(input),
+        @TypeOf(output),
+    ){


Suggested change

-    return BufferedTee(
-        buffer_size,
-        lookahead_size,
-        @TypeOf(input),
-        @TypeOf(output),
-    ){
+    return .{

Just makes it a little more concise, since the full type has already been written out as the return type.
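A minimal sketch of the same pattern outside this PR, with the hypothetical names `Pair` and `pair`: when a function's return type is already spelled out in its signature, an anonymous struct literal takes that type, so the type expression does not need to be repeated in the body.

```zig
const std = @import("std");

/// Hypothetical generic type, used only to illustrate the pattern.
fn Pair(comptime A: type, comptime B: type) type {
    return struct { first: A, second: B };
}

fn pair(a: anytype, b: anytype) Pair(@TypeOf(a), @TypeOf(b)) {
    // The literal's type comes from the declared return type, so
    // Pair(@TypeOf(a), @TypeOf(b)) is written only once.
    return .{ .first = a, .second = b };
}
```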

andrewrk

This is ready to land when @ianic and @ianprime0509 consider it ready

ianic · 2024-02-22T11:51:29Z

> Would you consider adding the git.zig changes you made in #18967 (comment) to this PR as a separate commit

Is it possible to import std.zig from src/Package/Fetch/git.zig?
Changing const std = @import("std"); to const std = @import("../../../lib/std/std.zig"); will not work: outside module path.

If not, I suggest merging this, and I'll make a follow-up PR for the changes in git.zig.

ianprime0509 · 2024-02-22T13:30:00Z

> Is it possible to import std.zig from src/Package/Fetch/git.zig?

Yes, that's the same as @import("std") as long as you pass -Dno-lib (for zig build) or -DZIG_NO_LIB=ON (for cmake) when building Zig from source. That will prevent the lib directory from being copied when you build the compiler, and as a result it'll use the lib directory from your Zig source tree instead. If you've already built Zig from source, you can just delete the lib directory from your build to have the same effect.

ianprime0509

Thanks! There are a couple very minor things that can be changed in git.zig, but other than that, this looks great to me.

ianprime0509 · 2024-02-22T16:05:54Z

src/Package/Fetch/git.zig

-}
-
-/// Performs the first pass over the packfile data for index construction.
+// Performs the first pass over the packfile data for index construction.


Suggested change

-// Performs the first pass over the packfile data for index construction.
+/// Performs the first pass over the packfile data for index construction.

ianprime0509 · 2024-02-22T16:08:17Z

src/Package/Fetch/git.zig


        switch (entry_header) {
-            .commit, .tree, .blob, .tag => |object| {
+            inline .commit, .tree, .blob, .tag => |object, tag| {


This switch arm doesn't need to be inline, as long as @tagName(tag) below is replaced with @tagName(entry_header) (this was a minor thing I fixed in my workaround PR which wasn't directly related to the workaround).
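A small sketch of that point, using a hypothetical `Header` union rather than the real type from git.zig: `@tagName` accepts a tagged union value directly, so a multi-case prong can get the tag name from the switched value without being `inline` and without a second capture. This assumes all the listed cases share one payload type, which a shared capture requires.

```zig
const std = @import("std");

/// Hypothetical tagged union, not the actual entry header from git.zig.
const Header = union(enum) {
    commit: u64,
    tree: u64,
    blob: u64,
    ofs_delta: void,
};

const Info = struct { name: []const u8, len: u64 };

fn describe(h: Header) Info {
    switch (h) {
        // One runtime prong for three cases: the payload is captured as
        // usual and the name is taken from the union value itself.
        .commit, .tree, .blob => |len| return Info{ .name = @tagName(h), .len = len },
        .ofs_delta => return Info{ .name = "delta", .len = 0 },
    }
}
```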

ianprime0509

Thank you! This looks great.

ianic added 2 commits February 21, 2024 20:01

cleanup tests

ce1a590

squeek502 changed the title ~~propose adding BufferedTea to the std.io~~ propose adding BufferedTee to the std.io Feb 21, 2024

ianprime0509 approved these changes Feb 22, 2024

ianprime0509 mentioned this pull request Feb 22, 2024

New zlib decompressor may read more data than necessary #18967

Closed

andrewrk approved these changes Feb 22, 2024

refactor according to Ian's review

eb67fab

ianic requested a review from ianprime0509 February 22, 2024 11:52

use BufferedTee in Fetch/git.zig

d00faa2

ianprime0509 reviewed Feb 22, 2024

return few previous fixes

a5326c5

ianprime0509 approved these changes Feb 22, 2024

andrewrk enabled auto-merge February 22, 2024 19:55

andrewrk merged commit 8802ec5 into ziglang:master Feb 22, 2024

ianic mentioned this pull request Mar 12, 2024

package: remove git fetch zlib lookahead fix #19253

Merged

ianic added a commit to ianic/zig that referenced this pull request Mar 20, 2024

std.io: remove BufferedTee

053e084

Introduced in ziglang#19032 as a fix for ziglang#18967. Not needed any more after ziglang#19253.

ianic mentioned this pull request Mar 20, 2024

std.io: remove BufferedTee #19368

Merged

andrewrk pushed a commit that referenced this pull request Mar 21, 2024

std.io: remove BufferedTee

e831313

Introduced in #19032 as a fix for #18967. Not needed any more after #19253.

ianic deleted the add_buffered_tee branch April 3, 2024 20:19
