std.io: add BufferedReader.peek #19310

Closed · wants to merge 1 commit

Conversation

@pierrec (Contributor) commented Mar 14, 2024

Fixes #4501.

@andrewrk (Member) left a comment

I think this function should additionally assert that dest.len is <= buf.len (and document that assertion).

How about a more convenient API instead?

pub const PeekError = Error || error{EndOfStream};

/// Returns the next `n` bytes without advancing.
///
/// The returned slice is a subslice of the buffer, which may or may not start
/// from the beginning. The length of the returned slice is always `n`.
///
/// Asserts that the number of requested bytes is less than or equal to the
/// buffer size.
pub fn peek(self: *Self, n: usize) PeekError![]u8 {
    // ...
}

@pierrec (Contributor, Author) commented Mar 18, 2024

After using this API a bit, I think the following would be better, as it avoids copying:

/// Peeks the next `n` bytes without advancing the reader.
/// The bytes are valid until the next read call.
/// If the number of requested bytes is larger than the underlying buffer,
/// `buffer_size` bytes are returned.
pub fn peek(self: *Self, n: usize) Error![]const u8

/// Discards the next `n` bytes and returns the number of discarded bytes.
pub fn discard(self: *Self, n: usize) Error!usize
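To illustrate the intended semantics of this pair, here is a minimal, self-contained sketch over an in-memory byte slice. `SlicePeeker` is a hypothetical helper invented for this example, not the std implementation; it only demonstrates that `peek` does not advance while `discard` does, and that both clamp to the available data:

```zig
const std = @import("std");

/// Hypothetical stand-in for the proposed `peek`/`discard` semantics,
/// backed by an in-memory slice instead of a buffered reader.
const SlicePeeker = struct {
    bytes: []const u8,
    pos: usize = 0,

    /// Returns up to `n` bytes without advancing the position.
    fn peek(self: *SlicePeeker, n: usize) []const u8 {
        const end = @min(self.pos + n, self.bytes.len);
        return self.bytes[self.pos..end];
    }

    /// Discards up to `n` bytes and returns how many were discarded.
    fn discard(self: *SlicePeeker, n: usize) usize {
        const len = @min(n, self.bytes.len - self.pos);
        self.pos += len;
        return len;
    }
};

test "peek does not advance, discard does" {
    var p = SlicePeeker{ .bytes = "hello" };
    try std.testing.expectEqualStrings("he", p.peek(2));
    try std.testing.expectEqualStrings("he", p.peek(2)); // unchanged
    try std.testing.expectEqual(@as(usize, 2), p.discard(2));
    try std.testing.expectEqualStrings("llo", p.peek(10)); // clamped
}
```

Note how an oversized request is clamped rather than erroring, mirroring the "`buffer_size` bytes are returned" behavior described in the doc comment above.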

@andrewrk (Member)

Why do you want it to be possible to provide a buffer that is too small?

Related, I have this sketch for a non-generic BufferedReader:

unbuffered_reader: io.AnyReader,
buffer: []u8,
start: usize,
end: usize,

pub fn init(unbuffered_reader: io.AnyReader, buffer: []u8) BufferedReader {
    return .{
        .unbuffered_reader = unbuffered_reader,
        .buffer = buffer,
        .start = 0,
        .end = 0,
    };
}

/// If the amount requested can be fulfilled from already-buffered data, no
/// underlying read occurs; however, if the amount requested exceeds the
/// amount of buffered data, an underlying read occurs.
///
/// Calling this function will cause at most one underlying read call.
pub fn readv(br: *BufferedReader, vecs: []const io.Vec) anyerror!usize {
    var total_read_len: usize = 0;
    var vec_i: usize = 0;
    while (vec_i < vecs.len) : (vec_i += 1) {
        var vec = vecs[vec_i];
        while (vec.len > 0) {
            if (br.end == br.start) {
                // Caller wants more data but we have none buffered.
                // If the caller has only one vector remaining, then we'll pass
                // it along with the main buffer to underlying read.
                // Otherwise we'll pass the rest of the caller's vectors directly
                // to the underlying reader with one tweak: if the last vector
                // is smaller than the main buffer, then we swap it with the main
                // buffer instead.
                const remaining_vecs = vecs[vec_i..];
                switch (remaining_vecs.len) {
                    0 => unreachable, // vec.len > 0 above
                    1 => {
                        var my_vecs: [2]io.Vec = .{
                            vec,
                            .{
                                .ptr = br.buffer.ptr,
                                .len = br.buffer.len,
                            },
                        };
                        const n = try br.unbuffered_reader.readv(&my_vecs);
                        if (n <= vec.len) {
                            total_read_len += n;
                            return total_read_len;
                        }
                        br.start = 0;
                        br.end = n - vec.len;
                        total_read_len += vec.len;
                        return total_read_len;
                    },
                    else => {
                        const last = remaining_vecs[remaining_vecs.len - 1];
                        if (last.len < br.buffer.len) {
                            const first = remaining_vecs[0];
                            defer {
                                remaining_vecs[0] = first;
                                remaining_vecs[remaining_vecs.len - 1] = last;
                            }
                            remaining_vecs[0] = vec;
                            remaining_vecs[remaining_vecs.len - 1] = .{
                                .ptr = br.buffer.ptr,
                                .len = br.buffer.len,
                            };
                            var n = try br.unbuffered_reader.readv(remaining_vecs);
                            for (remaining_vecs[0 .. remaining_vecs.len - 1]) |v| {
                                if (n >= v.len) {
                                    n -= v.len;
                                    total_read_len += v.len;
                                    continue;
                                }
                                total_read_len += n;
                                return total_read_len;
                            }
                            const copy_len = @min(last.len, n);
                            // Only `copy_len` bytes fit in `last`; the
                            // remainder stays buffered in `br.buffer`.
                            @memcpy(last.ptr[0..copy_len], br.buffer[0..copy_len]);
                            total_read_len += copy_len;
                            br.start = copy_len;
                            br.end = n;
                            return total_read_len;
                        }
                        total_read_len += try br.unbuffered_reader.readv(remaining_vecs);
                        return total_read_len;
                    },
                }
            }
            const copy_len = @min(vec.len, br.end - br.start);
            @memcpy(vec.ptr[0..copy_len], br.buffer[br.start..][0..copy_len]);
            vec.ptr += copy_len;
            vec.len -= copy_len;
            total_read_len += copy_len;
            br.start += copy_len;
        }
    }
    return total_read_len;
}

pub fn reader(br: *BufferedReader) io.AnyReader {
    return .{
        .context = br,
        .readv = readv,
    };
}

const std = @import("../std.zig");
const io = std.io;
const BufferedReader = @This();

With this code, the buffer is provided by the user rather than being part of the type. Under these circumstances, I can't imagine the programmer passing a buffer smaller than the one they will use for peek(). Is that the case in your project that you've been testing this API with?
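For contrast, the std API at the time of this PR bakes the buffer size into the type as a comptime parameter, rather than taking a caller-owned `[]u8` at init like the sketch above. A small runnable illustration of the current shape (using `fixedBufferStream` as the underlying reader):

```zig
const std = @import("std");

test "buffer size is part of the type in the current API" {
    var fbs = std.io.fixedBufferStream("hello world");
    // `bufferedReader` returns `BufferedReader(4096, @TypeOf(reader))`:
    // the 4096-byte buffer lives inside the struct and is chosen at
    // comptime, whereas the sketch above would accept any runtime buffer.
    var br = std.io.bufferedReader(fbs.reader());
    var dest: [5]u8 = undefined;
    try br.reader().readNoEof(&dest);
    try std.testing.expectEqualStrings("hello", &dest);
}
```

With the non-generic version, the caller picks the buffer size at the `init` call site, which is exactly why an assert (rather than a clamping fallback) is a reasonable way to catch a too-small buffer.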

@pierrec (Contributor, Author) commented Mar 19, 2024

> Why do you want it to be possible to provide a buffer that is too small?
>
> ...
>
> With this code, the buffer is provided by the user rather than being part of the type. Under these circumstances, I can't imagine the programmer passing a buffer smaller than the one they will use for peek(). Is that the case in your project that you've been testing this API with?

I totally agree; I am only thinking about programmer errors. Maybe an assert would be the best solution.

I will update the PR with the API I currently use (the one proposed above) and fix the comments.

@pierrec (Contributor, Author) commented Mar 22, 2024

> I think this function should additionally assert that dest.len is <= buf.len (and document that assertion).
>
> How about a more convenient API instead?
>
> pub const PeekError = Error || error{EndOfStream};
>
> /// Returns the next `n` bytes without advancing.
> ///
> /// The returned slice is a subslice of the buffer, which may or may not start
> /// from the beginning. The length of the returned slice is always `n`.
> ///
> /// Asserts that the number of requested bytes is less than or equal to the
> /// buffer size.
> pub fn peek(self: *Self, n: usize) PeekError![]u8 {
>     // ...
> }

I missed this comment, but came to the same conclusion. PR updated accordingly.

@pierrec pierrec requested a review from andrewrk May 11, 2024 09:08
@pierrec pierrec closed this by deleting the head repository Sep 9, 2024
@notcancername notcancername mentioned this pull request Dec 2, 2024
Development

Successfully merging this pull request may close these issues.

provide a peeking API to std.io.BufferedReader
2 participants