Zstandard changes #18960
Conversation
```diff
 pub fn decompressStreamOptions(
-    allocator: Allocator,
     reader: anytype,
     comptime options: DecompressStreamOptions,
-) DecompressStream(@TypeOf(reader, options)) {
-    return DecompressStream(@TypeOf(reader), options).init(allocator, reader);
+    window_buffer: *[options.window_size_max]u8,
+) DecompressStream(@TypeOf(reader), options) {
+    return DecompressStream(@TypeOf(reader), options).init(reader, window_buffer);
 }
```
This makes it possible for the user to pass a buffer that does not have size equal to `options.window_size_max`, which will lead to a compile error. Perhaps it would be better to make a separate options struct that does not include the window size. Doing this would require the new type to be kept in sync with any future changes to `DecompressStreamOptions`.
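To make the concern concrete, here is a hypothetical call site (the reader setup and the 4096-byte buffer are made up, and the compiler message is paraphrased):

```zig
// window_buffer is typed *[options.window_size_max]u8, so only a buffer of
// exactly the comptime window size coerces; anything else is a compile error.
var window: [4096]u8 = undefined;
var in_stream = std.io.fixedBufferStream(compressed_bytes);

// Rejected at compile time with a pointer type mismatch, roughly:
// expected type '*[8388608]u8', found '*[4096]u8'
var stream = decompressStreamOptions(in_stream.reader(), .{}, &window);
```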
The `ZstdWrapper` type in `src/Package/Fetch.zig` is no longer used and has not been deleted.
```zig
literal_fse_buffer: [table_size_max.literal]types.compressed_block.Table.Fse,
match_fse_buffer: [table_size_max.match]types.compressed_block.Table.Fse,
offset_fse_buffer: [table_size_max.offset]types.compressed_block.Table.Fse,
literals_buffer: [types.block_size_max]u8,
sequence_buffer: [types.block_size_max]u8,
```
These being arrays rather than pointers makes the `DecompressStream` relatively large: 2^9 + 2^9 + 2^8 + 2^17 + 2^17 gives 263424 B = 257.25 KiB for these buffers.
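As a quick check of that arithmetic (counting one element as one byte, as the figure above does; this snippet is illustrative and not part of the PR):

```zig
const std = @import("std");

test "buffer size figure from the comment above" {
    const total = (1 << 9) + (1 << 9) + (1 << 8) + (1 << 17) + (1 << 17);
    try std.testing.expectEqual(@as(u32, 263_424), total);
    // 263424 / 1024 = 257.25, i.e. 257.25 KiB
}
```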
The size of the buffers can probably be cut in half by combining `literals_buffer` and `sequence_buffer`, though this would need double-checking the spec to make sure it can be done and would add a bit of complexity to the compressed block handling.
Force-pushed from 43eb16c to 8eceae4.
I assume they do - if they didn't, everyone would have to try to decompress streams even when the declared window size is outside the supported range, on the off chance that the decompressed data is small enough. Surely you want to error out immediately when the window size is read. Another option is that compressors don't limit the declared window size but make sure to include the frame content size in the frame header if the window size is larger than the uncompressed data.
Force-pushed from b3a723d to f08469c.
Let the user add that if they wish to. It's not strictly necessary, and arguably a harmful default.
It incorrectly had NotWriteable and MessageTooLong in it.
The buffer for HTTP headers is now always provided via a static buffer. As a consequence, OutOfMemory is no longer a member of the read() error set, and the API and implementation of Client and Server are simplified. error.HttpHeadersExceededSizeLimit is renamed to error.HttpHeadersOversize.
companion commit to 919a3ba
This is a state machine that already has a `state` field. No need to additionally store "done" - it just makes things unnecessarily complicated and buggy.
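As a generic illustration of that point (hypothetical types, not the actual std.http code): completion can simply be one more variant of the existing state enum.

```zig
// "Done" is a state like any other; there is no separate `done: bool` that
// could drift out of sync with `state`.
const Parser = struct {
    state: enum { start, headers, body, done } = .start,

    fn isComplete(p: *const Parser) bool {
        return p.state == .done;
    }
};
```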
Documentation comments are not an appropriate place to put code samples.
I originally removed these in 402f967. I allowed them to be added back in ziglang#15299 because they were smuggled in alongside a bug fix; however, I wasn't kidding when I said that I wanted to take the design of std.http in a different direction than using this data structure. Instead, some headers are provided via explicit field names populated while parsing the HTTP request/response, and some are provided via new fields that support passing extra, arbitrary headers. This resulted in simplification of logic in many places, as well as elimination of the possibility of failure in many places. There is less deinitialization code happening now. Furthermore, it made it no longer necessary to clone the headers data structure in order to handle redirects.

http_proxy and https_proxy fields are now pointers since it is common for them to be unpopulated. loadDefaultProxies is changed into initDefaultProxies to communicate that it does not actually load anything from disk or from the network. The function is now leaky; the API user must pass an already instantiated arena allocator. This removes the need to deinitialize proxies. Before, proxies stored arbitrary sets of headers; now they only store the authorization value. Removed the duplicated code between https_proxy and http_proxy. Finally, parsing failures of the environment variables result in errors being emitted rather than silently ignoring the proxy.

error.CompressionNotSupported is renamed to error.CompressionUnsupported, matching the naming convention of all the other errors in the same set. Removed documentation comments that were redundant with field and type names. Disabling zstd decompression in the server for now; see ziglang#18937.

I found some apparently dead code in src/Package/Fetch/git.zig. I want to check with Ian about this. I discovered that test/standalone/http.zig is dead code: it is only being compiled, not run. Furthermore, it hangs at the end if you run it manually. The previous commits in this branch were written under the assumption that this test was being run with `zig build test-standalone`.
The Allocator requirement is problematic.
I mistakenly thought this was dead code in an earlier commit in this branch. This commit restores the proper behavior.
This can no longer fail due to OOM.
Making it no longer dead code. It is currently failing.
This reverts commit 42be972. Using a bit to distinguish between headers and trailers is fine. It was just named and documented poorly.
* add API for iterating over custom HTTP headers
* remove `trailing` flag from std.http.Client.parse. Instead, simply don't call parse() for trailers.
* fix the logic inside that parse() function. It was using wrong std.mem functions, ignoring malformed data, and returned errors on dead branches.
* simplify logic inside wait()
* fix HeadersParser not dropping the 2 read bytes of \r\n after a chunked transfer
* move the trailers test to be a std lib unit test and make it pass
Perhaps the language should enforce this.
for checking if a proxy is connecting to itself
I don't like this mechanism in general, and it is unused by the standard library.
This is not an appropriate place to put this code. It belongs in the caller's code, if at all.
Tests for the
* "storage" is a better name than "strategy". * The most flexible memory-based storage API is appending to an ArrayList. * HTTP method should default to POST if there is a payload. * Avoid storing unnecessary data in the FetchResult * Avoid the need for a deinit() method in the FetchResult The decisions that this logic made about how to handle files is beyond repair: - fail to use sendfile() on a plain connection - redundant stat - does not handle arbitrary streams So, file-based response storage is no longer supported. Users should use the lower-level open() API which allows avoiding these pitfalls.
Before, this code constructed an arena allocator and then used it when handling redirects. You know what's better than having threads fight over an allocator? Avoiding dynamic memory allocation in the first place. This commit reuses the http headers static buffer for handling redirects. The new location is copied to the beginning of the static header buffer and then the subsequent request uses a subslice of that buffer.
Force-pushed from a66128d to 4417ceb.
The remaining CI failure should be fixed if you rebase. Also, I will change the default WASI stack size to be the same as on other operating systems, which is 16 MiB. Edit: done: #18971
Was it caused by changes in the
This change corrects the size of various internal buffers used. The previous behavior did not cause validity problems but wasted space.
Force-pushed from dc5d5c9 to 99d88d9.
This commit can be reverted after ziglang#18971 is merged.
Force-pushed from 99d88d9 to f32b46f.
Force-pushed from 4417ceb to 8acb62d.
Thanks! Cherry-picked into #18955.
Is there some reason the buffer needs to have a comptime-known length? This API does not allow the user to supply an arbitrary buffer.
It wouldn't need to be comptime-known - I think the only reason it was before was to make the API match the other compression formats, but now that a buffer needs to be passed in it won't be quite the same anyway. Both the window length and the verify-checksum option could be made runtime with a patch like:

```diff
diff --git a/lib/std/compress/zstandard.zig b/lib/std/compress/zstandard.zig
index b7610a58b7..bc080a51dc 100644
--- a/lib/std/compress/zstandard.zig
+++ b/lib/std/compress/zstandard.zig
@@ -9,20 +9,14 @@ pub const decompress = @import("zstandard/decompress.zig");
pub const DecompressStreamOptions = struct {
verify_checksum: bool = true,
- window_size_max: usize = default_window_size_max,
-
- pub const default_window_size_max = 1 << 23; // 8MiB default maximum window size
};
pub fn DecompressStream(
comptime ReaderType: type,
- comptime options: DecompressStreamOptions,
) type {
return struct {
const Self = @This();
- pub const window_size_max = options.window_size_max;
-
const table_size_max = types.compressed_block.table_size_max;
source: std.io.CountingReader(ReaderType),
@@ -35,11 +29,12 @@ pub fn DecompressStream(
offset_fse_buffer: [table_size_max.offset]types.compressed_block.Table.Fse,
literals_buffer: [types.block_size_max]u8,
sequence_buffer: [types.block_size_max]u8,
- checksum: if (options.verify_checksum) ?u32 else void,
+ verify_checksum: bool,
+ checksum: ?u32,
current_frame_decompressed_size: usize,
const WindowBuffer = struct {
- data: *[options.window_size_max]u8 = undefined,
+ data: []u8 = undefined,
read_index: usize = 0,
write_index: usize = 0,
};
@@ -54,7 +49,7 @@ pub fn DecompressStream(
pub const Reader = std.io.Reader(*Self, Error, read);
- pub fn init(source: ReaderType, window_buffer: *[options.window_size_max]u8) Self {
+ pub fn init(source: ReaderType, window_buffer: []u8, options: DecompressStreamOptions) Self {
return Self{
.source = std.io.countingReader(source),
.state = .NewFrame,
@@ -66,6 +61,7 @@ pub fn DecompressStream(
.offset_fse_buffer = undefined,
.literals_buffer = undefined,
.sequence_buffer = undefined,
+ .verify_checksum = options.verify_checksum,
.checksum = undefined,
.current_frame_decompressed_size = undefined,
};
@@ -81,8 +77,8 @@ pub fn DecompressStream(
.zstandard => |header| {
const frame_context = try decompress.FrameContext.init(
header,
- options.window_size_max,
- options.verify_checksum,
+ self.buffer.data.len,
+ self.verify_checksum,
);
const decode_state = decompress.block.DecodeState.init(
@@ -94,7 +90,7 @@ pub fn DecompressStream(
self.decode_state = decode_state;
self.frame_context = frame_context;
- self.checksum = if (options.verify_checksum) null else {};
+ self.checksum = null;
self.current_frame_decompressed_size = 0;
self.state = .InFrame;
@@ -176,7 +172,7 @@ pub fn DecompressStream(
if (self.frame_context.has_checksum) {
const checksum = source_reader.readInt(u32, .little) catch
return error.MalformedFrame;
- if (comptime options.verify_checksum) {
+ if (self.verify_checksum) {
if (self.frame_context.hasher_opt) |*hasher| {
if (checksum != decompress.computeChecksum(hasher))
return error.ChecksumFailure;
@@ -213,15 +209,18 @@ pub fn decompressStreamOptions(
pub fn decompressStream(
reader: anytype,
- window_buffer: *[DecompressStreamOptions.default_window_size_max]u8,
-) DecompressStream(@TypeOf(reader), .{}) {
- return DecompressStream(@TypeOf(reader), .{}).init(reader, window_buffer);
+ window_buffer: []u8,
+ options: DecompressStreamOptions,
+) DecompressStream(@TypeOf(reader)) {
+ return DecompressStream(@TypeOf(reader)).init(reader, window_buffer, options);
}
fn testDecompress(data: []const u8) ![]u8 {
- var window_buffer: [DecompressStreamOptions.default_window_size_max]u8 = undefined;
+ const window_buffer = try std.testing.allocator.alloc(u8, 1 << 23);
+ defer std.testing.allocator.free(window_buffer);
+
var in_stream = std.io.fixedBufferStream(data);
- var zstd_stream = decompressStream(in_stream.reader(), &window_buffer);
+ var zstd_stream = decompressStream(in_stream.reader(), window_buffer, .{});
const result = zstd_stream.reader().readAllAlloc(std.testing.allocator, std.math.maxInt(usize));
return result;
}
@@ -251,7 +250,7 @@ test "zstandard decompression" {
test "zstandard streaming decompression" {
// default stack size for wasm32 is too low for DecompressStream - slightly
- // over 9MiB stack space is needed via the --stack CLI flag
+ // over 1MiB stack space is needed via the --stack CLI flag
if (@import("builtin").target.cpu.arch == .wasm32) return error.SkipZigTest;
const uncompressed = @embedFile("testdata/rfc8478.txt");
@@ -279,9 +278,11 @@ fn expectEqualDecoded(expected: []const u8, input: []const u8) !void {
}
fn expectEqualDecodedStreaming(expected: []const u8, input: []const u8) !void {
- var window_buffer: [DecompressStreamOptions.default_window_size_max]u8 = undefined;
+ const window_buffer = try std.testing.allocator.alloc(u8, 1 << 23);
+ defer std.testing.allocator.free(window_buffer);
+
var in_stream = std.io.fixedBufferStream(input);
- var stream = decompressStream(in_stream.reader(), &window_buffer);
+ var stream = decompressStream(in_stream.reader(), window_buffer, .{});
const result = try stream.reader().readAllAlloc(std.testing.allocator, std.math.maxInt(usize));
defer std.testing.allocator.free(result);
@@ -307,7 +308,7 @@ test "zero sized block" {
test "zero sized block streaming" {
// default stack size for wasm32 is too low for DecompressStream - slightly
- // over 9MiB stack space is needed via the --stack CLI flag
+ // over 1MiB stack space is needed via the --stack CLI flag
if (@import("builtin").target.cpu.arch == .wasm32) return error.SkipZigTest;
    const input_raw =
```

EDIT: This patch also reduces the stack size needed for the tests to a little over 1MiB by allocating the window buffer in the tests, but this still requires passing
Sounds good! If you push that commit to your branch I'll cherry-pick it.
Done.
Thanks!
Closes #18937.
This change removes all allocation from the `std.compress.zstd.DecompressStream` type, instead requiring the user to provide a buffer to use for the decompression window. The size of this buffer is given by the `window_size_max` field of `std.compress.zstandard.DecompressStreamOptions`, which defaults to 8 MiB, as the Zstandard spec suggests implementations support at least an 8 MiB window.

In order for `std.http` to handle Zstandard compression, the server and clients will only need to provide a window buffer to the decompressor. In theory the window buffer does not need to be 8 MiB if there is a smaller upper bound on the decompressed size of a valid message - the window does not need to be larger than the largest possible (decompressed) message - but I'm not sure if zstd compressors in the wild limit the declared window size in the frame header to the size of the uncompressed data. The spec does not mention whether this should be done, and if compressors do not, further changes to `std.compress.zstandard` will be required to support situations where a frame's declared window size is larger than the uncompressed size of the data.
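For reference, a minimal usage sketch of the allocation-free API as merged in this PR, modeled on the testDecompress helper shown in the diff above (the surrounding function and allocator plumbing are illustrative; note that the PR refers to the namespace both as std.compress.zstd and std.compress.zstandard, and the follow-up patch above changes the buffer to a runtime-sized slice):

```zig
const std = @import("std");
const zstd = std.compress.zstd;

fn decompressAll(gpa: std.mem.Allocator, compressed: []const u8) ![]u8 {
    // The caller owns the decompression window. Here it lives on the stack,
    // as in the std lib tests; 8 MiB can exceed the default stack size on
    // some targets (e.g. wasm32), in which case it should be heap-allocated.
    var window_buffer: [zstd.DecompressStreamOptions.default_window_size_max]u8 = undefined;

    var in_stream = std.io.fixedBufferStream(compressed);
    var stream = zstd.decompressStream(in_stream.reader(), &window_buffer);
    return stream.reader().readAllAlloc(gpa, std.math.maxInt(usize));
}
```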