
Zstandard changes #18960


Closed · dweiller wants to merge 28 commits from the zstandard-changes branch

Conversation

@dweiller (Contributor) commented Feb 16, 2024

Closes #18937.

This change removes all allocation from the std.compress.zstandard.DecompressStream type, instead requiring the user to provide a buffer to use for the decompression window. The size of this buffer is given by the window_size_max field of std.compress.zstandard.DecompressStreamOptions, which defaults to 8 MiB, as the Zstandard spec suggests implementations support a window of at least 8 MiB.

For std.http to handle Zstandard compression, the server and client will only need to provide a window buffer to the decompressor. In theory the window buffer does not need to be 8 MiB if there is a smaller upper bound on the decompressed size of a valid message: the window never needs to be larger than the largest possible (decompressed) message. However, I'm not sure whether zstd compressors in the wild limit the declared window size in the frame header to the size of the uncompressed data. The spec does not say whether this should be done, and if compressors do not do it, further changes to std.compress.zstandard will be required to support situations where a frame's declared window size is larger than the uncompressed size of the data.
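As a usage sketch of the new API (signatures taken from the diffs later on this page; the module path and the input bytes are assumptions for illustration):

```zig
const std = @import("std");
const zstd = std.compress.zstandard;

test "decompress with a caller-provided window buffer" {
    const compressed: []const u8 = "<zstd frame bytes>"; // placeholder input
    // 8 MiB: the default maximum window size. Heap-allocated here, since
    // an 8 MiB array is hostile to small stacks (see the wasm32-wasi
    // discussion further down this page).
    const window_buffer = try std.testing.allocator.create([1 << 23]u8);
    defer std.testing.allocator.destroy(window_buffer);

    var in_stream = std.io.fixedBufferStream(compressed);
    var stream = zstd.decompressStream(in_stream.reader(), window_buffer);
    const result = try stream.reader().readAllAlloc(std.testing.allocator, std.math.maxInt(usize));
    defer std.testing.allocator.free(result);
}
```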

Comment on lines 206 to 212
 pub fn decompressStreamOptions(
-    allocator: Allocator,
     reader: anytype,
     comptime options: DecompressStreamOptions,
+    window_buffer: *[options.window_size_max]u8,
 ) DecompressStream(@TypeOf(reader), options) {
-    return DecompressStream(@TypeOf(reader), options).init(allocator, reader);
+    return DecompressStream(@TypeOf(reader), options).init(reader, window_buffer);
 }
@dweiller (Contributor, Author):
With this signature, a user who passes a buffer whose size is not equal to options.window_size_max gets a compile error. Perhaps it would be better to make a separate options struct that does not include the window size, though that new type would then need to be kept in sync with any future changes to DecompressStreamOptions.
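For example (a fragment; reader and the buffers are invented, and the second call is expected to fail compilation):

```zig
var right: [1 << 23]u8 = undefined; // matches the default window_size_max
var wrong: [1 << 20]u8 = undefined; // 1 MiB

var ok = decompressStream(reader, &right); // fine: &right is a *[1 << 23]u8
var bad = decompressStream(reader, &wrong); // compile error: array length mismatch
```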

@dweiller (Contributor, Author) left a comment:

The ZstdWrapper type in src/Package/Fetch.zig is no longer used but has not been deleted.

Comment on lines +33 to +37
literal_fse_buffer: [table_size_max.literal]types.compressed_block.Table.Fse,
match_fse_buffer: [table_size_max.match]types.compressed_block.Table.Fse,
offset_fse_buffer: [table_size_max.offset]types.compressed_block.Table.Fse,
literals_buffer: [types.block_size_max]u8,
sequence_buffer: [types.block_size_max]u8,
@dweiller (Contributor, Author):

These being arrays rather than pointers makes the DecompressStream relatively large: 2^9 + 2^9 + 2^8 + 2^17 + 2^17 gives 263424 B = 257.25 KiB for these buffers.
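As a quick check of that arithmetic (a standalone snippet, not part of the PR):

```zig
comptime {
    // 2^9 + 2^9 + 2^8 + 2^17 + 2^17, as claimed above:
    const total = (1 << 9) + (1 << 9) + (1 << 8) + (1 << 17) + (1 << 17);
    if (total != 263424) @compileError("arithmetic is off");
    // 263424 B / 1024 = 257.25 KiB
}
```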

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the buffers could probably be cut in half by combining literals_buffer and sequence_buffer, though this would require double-checking the spec to make sure it can be done, and would add a bit of complexity to the compressed block handling.

@dweiller force-pushed the zstandard-changes branch 2 times, most recently from 43eb16c to 8eceae4, February 16, 2024 09:18
@dweiller (Contributor, Author) commented Feb 16, 2024

> but I'm not sure if zstd compressors in the wild limit the declared window size in the frame header to the size of the uncompressed data

I assume they do; if they didn't, everyone would have to try to decompress streams even when the declared window size is outside the supported range, on the off chance that the decompressed data is small enough. Surely you want to error out immediately when the window size is read.

Another option is that compressors don't limit the declared window size, but make sure to include the frame content size in the frame header when the window size is larger than the uncompressed data.
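In that second case, a decompressor can size its window from the frame content size instead; a sketch of the idea (an illustrative helper, not part of std.compress):

```zig
// If the frame header declares the content size, the window never needs
// to exceed it; otherwise the declared window size must be trusted.
fn effectiveWindowSize(declared_window_size: u64, frame_content_size: ?u64) u64 {
    if (frame_content_size) |content_size|
        return @min(declared_window_size, content_size);
    return declared_window_size;
}
```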

@dweiller force-pushed the zstandard-changes branch 2 times, most recently from b3a723d to f08469c, February 16, 2024 11:07
Let the user add that if they wish to. It's not strictly necessary, and
arguably a harmful default.
It incorrectly had NotWriteable and MessageTooLong in it.
The buffer for HTTP headers is now always provided via a static buffer.
As a consequence, OutOfMemory is no longer a member of the read() error
set, and the API and implementation of Client and Server are simplified.

error.HttpHeadersExceededSizeLimit is renamed to
error.HttpHeadersOversize.
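A loose sketch of the static-buffer idea from this commit (invented helper names; not the actual std.http implementation): header bytes accumulate in a fixed caller-provided buffer, and exceeding it yields error.HttpHeadersOversize rather than allocating.

```zig
const std = @import("std");

// Read until the blank line that ends the header block, or fail if the
// static buffer fills up first. A real implementation must also handle
// body bytes read past the "\r\n\r\n" terminator; this sketch ignores
// that.
fn readHeaders(reader: anytype, buf: []u8) ![]const u8 {
    var end: usize = 0;
    while (std.mem.indexOf(u8, buf[0..end], "\r\n\r\n") == null) {
        if (end == buf.len) return error.HttpHeadersOversize;
        const n = try reader.read(buf[end..]);
        if (n == 0) return error.UnexpectedEndOfStream;
        end += n;
    }
    return buf[0..end];
}
```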
This is a state machine that already has a `state` field. No need to
additionally store "done" - it just makes things unnecessarily
complicated and buggy.
Documentation comments are not an appropriate place to put code samples.
I originally removed these in 402f967.
I allowed them to be added back in ziglang#15299 because they were smuggled in
alongside a bug fix, however, I wasn't kidding when I said that I wanted
to take the design of std.http in a different direction than using this
data structure.

Instead, some headers are provided via explicit field names populated
while parsing the HTTP request/response, and some are provided via
new fields that support passing extra, arbitrary headers.

This resulted in simplification of logic in many places, as well as
elimination of the possibility of failure in many places. There is
less deinitialization code happening now.

Furthermore, it made it no longer necessary to clone the headers data
structure in order to handle redirects.

http_proxy and https_proxy fields are now pointers since it is common
for them to be unpopulated.

loadDefaultProxies is changed into initDefaultProxies to communicate
that it does not actually load anything from disk or from the network.
The function now is leaky; the API user must pass an already
instantiated arena allocator. Removes the need to deinitialize proxies.

Before, proxies stored arbitrary sets of headers. Now they only store
the authorization value.

Removed the duplicated code between https_proxy and http_proxy. Finally,
parsing failures of the environment variables result in errors being
emitted rather than silently ignoring the proxy.

error.CompressionNotSupported is renamed to
error.CompressionUnsupported, matching the naming convention from all
the other errors in the same set.

Removed documentation comments that were redundant with field and type
names.

Disabling zstd decompression in the server for now; see ziglang#18937.

I found some apparently dead code in src/Package/Fetch/git.zig. I want
to check with Ian about this.

I discovered that test/standalone/http.zig is dead code, it is only
being compiled but not being run. Furthermore it hangs at the end if you
run it manually. The previous commits in this branch were written under
the assumption that this test was being run with
`zig build test-standalone`.
The Allocator requirement is problematic.
I mistakenly thought this was dead code in an earlier commit in this
branch. This commit restores the proper behavior.
This can no longer fail due to OOM.
This makes it no longer dead code. It is currently failing.
This reverts commit 42be972.

Using a bit to distinguish between headers and trailers is fine. It was
just named and documented poorly.
* add API for iterating over custom HTTP headers
* remove `trailing` flag from std.http.Client.parse. Instead, simply
  don't call parse() for trailers.
* fix the logic inside that parse() function; it was using the wrong
  std.mem functions, ignoring malformed data, and returning errors on
  dead branches.
* simplify logic inside wait()
* fix HeadersParser not dropping the 2 read bytes of \r\n after a
  chunked transfer (sketched after this list)
* move the trailers test to be a std lib unit test and make it pass
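For the chunked-transfer item, a sketch of the missing step (illustrative helper, not the actual HeadersParser code):

```zig
const std = @import("std");

// After the final zero-length chunk, the stream still holds the two
// bytes "\r\n"; they must be consumed so the connection is positioned
// at the start of any trailers or the next message.
fn discardChunkTerminator(reader: anytype) !void {
    var crlf: [2]u8 = undefined;
    try reader.readNoEof(&crlf);
    if (!std.mem.eql(u8, &crlf, "\r\n")) return error.InvalidChunkedEncoding;
}
```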
Perhaps the language should enforce this.
for checking if a proxy is connecting to itself
I don't like this mechanism in general, and it is unused by the standard
library.
This is not an appropriate place to put this code. It belongs in the
caller's code, if at all.
@dweiller (Contributor, Author) commented:
Tests for the DecompressStream API need to be skipped for wasm32-wasi due to stack overflow. If they are not skipped, the tests pass with `zig test lib/std/compress/zstandard.zig -target wasm32-wasi --stack STACK_SIZE --test-cmd wasmtime --test-cmd-bin`, where STACK_SIZE is slightly over 9 MiB.

* "storage" is a better name than "strategy".
* The most flexible memory-based storage API is appending to an
  ArrayList.
* HTTP method should default to POST if there is a payload.
* Avoid storing unnecessary data in the FetchResult
* Avoid the need for a deinit() method in the FetchResult

The decisions that this logic made about how to handle files are beyond
repair:
- fail to use sendfile() on a plain connection
- redundant stat
- does not handle arbitrary streams
So, file-based response storage is no longer supported. Users should use
the lower-level open() API which allows avoiding these pitfalls.
Before, this code constructed an arena allocator and then used it when
handling redirects.

You know what's better than having threads fight over an allocator?
Avoiding dynamic memory allocation in the first place.

This commit reuses the http headers static buffer for handling
redirects. The new location is copied to the beginning of the static
header buffer and then the subsequent request uses a subslice of that
buffer.
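A sketch of that reuse (names invented for illustration; not the actual Client code):

```zig
const std = @import("std");

// The redirect target (which points into the header buffer itself) is
// copied to the front of that same static buffer; the follow-up request
// then parses its URI from the subslice, with no allocator involved.
fn prepareRedirect(header_buffer: []u8, location: []const u8) !std.Uri {
    if (location.len > header_buffer.len) return error.HttpRedirectLocationOversize;
    // copyForwards is safe even if location aliases header_buffer, as
    // long as the destination starts at or before the source.
    std.mem.copyForwards(u8, header_buffer[0..location.len], location);
    return std.Uri.parse(header_buffer[0..location.len]);
}
```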
@andrewrk (Member) commented Feb 17, 2024

The remaining CI failure should be fixed if you rebase.

Also, I will change the default WASI stack size to be the same as on other operating systems, which is 16 MiB.

Edit: done: #18971

@dweiller (Contributor, Author) commented Feb 17, 2024

> The remaining CI failure should be fixed if you rebase.

Was it caused by changes in the std.http.Server branch? If so, I might rebase onto the wasi-default-stack-size branch to enable those tests, since this branch no longer does anything in std.http.

Edit: While it makes no changes, it does depend on the removal of calls to deinit() in the http branch, so I'll leave this PR based on the http changes; if #18971 gets merged first, I'll remove the commit disabling the tests on wasm32.

This change corrects the size of various internal buffers used. The
previous behavior did not cause validity problems but wasted space.
This commit can be reverted after
ziglang#18971 is merged.
@andrewrk (Member) commented:
Thanks! Cherry-picked into #18955.

@andrewrk closed this Feb 17, 2024
@andrewrk (Member) commented Feb 17, 2024

Is there some reason the buffer needs to have a comptime-known length? This API does not allow the user to supply an arbitrary buffer.

@dweiller (Contributor, Author) commented Feb 17, 2024

> Is there some reason the buffer needs to have a comptime-known length? This API does not allow the user to supply an arbitrary buffer.

It wouldn't need to be comptime-known. I think the only reason it was before was to make the API match the other compression formats, but now that a buffer needs to be passed in, it won't be quite the same anyway. Both the window length and the verify-checksum option could be made runtime with a patch like:

diff --git a/lib/std/compress/zstandard.zig b/lib/std/compress/zstandard.zig
index b7610a58b7..bc080a51dc 100644
--- a/lib/std/compress/zstandard.zig
+++ b/lib/std/compress/zstandard.zig
@@ -9,20 +9,14 @@ pub const decompress = @import("zstandard/decompress.zig");
 
 pub const DecompressStreamOptions = struct {
     verify_checksum: bool = true,
-    window_size_max: usize = default_window_size_max,
-
-    pub const default_window_size_max = 1 << 23; // 8MiB default maximum window size
 };
 
 pub fn DecompressStream(
     comptime ReaderType: type,
-    comptime options: DecompressStreamOptions,
 ) type {
     return struct {
         const Self = @This();
 
-        pub const window_size_max = options.window_size_max;
-
         const table_size_max = types.compressed_block.table_size_max;
 
         source: std.io.CountingReader(ReaderType),
@@ -35,11 +29,12 @@ pub fn DecompressStream(
         offset_fse_buffer: [table_size_max.offset]types.compressed_block.Table.Fse,
         literals_buffer: [types.block_size_max]u8,
         sequence_buffer: [types.block_size_max]u8,
-        checksum: if (options.verify_checksum) ?u32 else void,
+        verify_checksum: bool,
+        checksum: ?u32,
         current_frame_decompressed_size: usize,
 
         const WindowBuffer = struct {
-            data: *[options.window_size_max]u8 = undefined,
+            data: []u8 = undefined,
             read_index: usize = 0,
             write_index: usize = 0,
         };
@@ -54,7 +49,7 @@ pub fn DecompressStream(
 
         pub const Reader = std.io.Reader(*Self, Error, read);
 
-        pub fn init(source: ReaderType, window_buffer: *[options.window_size_max]u8) Self {
+        pub fn init(source: ReaderType, window_buffer: []u8, options: DecompressStreamOptions) Self {
             return Self{
                 .source = std.io.countingReader(source),
                 .state = .NewFrame,
@@ -66,6 +61,7 @@ pub fn DecompressStream(
                 .offset_fse_buffer = undefined,
                 .literals_buffer = undefined,
                 .sequence_buffer = undefined,
+                .verify_checksum = options.verify_checksum,
                 .checksum = undefined,
                 .current_frame_decompressed_size = undefined,
             };
@@ -81,8 +77,8 @@ pub fn DecompressStream(
                 .zstandard => |header| {
                     const frame_context = try decompress.FrameContext.init(
                         header,
-                        options.window_size_max,
-                        options.verify_checksum,
+                        self.buffer.data.len,
+                        self.verify_checksum,
                     );
 
                     const decode_state = decompress.block.DecodeState.init(
@@ -94,7 +90,7 @@ pub fn DecompressStream(
                     self.decode_state = decode_state;
                     self.frame_context = frame_context;
 
-                    self.checksum = if (options.verify_checksum) null else {};
+                    self.checksum = null;
                     self.current_frame_decompressed_size = 0;
 
                     self.state = .InFrame;
@@ -176,7 +172,7 @@ pub fn DecompressStream(
                     if (self.frame_context.has_checksum) {
                         const checksum = source_reader.readInt(u32, .little) catch
                             return error.MalformedFrame;
-                        if (comptime options.verify_checksum) {
+                        if (self.verify_checksum) {
                             if (self.frame_context.hasher_opt) |*hasher| {
                                 if (checksum != decompress.computeChecksum(hasher))
                                     return error.ChecksumFailure;
@@ -213,15 +209,18 @@ pub fn decompressStreamOptions(
 
 pub fn decompressStream(
     reader: anytype,
-    window_buffer: *[DecompressStreamOptions.default_window_size_max]u8,
-) DecompressStream(@TypeOf(reader), .{}) {
-    return DecompressStream(@TypeOf(reader), .{}).init(reader, window_buffer);
+    window_buffer: []u8,
+    options: DecompressStreamOptions,
+) DecompressStream(@TypeOf(reader)) {
+    return DecompressStream(@TypeOf(reader)).init(reader, window_buffer, options);
 }
 
 fn testDecompress(data: []const u8) ![]u8 {
-    var window_buffer: [DecompressStreamOptions.default_window_size_max]u8 = undefined;
+    const window_buffer = try std.testing.allocator.alloc(u8, 1 << 23);
+    defer std.testing.allocator.free(window_buffer);
+
     var in_stream = std.io.fixedBufferStream(data);
-    var zstd_stream = decompressStream(in_stream.reader(), &window_buffer);
+    var zstd_stream = decompressStream(in_stream.reader(), window_buffer, .{});
     const result = zstd_stream.reader().readAllAlloc(std.testing.allocator, std.math.maxInt(usize));
     return result;
 }
@@ -251,7 +250,7 @@ test "zstandard decompression" {
 
 test "zstandard streaming decompression" {
     // default stack size for wasm32 is too low for DecompressStream - slightly
-    // over 9MiB stack space is needed via the --stack CLI flag
+    // over 1MiB stack space is needed via the --stack CLI flag
     if (@import("builtin").target.cpu.arch == .wasm32) return error.SkipZigTest;
 
     const uncompressed = @embedFile("testdata/rfc8478.txt");
@@ -279,9 +278,11 @@ fn expectEqualDecoded(expected: []const u8, input: []const u8) !void {
 }
 
 fn expectEqualDecodedStreaming(expected: []const u8, input: []const u8) !void {
-    var window_buffer: [DecompressStreamOptions.default_window_size_max]u8 = undefined;
+    const window_buffer = try std.testing.allocator.alloc(u8, 1 << 23);
+    defer std.testing.allocator.free(window_buffer);
+
     var in_stream = std.io.fixedBufferStream(input);
-    var stream = decompressStream(in_stream.reader(), &window_buffer);
+    var stream = decompressStream(in_stream.reader(), window_buffer, .{});
 
     const result = try stream.reader().readAllAlloc(std.testing.allocator, std.math.maxInt(usize));
     defer std.testing.allocator.free(result);
@@ -307,7 +308,7 @@ test "zero sized block" {
 
 test "zero sized block streaming" {
     // default stack size for wasm32 is too low for DecompressStream - slightly
-    // over 9MiB stack space is needed via the --stack CLI flag
+    // over 1MiB stack space is needed via the --stack CLI flag
     if (@import("builtin").target.cpu.arch == .wasm32) return error.SkipZigTest;
 
     const input_raw =

EDIT: This patch also reduces the stack size needed for the tests to a little over 1 MiB by allocating the window buffer in the tests, but running them still requires passing `--stack` on wasm32-wasi until #18971 gets merged.

@andrewrk (Member) commented:
Sounds good! If you push that commit to your branch, I'll cherry-pick it.

@dweiller (Contributor, Author) commented Feb 18, 2024

> Sounds good! If you push that commit to your branch, I'll cherry-pick it.

Done.

@andrewrk (Member) commented:
Thanks!
