compress.zlib: don't overshoot underlying reader #19163

Merged
merged 5 commits into ziglang:master from ianic:zlib_no_lookahead on Mar 8, 2024

Conversation

ianic (Contributor) commented Mar 3, 2024

Inflate internally uses an 8-byte bit buffer. We don't know in advance how long
a compressed data block is, so once we reach the end of the block there may be
some bytes left in that internal buffer. As a result, the underlying reader is
moved a few bytes past the end of the compressed data block. That is a problem
when the compressed block is followed by other data which needs to be consumed.

We ran into this problem in the Zig package manager when unpacking git
packfiles, and then again in the zigimg project.

This PR ensures that the zlib decompressor never overshoots. Zlib now uses a
4-byte internal buffer, so at the end of the deflate stream we can overshoot by
at most 4 bytes. In zlib, the deflate stream is followed by a 4-byte checksum;
after reading that checksum we are sure that there is no overshoot.

This PR also ensures that there is no overshoot on gzip decompression. Gzip
keeps the 8-byte bit buffer and has an 8-byte footer, but the previous
implementation was not careful about overshooting.
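
To make the guarantee concrete, here is a minimal sketch (not one of the PR's
tests) of the scenario described above, assuming the post-rewrite
`std.compress.zlib.decompressor` API: a zlib stream immediately followed by
trailing bytes the caller still needs. The hard-coded bytes are a complete zlib
stream that decompresses to nothing (zlib header, empty deflate block, Adler-32
of the empty string).

```zig
const std = @import("std");

// Minimal sketch (not from the PR's test suite): a zlib stream immediately
// followed by bytes the caller still needs. If the decompressor overshot,
// the trailer read below would miss bytes.
test "underlying reader stops right after the zlib checksum" {
    // 0x78 0x9c = zlib header, 0x03 0x00 = empty deflate stream,
    // 0x00 0x00 0x00 0x01 = Adler-32 of the empty string.
    const data = [_]u8{ 0x78, 0x9c, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01 } ++ "TRAILER".*;

    var fbs = std.io.fixedBufferStream(&data);
    var decomp = std.compress.zlib.decompressor(fbs.reader());

    // Drain the decompressed payload (empty here); this also consumes and
    // verifies the 4-byte checksum, leaving no buffered bytes behind.
    var sink: [64]u8 = undefined;
    _ = try decomp.reader().readAll(&sink);

    // With this PR the underlying stream is positioned exactly after the
    // checksum, so the trailing bytes are still readable from it.
    var trailer: [7]u8 = undefined;
    try fbs.reader().readNoEof(&trailer);
    try std.testing.expectEqualStrings("TRAILER", &trailer);
}
```

Before this change the trailing read could miss bytes that were already sitting
in the decompressor's bit buffer; with the 4-byte buffer plus the checksum read,
the underlying stream position is exact.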

Using a smaller buffer means more buffer refills, which means lower
performance. I compared this and the previous implementation: on x86 Linux
there is no significant performance difference, on arm Linux there is a ~20%
slowdown.

We could provide a configuration option on the decompressor so it can use an
8-byte or 4-byte bit buffer and let the user decide which to use. But for now I
decided not to introduce that interface change.

Benchmarks of the two implementations on a few sample data sets.

Linux x86:

Benchmark 1: std
  Time (mean ± σ):     505.4 ms ±   2.9 ms    [User: 501.7 ms, System: 3.6 ms]
  Range (min … max):   502.8 ms … 513.0 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     506.8 ms ±   3.8 ms    [User: 502.9 ms, System: 3.8 ms]
  Range (min … max):   504.4 ms … 514.1 ms    10 runs

Summary
  std ran
    1.00 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      24.8 ms ±   2.7 ms    [User: 24.2 ms, System: 0.6 ms]
  Range (min … max):    21.8 ms …  32.0 ms    125 runs

Benchmark 2: my
  Time (mean ± σ):      24.9 ms ±   1.9 ms    [User: 24.5 ms, System: 0.5 ms]
  Range (min … max):    21.9 ms …  31.2 ms    130 runs

Summary
  std ran
    1.01 ± 0.13 times faster than my

Benchmark 1: std
  Time (mean ± σ):      61.9 ms ±   4.1 ms    [User: 61.0 ms, System: 0.8 ms]
  Range (min … max):    58.2 ms …  69.0 ms    49 runs

Benchmark 2: my
  Time (mean ± σ):      62.4 ms ±   4.3 ms    [User: 61.2 ms, System: 1.1 ms]
  Range (min … max):    58.6 ms …  68.9 ms    49 runs

Summary
  std ran
    1.01 ± 0.10 times faster than my

Benchmark 1: std
  Time (mean ± σ):      16.4 ms ±   2.0 ms    [User: 16.0 ms, System: 0.5 ms]
  Range (min … max):    14.3 ms …  22.3 ms    162 runs

Benchmark 2: my
  Time (mean ± σ):      17.5 ms ±   1.7 ms    [User: 17.2 ms, System: 0.4 ms]
  Range (min … max):    14.3 ms …  22.4 ms    137 runs

Summary
  std ran
    1.07 ± 0.17 times faster than my

Benchmark 1: std
  Time (mean ± σ):     783.7 ms ±   1.1 ms    [User: 783.6 ms, System: 0.0 ms]
  Range (min … max):   782.9 ms … 786.5 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     791.0 ms ±   4.3 ms    [User: 789.6 ms, System: 1.3 ms]
  Range (min … max):   782.8 ms … 794.3 ms    10 runs

Summary
  std ran
    1.01 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      20.8 ms ±   1.8 ms    [User: 20.5 ms, System: 0.4 ms]
  Range (min … max):    17.5 ms …  26.2 ms    161 runs

Benchmark 2: my
  Time (mean ± σ):      19.9 ms ±   2.6 ms    [User: 19.5 ms, System: 0.5 ms]
  Range (min … max):    17.4 ms …  27.1 ms    159 runs

Summary
  my ran
    1.04 ± 0.16 times faster than std

Linux arm:

Benchmark 1: std
  Time (mean ± σ):     343.8 ms ±   1.1 ms    [User: 342.1 ms, System: 1.6 ms]
  Range (min … max):   342.1 ms … 345.5 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     410.8 ms ±   3.6 ms    [User: 406.6 ms, System: 4.1 ms]
  Range (min … max):   407.3 ms … 420.1 ms    10 runs

Summary
  std ran
    1.19 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      13.7 ms ±   0.3 ms    [User: 13.6 ms, System: 0.1 ms]
  Range (min … max):    13.0 ms …  16.6 ms    214 runs

Benchmark 2: my
  Time (mean ± σ):      16.7 ms ±   0.3 ms    [User: 16.5 ms, System: 0.1 ms]
  Range (min … max):    15.7 ms …  17.6 ms    174 runs

Summary
  std ran
    1.22 ± 0.04 times faster than my

Benchmark 1: std
  Time (mean ± σ):      35.9 ms ±   1.0 ms    [User: 35.5 ms, System: 0.4 ms]
  Range (min … max):    34.7 ms …  44.2 ms    84 runs

Benchmark 2: my
  Time (mean ± σ):      44.1 ms ±   0.4 ms    [User: 43.9 ms, System: 0.2 ms]
  Range (min … max):    42.3 ms …  45.1 ms    70 runs

Summary
  std ran
    1.23 ± 0.04 times faster than my

Benchmark 1: std
  Time (mean ± σ):       8.8 ms ±   0.3 ms    [User: 8.6 ms, System: 0.2 ms]
  Range (min … max):     8.3 ms …  10.9 ms    331 runs

Benchmark 2: my
  Time (mean ± σ):      11.0 ms ±   0.3 ms    [User: 10.8 ms, System: 0.2 ms]
  Range (min … max):    10.4 ms …  11.9 ms    260 runs

Summary
  std ran
    1.25 ± 0.05 times faster than my

Benchmark 1: std
  Time (mean ± σ):     597.5 ms ±   1.2 ms    [User: 597.3 ms, System: 0.0 ms]
  Range (min … max):   595.4 ms … 599.3 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     604.1 ms ±   1.7 ms    [User: 604.0 ms, System: 0.1 ms]
  Range (min … max):   601.9 ms … 606.8 ms    10 runs

Summary
  std ran
    1.01 ± 0.00 times faster than my

Benchmark 1: std
  Time (mean ± σ):      12.7 ms ±   0.3 ms    [User: 12.5 ms, System: 0.1 ms]
  Range (min … max):    12.0 ms …  13.7 ms    231 runs

Benchmark 2: my
  Time (mean ± σ):      13.9 ms ±   0.3 ms    [User: 13.8 ms, System: 0.1 ms]
  Range (min … max):    13.3 ms …  14.7 ms    207 runs

Summary
  std ran
    1.10 ± 0.03 times faster than my

ianic added 5 commits March 4, 2024 09:53
Extend BitReader to accept the size of its internal buffer. It can be u64 (the
only option until now) or u32.
fill(0) fills all bytes in the bit reader. If the bit reader is aligned to the
byte, as it is at the end of the stream, this ensures no overshoot when reading
the footer. The footer is 4 bytes (zlib) or 8 bytes (gzip). For zlib we will
use a 4-byte BitReader and an 8-byte one for gzip. After align and fill we read
those bytes and leave the BitReader empty (see the sketch after this commit
list).
That ensures no bytes are left in the BitReader buffer after we reach the end
of the stream.
Using an example from the [zigimg](zigimg/zigimg#164) project.
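
The BitReader commits above are described only in prose; below is a simplified
sketch of the idea, with the assumption that names and layout are illustrative
rather than the std implementation. The bit buffer is generic over its backing
integer, and refilling goes one byte at a time, so a u32 reader can never pull
more than 4 bytes ahead of what has been consumed (a u64 reader: 8 bytes).

```zig
const std = @import("std");

// Simplified sketch, not std's BitReader: generic over the backing integer so
// zlib can use u32 (at most 4 bytes of lookahead, covered by its 4-byte
// checksum) while gzip keeps u64 (covered by its 8-byte footer).
fn BitReader(comptime T: type, comptime ReaderType: type) type {
    return struct {
        rdr: ReaderType,
        bits: T = 0, // buffered bits, least significant bits consumed first
        nbits: u16 = 0, // number of valid bits in `bits`

        const Self = @This();

        /// Refill one byte at a time, so we never pull more from the
        /// underlying reader than fits into T (4 or 8 bytes).
        pub fn fill(self: *Self) !void {
            while (self.nbits + 8 <= @bitSizeOf(T)) {
                const byte = self.rdr.readByte() catch |err| switch (err) {
                    error.EndOfStream => return, // a partially filled buffer is fine
                    else => |e| return e,
                };
                self.bits |= @as(T, byte) << @intCast(self.nbits);
                self.nbits += 8;
            }
        }

        /// Take the next n bits (n < @bitSizeOf(T)).
        pub fn take(self: *Self, n: u4) !T {
            if (self.nbits < n) try self.fill();
            if (self.nbits < n) return error.EndOfStream;
            const mask = (@as(T, 1) << n) - 1;
            const v = self.bits & mask;
            self.bits >>= n;
            self.nbits -= n;
            return v;
        }
    };
}
```

Instantiated as `BitReader(u32, ...)` for zlib and `BitReader(u64, ...)` for
gzip, the worst-case lookahead at the end of the stream never exceeds the
container footer that still has to be read, which is exactly the guarantee the
PR relies on.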
ianic force-pushed the zlib_no_lookahead branch from 9dd9265 to a06a305 on March 4, 2024 08:53
andrewrk merged commit 83e578a into ziglang:master on Mar 8, 2024
ianic added a commit to ianic/zig that referenced this pull request Mar 11, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](ziglang#18967). That resulted
in workarounds [1](ziglang@80f3ef6)
[2](ziglang@d00faa2).

After [fixing](ziglang#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
andrewrk pushed a commit that referenced this pull request Mar 11, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](#18967). That resulted
in workarounds [1](80f3ef6)
[2](d00faa2).

After [fixing](#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Mar 20, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](ziglang#18967). That resulted
in workarounds [1](ziglang@80f3ef6)
[2](ziglang@d00faa2).

After [fixing](ziglang#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
ianic deleted the zlib_no_lookahead branch on April 3, 2024 20:18