compress.zlib: don't overshoot underlying reader #19163

Merged
merged 5 commits into ziglang:master from ianic:zlib_no_lookahead on Mar 8, 2024

Conversation

ianic (Contributor) commented Mar 3, 2024

Inflate internally uses an 8-byte bit buffer. We don't know in advance how long
a compressed data block is, so once we reach the end of the block there may be
some bytes left in that internal buffer. As a result, the underlying reader is
moved a few bytes past the end of the compressed data block. That is a problem
when the compressed block is followed by other data which needs to be consumed.

We ran into this problem in the Zig package manager when unpacking git
packfiles, and then again in the zigimg project.

This PR ensures that the zlib decompressor never overshoots. Zlib now uses a
4-byte internal buffer, so at the end of the deflate stream we can overshoot by
at most 4 bytes. In zlib, the deflate stream is followed by a 4-byte checksum;
after reading that checksum we are sure that there is no overshoot.

This PR also ensures that there is no overshoot on gzip decompression. Gzip
keeps the 8-byte bit buffer and has an 8-byte footer, but the previous
implementation was not careful about overshooting.
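
To make the guarantee concrete, here is a minimal sketch (not one of the PR's
tests) of the scenario described above, assuming the post-rewrite
`std.compress.zlib.decompressor` API: a zlib stream immediately followed by
trailing bytes the caller still needs. The hard-coded bytes are a complete zlib
stream that decompresses to nothing (zlib header, empty deflate block, Adler-32
of the empty string).

```zig
const std = @import("std");

// Minimal sketch (not from the PR's test suite): a zlib stream immediately
// followed by bytes the caller still needs. If the decompressor overshot,
// the trailer read below would miss bytes.
test "underlying reader stops right after the zlib checksum" {
    // 0x78 0x9c = zlib header, 0x03 0x00 = empty deflate stream,
    // 0x00 0x00 0x00 0x01 = Adler-32 of the empty string.
    const data = [_]u8{ 0x78, 0x9c, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01 } ++ "TRAILER".*;

    var fbs = std.io.fixedBufferStream(&data);
    var decomp = std.compress.zlib.decompressor(fbs.reader());

    // Drain the decompressed payload (empty here); this also consumes and
    // verifies the 4-byte checksum, leaving no buffered bytes behind.
    var sink: [64]u8 = undefined;
    _ = try decomp.reader().readAll(&sink);

    // With this PR the underlying stream is positioned exactly after the
    // checksum, so the trailing bytes are still readable from it.
    var trailer: [7]u8 = undefined;
    try fbs.reader().readNoEof(&trailer);
    try std.testing.expectEqualStrings("TRAILER", &trailer);
}
```

Before this change the trailing read could miss bytes that were already sitting
in the decompressor's bit buffer; with the 4-byte buffer plus the checksum read,
the underlying stream position is exact.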

Using a smaller buffer means more buffer refills, which means lower
performance. I compared this and the previous implementation: on x86 Linux
there is no significant performance difference, on arm Linux there is a ~20%
slowdown.

We could provide a configuration option on the decompressor so it can use an
8-byte or 4-byte bit buffer and let the user decide which to use. But for now I
decided not to introduce that interface change.

Benchmarks of the two implementations on a few sample data sets.

Linux x86:

Benchmark 1: std
  Time (mean ± σ):     505.4 ms ±   2.9 ms    [User: 501.7 ms, System: 3.6 ms]
  Range (min … max):   502.8 ms … 513.0 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     506.8 ms ±   3.8 ms    [User: 502.9 ms, System: 3.8 ms]
  Range (min … max):   504.4 ms … 514.1 ms    10 runs

Summary
  std ran
    1.00 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      24.8 ms ±   2.7 ms    [User: 24.2 ms, System: 0.6 ms]
  Range (min … max):    21.8 ms …  32.0 ms    125 runs

Benchmark 2: my
  Time (mean ± σ):      24.9 ms ±   1.9 ms    [User: 24.5 ms, System: 0.5 ms]
  Range (min … max):    21.9 ms …  31.2 ms    130 runs

Summary
  std ran
    1.01 ± 0.13 times faster than my

Benchmark 1: std
  Time (mean ± σ):      61.9 ms ±   4.1 ms    [User: 61.0 ms, System: 0.8 ms]
  Range (min … max):    58.2 ms …  69.0 ms    49 runs

Benchmark 2: my
  Time (mean ± σ):      62.4 ms ±   4.3 ms    [User: 61.2 ms, System: 1.1 ms]
  Range (min … max):    58.6 ms …  68.9 ms    49 runs

Summary
  std ran
    1.01 ± 0.10 times faster than my

Benchmark 1: std
  Time (mean ± σ):      16.4 ms ±   2.0 ms    [User: 16.0 ms, System: 0.5 ms]
  Range (min … max):    14.3 ms …  22.3 ms    162 runs

Benchmark 2: my
  Time (mean ± σ):      17.5 ms ±   1.7 ms    [User: 17.2 ms, System: 0.4 ms]
  Range (min … max):    14.3 ms …  22.4 ms    137 runs

Summary
  std ran
    1.07 ± 0.17 times faster than my

Benchmark 1: std
  Time (mean ± σ):     783.7 ms ±   1.1 ms    [User: 783.6 ms, System: 0.0 ms]
  Range (min … max):   782.9 ms … 786.5 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     791.0 ms ±   4.3 ms    [User: 789.6 ms, System: 1.3 ms]
  Range (min … max):   782.8 ms … 794.3 ms    10 runs

Summary
  std ran
    1.01 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      20.8 ms ±   1.8 ms    [User: 20.5 ms, System: 0.4 ms]
  Range (min … max):    17.5 ms …  26.2 ms    161 runs

Benchmark 2: my
  Time (mean ± σ):      19.9 ms ±   2.6 ms    [User: 19.5 ms, System: 0.5 ms]
  Range (min … max):    17.4 ms …  27.1 ms    159 runs

Summary
  my ran
    1.04 ± 0.16 times faster than std

Linux arm:

Benchmark 1: std
  Time (mean ± σ):     343.8 ms ±   1.1 ms    [User: 342.1 ms, System: 1.6 ms]
  Range (min … max):   342.1 ms … 345.5 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     410.8 ms ±   3.6 ms    [User: 406.6 ms, System: 4.1 ms]
  Range (min … max):   407.3 ms … 420.1 ms    10 runs

Summary
  std ran
    1.19 ± 0.01 times faster than my

Benchmark 1: std
  Time (mean ± σ):      13.7 ms ±   0.3 ms    [User: 13.6 ms, System: 0.1 ms]
  Range (min … max):    13.0 ms …  16.6 ms    214 runs

Benchmark 2: my
  Time (mean ± σ):      16.7 ms ±   0.3 ms    [User: 16.5 ms, System: 0.1 ms]
  Range (min … max):    15.7 ms …  17.6 ms    174 runs

Summary
  std ran
    1.22 ± 0.04 times faster than my

Benchmark 1: std
  Time (mean ± σ):      35.9 ms ±   1.0 ms    [User: 35.5 ms, System: 0.4 ms]
  Range (min … max):    34.7 ms …  44.2 ms    84 runs

Benchmark 2: my
  Time (mean ± σ):      44.1 ms ±   0.4 ms    [User: 43.9 ms, System: 0.2 ms]
  Range (min … max):    42.3 ms …  45.1 ms    70 runs

Summary
  std ran
    1.23 ± 0.04 times faster than my

Benchmark 1: std
  Time (mean ± σ):       8.8 ms ±   0.3 ms    [User: 8.6 ms, System: 0.2 ms]
  Range (min … max):     8.3 ms …  10.9 ms    331 runs

Benchmark 2: my
  Time (mean ± σ):      11.0 ms ±   0.3 ms    [User: 10.8 ms, System: 0.2 ms]
  Range (min … max):    10.4 ms …  11.9 ms    260 runs

Summary
  std ran
    1.25 ± 0.05 times faster than my

Benchmark 1: std
  Time (mean ± σ):     597.5 ms ±   1.2 ms    [User: 597.3 ms, System: 0.0 ms]
  Range (min … max):   595.4 ms … 599.3 ms    10 runs

Benchmark 2: my
  Time (mean ± σ):     604.1 ms ±   1.7 ms    [User: 604.0 ms, System: 0.1 ms]
  Range (min … max):   601.9 ms … 606.8 ms    10 runs

Summary
  std ran
    1.01 ± 0.00 times faster than my

Benchmark 1: std
  Time (mean ± σ):      12.7 ms ±   0.3 ms    [User: 12.5 ms, System: 0.1 ms]
  Range (min … max):    12.0 ms …  13.7 ms    231 runs

Benchmark 2: my
  Time (mean ± σ):      13.9 ms ±   0.3 ms    [User: 13.8 ms, System: 0.1 ms]
  Range (min … max):    13.3 ms …  14.7 ms    207 runs

Summary
  std ran
    1.10 ± 0.03 times faster than my

ianic added 5 commits March 4, 2024 09:53
Extend BitReader to accept the size of its internal buffer. It can be u64 (the
only option until now) or u32.
fill(0) fills all bytes in the bit reader. If the bit reader is aligned to the
byte, as it is at the end of the stream, this ensures no overshoot when reading
the footer. The footer is 4 bytes (zlib) or 8 bytes (gzip). For zlib we will
use a 4-byte BitReader and an 8-byte one for gzip. After align and fill we read
those bytes and leave the BitReader empty (see the sketch after this commit
list).
That ensures no bytes are left in the BitReader buffer after we reach the end
of the stream.
Using an example from the [zigimg](zigimg/zigimg#164) project.
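
The BitReader commits above are described only in prose; below is a simplified
sketch of the idea, with the assumption that names and layout are illustrative
rather than the std implementation. The bit buffer is generic over its backing
integer, and refilling goes one byte at a time, so a u32 reader can never pull
more than 4 bytes ahead of what has been consumed (a u64 reader: 8 bytes).

```zig
const std = @import("std");

// Simplified sketch, not std's BitReader: generic over the backing integer so
// zlib can use u32 (at most 4 bytes of lookahead, covered by its 4-byte
// checksum) while gzip keeps u64 (covered by its 8-byte footer).
fn BitReader(comptime T: type, comptime ReaderType: type) type {
    return struct {
        rdr: ReaderType,
        bits: T = 0, // buffered bits, least significant bits consumed first
        nbits: u16 = 0, // number of valid bits in `bits`

        const Self = @This();

        /// Refill one byte at a time, so we never pull more from the
        /// underlying reader than fits into T (4 or 8 bytes).
        pub fn fill(self: *Self) !void {
            while (self.nbits + 8 <= @bitSizeOf(T)) {
                const byte = self.rdr.readByte() catch |err| switch (err) {
                    error.EndOfStream => return, // a partially filled buffer is fine
                    else => |e| return e,
                };
                self.bits |= @as(T, byte) << @intCast(self.nbits);
                self.nbits += 8;
            }
        }

        /// Take the next n bits (n < @bitSizeOf(T)).
        pub fn take(self: *Self, n: u4) !T {
            if (self.nbits < n) try self.fill();
            if (self.nbits < n) return error.EndOfStream;
            const mask = (@as(T, 1) << n) - 1;
            const v = self.bits & mask;
            self.bits >>= n;
            self.nbits -= n;
            return v;
        }
    };
}
```

Instantiated as `BitReader(u32, ...)` for zlib and `BitReader(u64, ...)` for
gzip, the worst-case lookahead at the end of the stream never exceeds the
container footer that still has to be read, which is exactly the guarantee the
PR relies on.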
ianic force-pushed the zlib_no_lookahead branch from 9dd9265 to a06a305 on March 4, 2024 08:53
andrewrk merged commit 83e578a into ziglang:master on Mar 8, 2024
ianic added a commit to ianic/zig that referenced this pull request Mar 11, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](ziglang#18967). That resulted
in workarounds [1](ziglang@80f3ef6)
[2](ziglang@d00faa2).

After [fixing](ziglang#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
andrewrk pushed a commit that referenced this pull request Mar 11, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](#18967). That resulted
in workarounds [1](80f3ef6)
[2](d00faa2).

After [fixing](#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Mar 20, 2024
My first zlib implementation broke git fetch because it introduced
[lookahead](ziglang#18967). That resulted
in workarounds [1](ziglang@80f3ef6)
[2](ziglang@d00faa2).

After [fixing](ziglang#19163) the lookahead in the
zlib decompressor, these workarounds are no longer necessary.
ianic deleted the zlib_no_lookahead branch on April 3, 2024 20:18