compress/flate: improve decompression speed #38324
Improve decompression speed, mainly through three optimizations:

1) Take advantage of the fact that we can read further ahead when we know the current block isn't the last one. The reader guarantees that it will not read beyond the end of the stream, which limits how far the decoder may read ahead; that limit is set to the size of an end-of-block marker in `f.h1.min = f.bits[endBlockMarker]`. However, each block header states whether the block is the final one in the stream, so while we are not reading the final block we can safely add the size of the smallest possible following block: one with a predefined table and nothing but a single EOB marker. Since we know the size of the block header (3 bits) and the fixed-table encoding of the EOB symbol (7 bits), this totals 10 additional bits. Adding those 10 bits reduces the number of stream reads significantly. Approximately 5% throughput increase.

2) Manually inline the `f.huffSym` call. This change by itself gives about a 13% throughput increase.

3) Generate decoders for stdlib `io.ByteReader` types. We generate decoders for the known implementations of `io.ByteReader`, namely `*bytes.Buffer`, `*bytes.Reader`, `*bufio.Reader` and `*strings.Reader`. This change by itself gives about a 20-25% throughput increase, including when a plain `io.Reader` is passed. Of these, only `*strings.Reader` is probably uncommon.

Minor changes:

* Reuse `h.chunks` and `h.links`.
* Trade some bounds checks for AND operations.
* Change chunks from uint32 to uint16.
* Avoid padding of decompressor struct members.

Per-loop allocations have been removed from the benchmarks; the 'old' numbers below already include that change.
```
name                              old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-32      78.0µs ± 0%    50.5µs ± 1%    -35.26%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32      779µs ± 2%     487µs ± 0%     -37.48%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32      7.68ms ± 0%    4.88ms ± 1%    -36.44%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32        88.5µs ± 1%    59.9µs ± 1%    -32.33%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32        963µs ± 1%     678µs ± 1%     -29.58%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32        9.75ms ± 1%    6.90ms ± 0%    -29.21%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32      91.2µs ± 1%    61.4µs ± 0%    -32.72%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32      954µs ± 0%     675µs ± 0%     -29.25%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32      9.67ms ± 0%    6.79ms ± 1%    -29.76%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32  90.7µs ± 1%    61.5µs ± 1%    -32.21%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32  953µs ± 1%     672µs ± 0%     -29.46%  (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32  9.76ms ± 4%    6.78ms ± 0%    -30.54%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32      90.4µs ± 0%    54.7µs ± 0%    -39.52%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32      885µs ± 0%     538µs ± 0%     -39.19%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32      8.84ms ± 0%    5.44ms ± 0%    -38.46%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32        81.5µs ± 0%    55.1µs ± 1%    -32.42%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32        751µs ± 4%     528µs ± 0%     -29.70%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32        7.49ms ± 2%    5.32ms ± 0%    -28.92%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32      73.3µs ± 1%    48.9µs ± 1%    -33.36%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32      601µs ± 2%     418µs ± 0%     -30.40%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32      5.92ms ± 0%    4.17ms ± 0%    -29.60%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32  72.7µs ± 0%    48.5µs ± 0%    -33.21%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32  597µs ± 0%     418µs ± 0%     -29.90%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32  5.90ms ± 0%    4.15ms ± 0%    -29.63%  (p=0.016 n=4+5)

name                              old speed      new speed      delta
Decode/Digits/Huffman/1e4-32      128MB/s ± 0%   198MB/s ± 1%   +54.46%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32      128MB/s ± 2%   205MB/s ± 0%   +59.92%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32      130MB/s ± 0%   205MB/s ± 1%   +57.33%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32        113MB/s ± 1%   167MB/s ± 1%   +47.79%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32        104MB/s ± 1%   147MB/s ± 1%   +42.01%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32        103MB/s ± 1%   145MB/s ± 0%   +41.26%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32      110MB/s ± 1%   163MB/s ± 0%   +48.63%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32      105MB/s ± 0%   148MB/s ± 0%   +41.34%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32      103MB/s ± 0%   147MB/s ± 1%   +42.37%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32  110MB/s ± 1%   163MB/s ± 1%   +47.51%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32  105MB/s ± 1%   149MB/s ± 0%   +41.77%  (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32  102MB/s ± 4%   147MB/s ± 0%   +43.91%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32      111MB/s ± 0%   183MB/s ± 0%   +65.35%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32      113MB/s ± 0%   186MB/s ± 0%   +64.44%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32      113MB/s ± 0%   184MB/s ± 0%   +62.50%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32        123MB/s ± 0%   182MB/s ± 1%   +47.98%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32        133MB/s ± 4%   189MB/s ± 0%   +42.20%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32        134MB/s ± 2%   188MB/s ± 0%   +40.67%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32      136MB/s ± 1%   205MB/s ± 1%   +50.05%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32      166MB/s ± 2%   239MB/s ± 0%   +43.67%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32      169MB/s ± 0%   240MB/s ± 0%   +42.04%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32  138MB/s ± 0%   206MB/s ± 0%   +49.73%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32  168MB/s ± 0%   239MB/s ± 0%   +42.66%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32  170MB/s ± 0%   241MB/s ± 0%   +42.11%  (p=0.016 n=4+5)

name                              old alloc/op   new alloc/op   delta
Decode/Digits/Huffman/1e4-32      0.00B ±NaN%    16.00B ± 0%    +Inf%    (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32      7.60B ± 8%     32.00B ± 0%    +321.05% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32      79.6B ± 1%     264.0B ± 0%    +231.66% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32        80.0B ± 0%     16.0B ± 0%     -80.00%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32        297B ± 0%      33B ± 0%       ~        (p=0.079 n=4+5)
Decode/Digits/Speed/1e6-32        3.86kB ± 0%    0.27kB ± 0%    -92.98%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32      48.0B ± 0%     16.0B ± 0%     -66.67%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32      297B ± 0%      49B ± 0%       -83.50%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32      4.28kB ± 0%    0.38kB ± 0%    ~        (p=0.079 n=4+5)
Decode/Digits/Compression/1e4-32  48.0B ± 0%     16.0B ± 0%     -66.67%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32  297B ± 0%      49B ± 0%       ~        (p=0.079 n=4+5)
Decode/Digits/Compression/1e6-32  4.28kB ± 0%    0.38kB ± 0%    -91.09%  (p=0.000 n=4+5)
Decode/Newton/Huffman/1e4-32      705B ± 0%      16B ± 0%       -97.73%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32      4.50kB ± 0%    0.03kB ± 0%    -99.27%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32      39.4kB ± 0%    0.3kB ± 0%     -99.29%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32        625B ± 0%      16B ± 0%       -97.44%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32        3.21kB ± 0%    0.03kB ± 0%    -98.97%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32        40.6kB ± 0%    0.3kB ± 0%     -99.25%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32      513B ± 0%      16B ± 0%       -96.88%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32      2.37kB ± 0%    0.03kB ± 0%    -98.61%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32      21.2kB ± 0%    0.2kB ± 0%     -98.97%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32  513B ± 0%      16B ± 0%       -96.88%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32  2.37kB ± 0%    0.03kB ± 0%    -98.61%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32  23.0kB ± 0%    0.2kB ± 0%     -99.07%  (p=0.008 n=5+5)

name                              old allocs/op  new allocs/op  delta
Decode/Digits/Huffman/1e4-32      0.00 ±NaN%     1.00 ± 0%      +Inf%    (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32      0.00 ±NaN%     2.00 ± 0%      +Inf%    (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32      0.00 ±NaN%     16.00 ± 0%     +Inf%    (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32        3.00 ± 0%      1.00 ± 0%      -66.67%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32        6.00 ± 0%      2.00 ± 0%      -66.67%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32        68.0 ± 0%      16.0 ± 0%      -76.47%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32      2.00 ± 0%      1.00 ± 0%      -50.00%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32      8.00 ± 0%      3.00 ± 0%      -62.50%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32      74.0 ± 0%      23.0 ± 0%      -68.92%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32  2.00 ± 0%      1.00 ± 0%      -50.00%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32  8.00 ± 0%      3.00 ± 0%      -62.50%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e6-32  74.0 ± 0%      23.0 ± 0%      -68.92%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32      9.00 ± 0%      1.00 ± 0%      -88.89%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32      18.0 ± 0%      2.0 ± 0%       -88.89%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32      156 ± 0%       16 ± 0%        -89.74%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32        13.0 ± 0%      1.0 ± 0%       -92.31%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32        26.0 ± 0%      2.0 ± 0%       -92.31%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32        223 ± 0%       16 ± 0%        -92.83%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32      10.0 ± 0%      1.0 ± 0%       -90.00%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32      27.0 ± 0%      2.0 ± 0%       -92.59%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32      153 ± 0%       12 ± 0%        -92.16%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32  10.0 ± 0%      1.0 ± 0%       -90.00%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32  27.0 ± 0%      2.0 ± 0%       -92.59%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32  145 ± 0%       12 ± 0%        -91.72%  (p=0.008 n=5+5)
```

These changes have been included in https://github.com/klauspost/compress for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
This PR (HEAD: 6180f3c) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/227737 to see it. Tip: You can toggle comments from me using the
Why is this still not merged?
@volknanebo We don't use GitHub for code review. If you want to make a comment, please make it at https://golang.org/cl/227737. Thanks.
@heschi What happened here?
I closed old PRs to reduce load on the Gerrit importer (#50197), sorry for the trouble. I'll reopen the CL and PR.
# Conflicts:
#	src/compress/flate/reader_test.go
Message from Ian Lance Taylor: Patch Set 2: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Message from Klaus Post: Patch Set 2: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
This PR (HEAD: c00babd) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.
* Inline moreBits.
* Put values on stack.
* Also generate the fallback.

Change-Id: I64d03424438ebc5dbacd4f364e3e6d3c4936a008
This PR (HEAD: ae9b62a) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.
Message from Klaus Post: Patch Set 5: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Change-Id: If11b81d2de23a2588f3d4c7baa088ed5d614de70
This PR (HEAD: 161f021) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.
Syncthing uses that. It keeps compressed web assets in strings to ensure they're in the RODATA section and can decompress them for HTTP clients without gzip support.
Ping @ianlancetaylor - if there is interest in this for 1.20 it would be good to get started on CR.
Message from Ian Lance Taylor: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Message from Ian Lance Taylor: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Message from Joseph Tsai: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Message from Joseph Tsai: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
Message from Joseph Tsai: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.