compress/flate: improve decompression speed #38324

Open: wants to merge 4 commits into base branch master

@klauspost
Contributor

klauspost commented Apr 9, 2020

Improve inflate decompression speed, mainly through 3 optimizations:

  1. Read further ahead on non-final blocks.

The reader guarantees that it will not read beyond the end of the stream.
This limits how far the decoder can read ahead. The limit is set to the
size of an end-of-block marker in f.h1.min = f.bits[endBlockMarker].

We can however take advantage of the fact that each block gives
information on whether it is the final block on a stream.
So if we are not reading the final block we can safely add the size
of the smallest block possible with nothing but an EOB marker.

That is a block with a predefined table and a single EOB marker.
Since we know the size of the block header (3 bits) and the encoding
of the EOB (7 bits), this totals 10 additional bits.
Adding 10 bits reduces the number of stream reads significantly.

Approximately 5% throughput increase.
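
The 10-bit figure can be checked against the DEFLATE format. The following is a minimal sketch, not the code from this PR: the bit widths come from RFC 1951, while `smallestBlockBits` and `minReadAhead` are hypothetical helper names introduced only for illustration.

```go
package main

import "fmt"

// Bit widths taken from the DEFLATE format (RFC 1951).
const (
	bfinalBits   = 1 // BFINAL: set on the last block of the stream
	btypeBits    = 2 // BTYPE: stored, fixed Huffman, or dynamic Huffman
	fixedEOBBits = 7 // symbol 256 (end of block) in the fixed Huffman code
)

// smallestBlockBits returns the size of the smallest possible block:
// a fixed-Huffman block that holds nothing but an end-of-block marker.
func smallestBlockBits() int {
	return bfinalBits + btypeBits + fixedEOBBits
}

// minReadAhead is a hypothetical helper. Given the length in bits of the
// current block's EOB code, it returns how many bits can be requested
// without reading past the end of the stream.
func minReadAhead(eobBits int, finalBlock bool) int {
	if finalBlock {
		// Only the EOB of this block is guaranteed to follow.
		return eobBits
	}
	// Another block must follow, and it is at least smallestBlockBits long.
	return eobBits + smallestBlockBits()
}

func main() {
	fmt.Println(smallestBlockBits())
	fmt.Println(minReadAhead(7, true))
	fmt.Println(minReadAhead(7, false))
}
```

A stored block is not a candidate for the smallest block, since its header is padded to a byte boundary and followed by four length bytes.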

  2. Manually inline f.huffSym call

This change by itself gives about a 13% throughput increase.

  3. Generate decoders for stdlib io.ByteReader types

We generate decoders for the known implementations of io.ByteReader,
namely *bytes.Buffer, *bytes.Reader, *bufio.Reader and *strings.Reader.

This change by itself gives about 20-25% throughput increase,
including when an io.Reader is passed.

I would say only *strings.Reader probably isn't that common.

Minor changes:

  • Reuse h.chunks and h.links.
  • Trade some bounds checks for AND operations.
  • Change chunks from uint32 to uint16.
  • Avoid padding of decompressor struct members.
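
Trading a bounds check for an AND can be sketched as follows. This is a minimal illustration that assumes a power-of-two table size; `tableSize`, `lookupChecked` and `lookupMasked` are hypothetical names, not identifiers from the PR.

```go
package main

import "fmt"

// tableSize must be a power of two for the mask below to be valid.
const tableSize = 512

// lookupChecked indexes a slice, so the compiler generally has to emit
// a bounds check because the length is not known at compile time.
func lookupChecked(table []uint16, bits uint32) uint16 {
	return table[bits]
}

// lookupMasked indexes a fixed-size array with a masked index. The mask
// keeps the index in [0, tableSize), which the compiler can typically
// prove, so the bounds check is replaced by a single AND.
func lookupMasked(table *[tableSize]uint16, bits uint32) uint16 {
	return table[bits&(tableSize-1)]
}

func main() {
	var table [tableSize]uint16
	table[5] = 42
	fmt.Println(lookupChecked(table[:], 5))
	fmt.Println(lookupMasked(&table, 5))
	// Out-of-range input wraps around instead of panicking.
	fmt.Println(lookupMasked(&table, 5+tableSize))
}
```

Note the behavioural trade: the masked version never panics on an out-of-range index, so the caller has to be sure that wrapping is acceptable.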

Per-loop allocation removed from benchmarks.
The 'old' numbers in the benchmark below include this change.

name                              old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-32        63.8µs ± 0%    41.3µs ± 0%   -35.22%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32         625µs ± 0%     404µs ± 0%   -35.31%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32        6.25ms ± 0%    4.02ms ± 1%   -35.64%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32          72.1µs ± 1%    48.0µs ± 0%   -33.36%   (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32           792µs ± 1%     578µs ± 1%   -27.04%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32          8.09ms ± 0%    5.85ms ± 0%   -27.68%   (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32        74.1µs ± 1%    49.7µs ± 1%   -32.87%   (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32         775µs ± 1%     579µs ± 0%   -25.35%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32        7.84ms ± 1%    5.84ms ± 1%   -25.59%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32    74.1µs ± 0%    49.8µs ± 0%   -32.83%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32     777µs ± 1%     579µs ± 0%   -25.47%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32    7.83ms ± 1%    5.83ms ± 0%   -25.59%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32        72.9µs ± 0%    45.6µs ± 1%   -37.48%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32         712µs ± 1%     471µs ± 1%   -33.92%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32        7.11ms ± 0%    4.70ms ± 1%   -33.98%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32          67.0µs ± 1%    45.4µs ± 1%   -32.19%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32           616µs ± 1%     447µs ± 0%   -27.49%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32          6.17ms ± 0%    4.50ms ± 0%   -26.98%  (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32        60.7µs ± 0%    39.6µs ± 0%   -34.84%  (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32         492µs ± 0%     360µs ± 0%   -26.84%  (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32        4.87ms ± 1%    3.59ms ± 0%   -26.34%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32    60.8µs ± 1%    39.6µs ± 1%   -34.92%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32     491µs ± 1%     357µs ± 1%   -27.23%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32    4.84ms ± 0%    3.58ms ± 0%   -26.17%  (p=0.000 n=10+10)

name                              old speed      new speed      delta
Decode/Digits/Huffman/1e4-32       157MB/s ± 0%   242MB/s ± 0%   +54.37%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32       160MB/s ± 0%   247MB/s ± 0%   +54.58%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32       160MB/s ± 0%   249MB/s ± 1%   +55.39%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32         139MB/s ± 1%   208MB/s ± 0%   +50.06%   (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32         126MB/s ± 1%   173MB/s ± 1%   +37.05%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32         124MB/s ± 0%   171MB/s ± 0%   +38.28%   (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32       135MB/s ± 1%   201MB/s ± 1%   +48.95%   (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32       129MB/s ± 1%   173MB/s ± 0%   +33.95%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32       127MB/s ± 1%   171MB/s ± 1%   +34.39%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32   135MB/s ± 0%   201MB/s ± 0%   +48.88%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32   129MB/s ± 1%   173MB/s ± 0%   +34.17%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32   128MB/s ± 1%   172MB/s ± 0%   +34.39%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32       137MB/s ± 0%   219MB/s ± 1%   +59.96%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32       140MB/s ± 1%   212MB/s ± 1%   +51.32%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32       141MB/s ± 0%   213MB/s ± 1%   +51.46%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32         149MB/s ± 1%   220MB/s ± 1%   +47.48%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32         162MB/s ± 1%   224MB/s ± 0%   +37.92%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32         162MB/s ± 0%   222MB/s ± 0%   +36.95%  (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32       165MB/s ± 0%   253MB/s ± 0%   +53.47%  (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32       203MB/s ± 0%   278MB/s ± 0%   +36.68%  (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32       205MB/s ± 1%   279MB/s ± 0%   +35.77%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32   164MB/s ± 1%   253MB/s ± 1%   +53.66%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32   204MB/s ± 1%   280MB/s ± 1%   +37.41%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32   206MB/s ± 0%   280MB/s ± 0%   +35.44%  (p=0.000 n=10+10)

name                              old alloc/op   new alloc/op   delta
Decode/Digits/Huffman/1e4-32        0.00B ±NaN%    16.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32         6.00B ± 0%    36.00B ± 0%  +500.00%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32         64.0B ± 0%    304.0B ± 0%  +375.00%   (p=0.000 n=10+9)
Decode/Digits/Speed/1e4-32           80.0B ± 0%     16.0B ± 0%   -80.00%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32            296B ± 0%       39B ± 0%   -86.82%   (p=0.000 n=10+8)
Decode/Digits/Speed/1e6-32          3.78kB ± 0%    0.33kB ± 0%   -91.29%  (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32         40.0B ± 0%     16.0B ± 0%   -60.00%  (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32          287B ± 0%       54B ± 1%   -81.04%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32        4.15kB ± 0%    0.44kB ± 0%   -89.43%   (p=0.000 n=10+9)
Decode/Digits/Compression/1e4-32     40.0B ± 0%     16.0B ± 0%   -60.00%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32      288B ± 0%       55B ± 1%   -81.01%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32    4.15kB ± 0%    0.44kB ± 0%   -89.43%   (p=0.000 n=10+8)
Decode/Newton/Huffman/1e4-32          705B ± 0%       16B ± 0%   -97.73%   (p=0.000 n=9+10)
Decode/Newton/Huffman/1e5-32        4.49kB ± 0%    0.04kB ± 0%   -99.15%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32        39.4kB ± 0%     0.3kB ± 0%   -99.18%   (p=0.000 n=9+10)
Decode/Newton/Speed/1e4-32            617B ± 0%       16B ± 0%   -97.41%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32          3.19kB ± 0%    0.04kB ± 0%   -98.84%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32          40.5kB ± 0%     0.3kB ± 0%   -99.15%  (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32          513B ± 0%       16B ± 0%   -96.88%  (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32        2.35kB ± 0%    0.04kB ± 0%   -98.47%  (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32        21.1kB ± 0%     0.3kB ± 0%   -98.80%    (p=0.000 n=8+8)
Decode/Newton/Compression/1e4-32      513B ± 0%       16B ± 0%   -96.88%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32    2.35kB ± 0%    0.04kB ± 0%   -98.47%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32    22.9kB ± 0%     0.2kB ± 0%   -98.92%    (p=0.000 n=8+8)

name                              old allocs/op  new allocs/op  delta
Decode/Digits/Huffman/1e4-32         0.00 ±NaN%      1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32         0.00 ±NaN%      2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32         0.00 ±NaN%     16.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32            3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32            6.00 ± 0%      2.00 ± 0%   -66.67%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32            68.0 ± 0%      16.0 ± 0%   -76.47%  (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32          8.00 ± 0%      3.00 ± 0%   -62.50%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32          74.0 ± 0%      23.0 ± 0%   -68.92%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32      8.00 ± 0%      3.00 ± 0%   -62.50%  (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32      74.0 ± 0%      23.0 ± 0%   -68.92%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32          9.00 ± 0%      1.00 ± 0%   -88.89%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32          18.0 ± 0%       2.0 ± 0%   -88.89%  (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32           156 ± 0%        16 ± 0%   -89.74%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32            13.0 ± 0%       1.0 ± 0%   -92.31%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32            26.0 ± 0%       2.0 ± 0%   -92.31%  (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32             223 ± 0%        16 ± 0%   -92.83%  (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32          10.0 ± 0%       1.0 ± 0%   -90.00%  (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32          27.0 ± 0%       2.0 ± 0%   -92.59%  (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32           153 ± 0%        12 ± 0%   -92.16%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32      10.0 ± 0%       1.0 ± 0%   -90.00%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32      27.0 ± 0%       2.0 ± 0%   -92.59%  (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32       145 ± 0%        12 ± 0%   -91.72%  (p=0.000 n=10+10)

These changes have been included in github.com/klauspost/compress
for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c

Improve decompression speed, mainly through 3 optimizations:

1) Take advantage of the fact that we can read further ahead when we know the current block isn't the last.

The reader guarantees that it will not read beyond the end of the stream.
This limits how far the decoder can read ahead. The limit is set to the size of an end-of-block marker in `f.h1.min = f.bits[endBlockMarker]`.

We can however take advantage of the fact that each block gives information on whether it is the final block on a stream. So if we are not reading the final block we can safely add the size of the smallest block possible with nothing but an EOB marker.

That is a block with a predefined table and a single EOB marker. Since we know the size of the block header (3 bits) and the encoding of the EOB (7 bits), this totals 10 additional bits. Adding 10 bits reduces the number of stream reads significantly.

Approximately 5% throughput increase.

2) Manually inline f.huffSym call

This change by itself gives about a 13% throughput increase.

3) Generate decoders for stdlib io.ByteReader types

We generate decoders for the known implementations of `io.ByteReader`, namely `*bytes.Buffer`, `*bytes.Reader`, `*bufio.Reader` and `*strings.Reader`.

This change by itself gives about 20-25% throughput increase, including when an `io.Reader` is passed.

I would say only `*strings.Reader` probably isn't that common.

Minor changes:

* Reuse `h.chunks` and `h.links`.
* Trade some bounds checks for AND operations.
* Change chunks from uint32 to uint16.
* Avoid padding of decompressor struct members.

Per-loop allocation removed from benchmarks. The 'old' numbers in the benchmark below include this change.

```
name                              old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-32        78.0µs ± 0%    50.5µs ± 1%   -35.26%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32         779µs ± 2%     487µs ± 0%   -37.48%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32        7.68ms ± 0%    4.88ms ± 1%   -36.44%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32          88.5µs ± 1%    59.9µs ± 1%   -32.33%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32           963µs ± 1%     678µs ± 1%   -29.58%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32          9.75ms ± 1%    6.90ms ± 0%   -29.21%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32        91.2µs ± 1%    61.4µs ± 0%   -32.72%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32         954µs ± 0%     675µs ± 0%   -29.25%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32        9.67ms ± 0%    6.79ms ± 1%   -29.76%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32    90.7µs ± 1%    61.5µs ± 1%   -32.21%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32     953µs ± 1%     672µs ± 0%   -29.46%  (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32    9.76ms ± 4%    6.78ms ± 0%   -30.54%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32        90.4µs ± 0%    54.7µs ± 0%   -39.52%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32         885µs ± 0%     538µs ± 0%   -39.19%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32        8.84ms ± 0%    5.44ms ± 0%   -38.46%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32          81.5µs ± 0%    55.1µs ± 1%   -32.42%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32           751µs ± 4%     528µs ± 0%   -29.70%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32          7.49ms ± 2%    5.32ms ± 0%   -28.92%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32        73.3µs ± 1%    48.9µs ± 1%   -33.36%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32         601µs ± 2%     418µs ± 0%   -30.40%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32        5.92ms ± 0%    4.17ms ± 0%   -29.60%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32    72.7µs ± 0%    48.5µs ± 0%   -33.21%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32     597µs ± 0%     418µs ± 0%   -29.90%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32    5.90ms ± 0%    4.15ms ± 0%   -29.63%  (p=0.016 n=4+5)

name                              old speed      new speed      delta
Decode/Digits/Huffman/1e4-32       128MB/s ± 0%   198MB/s ± 1%   +54.46%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32       128MB/s ± 2%   205MB/s ± 0%   +59.92%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32       130MB/s ± 0%   205MB/s ± 1%   +57.33%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32         113MB/s ± 1%   167MB/s ± 1%   +47.79%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32         104MB/s ± 1%   147MB/s ± 1%   +42.01%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32         103MB/s ± 1%   145MB/s ± 0%   +41.26%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32       110MB/s ± 1%   163MB/s ± 0%   +48.63%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32       105MB/s ± 0%   148MB/s ± 0%   +41.34%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32       103MB/s ± 0%   147MB/s ± 1%   +42.37%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32   110MB/s ± 1%   163MB/s ± 1%   +47.51%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32   105MB/s ± 1%   149MB/s ± 0%   +41.77%  (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32   102MB/s ± 4%   147MB/s ± 0%   +43.91%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32       111MB/s ± 0%   183MB/s ± 0%   +65.35%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32       113MB/s ± 0%   186MB/s ± 0%   +64.44%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32       113MB/s ± 0%   184MB/s ± 0%   +62.50%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32         123MB/s ± 0%   182MB/s ± 1%   +47.98%  (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32         133MB/s ± 4%   189MB/s ± 0%   +42.20%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32         134MB/s ± 2%   188MB/s ± 0%   +40.67%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32       136MB/s ± 1%   205MB/s ± 1%   +50.05%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32       166MB/s ± 2%   239MB/s ± 0%   +43.67%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32       169MB/s ± 0%   240MB/s ± 0%   +42.04%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32   138MB/s ± 0%   206MB/s ± 0%   +49.73%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32   168MB/s ± 0%   239MB/s ± 0%   +42.66%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32   170MB/s ± 0%   241MB/s ± 0%   +42.11%  (p=0.016 n=4+5)

name                              old alloc/op   new alloc/op   delta
Decode/Digits/Huffman/1e4-32        0.00B ±NaN%    16.00B ± 0%     +Inf%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32         7.60B ± 8%    32.00B ± 0%  +321.05%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32         79.6B ± 1%    264.0B ± 0%  +231.66%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32           80.0B ± 0%     16.0B ± 0%   -80.00%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32            297B ± 0%       33B ± 0%      ~     (p=0.079 n=4+5)
Decode/Digits/Speed/1e6-32          3.86kB ± 0%    0.27kB ± 0%   -92.98%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32         48.0B ± 0%     16.0B ± 0%   -66.67%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32          297B ± 0%       49B ± 0%   -83.50%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32        4.28kB ± 0%    0.38kB ± 0%      ~     (p=0.079 n=4+5)
Decode/Digits/Compression/1e4-32     48.0B ± 0%     16.0B ± 0%   -66.67%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32      297B ± 0%       49B ± 0%      ~     (p=0.079 n=4+5)
Decode/Digits/Compression/1e6-32    4.28kB ± 0%    0.38kB ± 0%   -91.09%  (p=0.000 n=4+5)
Decode/Newton/Huffman/1e4-32          705B ± 0%       16B ± 0%   -97.73%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32        4.50kB ± 0%    0.03kB ± 0%   -99.27%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32        39.4kB ± 0%     0.3kB ± 0%   -99.29%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32            625B ± 0%       16B ± 0%   -97.44%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32          3.21kB ± 0%    0.03kB ± 0%   -98.97%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32          40.6kB ± 0%     0.3kB ± 0%   -99.25%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32          513B ± 0%       16B ± 0%   -96.88%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32        2.37kB ± 0%    0.03kB ± 0%   -98.61%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32        21.2kB ± 0%     0.2kB ± 0%   -98.97%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32      513B ± 0%       16B ± 0%   -96.88%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32    2.37kB ± 0%    0.03kB ± 0%   -98.61%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32    23.0kB ± 0%     0.2kB ± 0%   -99.07%  (p=0.008 n=5+5)

name                              old allocs/op  new allocs/op  delta
Decode/Digits/Huffman/1e4-32         0.00 ±NaN%      1.00 ± 0%     +Inf%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32         0.00 ±NaN%      2.00 ± 0%     +Inf%  (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32         0.00 ±NaN%     16.00 ± 0%     +Inf%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32            3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32            6.00 ± 0%      2.00 ± 0%   -66.67%  (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32            68.0 ± 0%      16.0 ± 0%   -76.47%  (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32          8.00 ± 0%      3.00 ± 0%   -62.50%  (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32          74.0 ± 0%      23.0 ± 0%   -68.92%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32      8.00 ± 0%      3.00 ± 0%   -62.50%  (p=0.008 n=5+5)
Decode/Digits/Compression/1e6-32      74.0 ± 0%      23.0 ± 0%   -68.92%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32          9.00 ± 0%      1.00 ± 0%   -88.89%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32          18.0 ± 0%       2.0 ± 0%   -88.89%  (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32           156 ± 0%        16 ± 0%   -89.74%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32            13.0 ± 0%       1.0 ± 0%   -92.31%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32            26.0 ± 0%       2.0 ± 0%   -92.31%  (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32             223 ± 0%        16 ± 0%   -92.83%  (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32          10.0 ± 0%       1.0 ± 0%   -90.00%  (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32          27.0 ± 0%       2.0 ± 0%   -92.59%  (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32           153 ± 0%        12 ± 0%   -92.16%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32      10.0 ± 0%       1.0 ± 0%   -90.00%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32      27.0 ± 0%       2.0 ± 0%   -92.59%  (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32       145 ± 0%        12 ± 0%   -91.72%  (p=0.008 n=5+5)
```

These changes have been included in https://github.com/klauspost/compress for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
@googlebot added the cla: yes label Apr 9, 2020
@gopherbot
Contributor

This PR (HEAD: 6180f3c) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@mertakman

Why is this still not merged?

@ianlancetaylor
Contributor

@volknanebo We don't use GitHub for code review. If you want to make a comment, please make it at https://golang.org/cl/227737. Thanks.

@heschi closed this Dec 15, 2021
@klauspost
Contributor Author

@heschi What happened here?

@heschi
Contributor

heschi commented Dec 16, 2021

I closed old PRs to reduce load on the Gerrit importer (#50197), sorry for the trouble. I'll reopen the CL and PR.

@heschi reopened this Dec 16, 2021
# Conflicts:
#	src/compress/flate/reader_test.go
@gopherbot
Contributor

Message from Ian Lance Taylor:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/227737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Contributor

Message from Klaus Post:

Patch Set 2:

(1 comment)



@gopherbot
Contributor

This PR (HEAD: c00babd) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.


* Inline moreBits
* Put values on stack.
* Also generate the fallback.

Change-Id: I64d03424438ebc5dbacd4f364e3e6d3c4936a008
@gopherbot
Contributor

This PR (HEAD: ae9b62a) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.


@gopherbot
Contributor

Message from Klaus Post:

Patch Set 5:

(2 comments)



Change-Id: If11b81d2de23a2588f3d4c7baa088ed5d614de70
@gopherbot
Contributor

This PR (HEAD: 161f021) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/227737 to see it.


@greatroar

> I would say only *strings.Reader probably isn't that common.

Syncthing uses that. It keeps compressed web assets in strings to ensure they're in the RODATA section and can decompress them for HTTP clients without gzip support.

@klauspost
Contributor Author

Ping @ianlancetaylor - if there is interest for this in 1.20 it would be good to get started on CR.

@gopherbot
Contributor

Message from Ian Lance Taylor:

Patch Set 6:

(1 comment)



@gopherbot
Contributor

Message from Ian Lance Taylor:

Patch Set 6:

(1 comment)



@gopherbot
Contributor

Message from Joseph Tsai:

Patch Set 6:

(1 comment)



@gopherbot
Contributor

Message from Joseph Tsai:

Patch Set 6:

(1 comment)



@gopherbot
Contributor

Message from Joseph Tsai:

Patch Set 6:

(1 comment)



Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
7 participants