GODRIVER-2677 Improve memory pooling. #1157

qingyang-hu · 2023-01-07T02:10:19Z

Summary

Reduce high memory consumption introduced by GODRIVER-2021.

Background & Motivation

I limited 512 slices in the pool. Get() allocates directly from the system when it reaches the limit. So the size of the pool shall not be bigger than 16MB * 512 ~= 8GB in theory considering the ticket indicates ~20GB of memory consumption.
~~I did not reset the capacity of returned byte slice because it would cause more allocations.~~

2nd attempt:

Stop pooling on reading;
Don't recycle low-occupied byte slices;
Update the benchmark tests to use a mix of short and long documents.

New benchstat from the new benchmark/operation_test.go:

name                           old time/op    new time/op    delta
ClientWrite/not_compressed-10    36.3µs ± 3%    38.4µs ±13%     ~     (p=0.218 n=10+10)
ClientWrite/snappy-10            40.0µs ± 3%    39.6µs ± 2%     ~     (p=0.279 n=8+8)
ClientWrite/zlib-10               134µs ± 4%     128µs ± 5%   -3.98%  (p=0.002 n=10+10)
ClientWrite/zstd-10              48.2µs ± 3%    51.8µs ± 9%   +7.33%  (p=0.001 n=9+10)

name                           old alloc/op   new alloc/op   delta
ClientWrite/not_compressed-10    14.0kB ± 1%    17.3kB ± 1%  +23.76%  (p=0.000 n=10+9)
ClientWrite/snappy-10            25.9kB ± 0%    23.1kB ± 1%  -10.74%  (p=0.000 n=9+10)
ClientWrite/zlib-10               884kB ± 0%     884kB ± 0%     ~     (p=0.739 n=10+10)
ClientWrite/zstd-10              36.7kB ± 0%    32.9kB ± 1%  -10.18%  (p=0.000 n=10+10)

name                           old allocs/op  new allocs/op  delta
ClientWrite/not_compressed-10      57.0 ± 0%      57.0 ± 0%     ~     (all equal)
ClientWrite/snappy-10              65.0 ± 0%      60.0 ± 0%   -7.69%  (p=0.000 n=10+10)
ClientWrite/zlib-10                 102 ± 1%        97 ± 0%   -4.53%  (p=0.000 n=10+10)
ClientWrite/zstd-10                 153 ± 0%       148 ± 0%   -3.27%  (p=0.000 n=9+9)

name                          old time/op    new time/op    delta
ClientRead/not_compressed-10    29.5µs ± 3%    33.8µs ±15%  +14.42%  (p=0.001 n=9+10)
ClientRead/snappy-10            32.9µs ± 1%    36.6µs ±17%  +11.35%  (p=0.016 n=8+10)
ClientRead/zlib-10               141µs ± 3%     138µs ± 4%     ~     (p=0.077 n=9+9)
ClientRead/zstd-10              42.9µs ± 7%    46.3µs ±13%   +7.88%  (p=0.040 n=9+9)

name                          old alloc/op   new alloc/op   delta
ClientRead/not_compressed-10    16.6kB ± 1%    17.3kB ± 1%   +4.33%  (p=0.000 n=10+10)
ClientRead/snappy-10            24.9kB ± 0%    25.3kB ± 0%   +1.64%  (p=0.000 n=10+10)
ClientRead/zlib-10               884kB ± 0%     881kB ± 0%   -0.28%  (p=0.000 n=10+10)
ClientRead/zstd-10              82.9kB ± 0%    84.1kB ± 1%   +1.38%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
ClientRead/not_compressed-10      73.0 ± 0%      73.0 ± 0%     ~     (all equal)
ClientRead/snappy-10              78.0 ± 0%      76.0 ± 0%   -2.56%  (p=0.000 n=10+9)
ClientRead/zlib-10                 122 ± 0%       118 ± 0%   -3.28%  (p=0.000 n=8+7)
ClientRead/zstd-10                 181 ± 0%       178 ± 0%   -1.66%  (p=0.002 n=8+10)

Notes:

The running time varies among benchmarks, which may be impacted by the system load. However, the memory results are consistent.
The increase of ClientWrite/not_compressed-10 can be eliminated if I recycle the slices regardless of their occupation. However, I decided to adopt a less aggressive policy of pooling highly-occupied slices only to avoid holding slices with large sizes.

benjirewis

Nice, simple design. Given benchmark results, seems like a ~~solid change to me~~ 🧑‍🔧 . Just a few questions.

Need to look closer at the benchmarks, seems more nuanced after discussing offline with @qingyang-hu .

benjirewis · 2023-01-10T15:49:14Z