pracucci
Contributor

What this PR does:
I'm profiling the ingester v2Push() to investigate why we see increased CPU and memory usage when a high number of errors occur in the write path (e.g. out-of-order samples, out-of-bound samples, per-user/per-metric series limit reached). I've improved the benchmark we already had and found several inefficiencies.

This is the first PR, fixing a single issue: metrics tracking. I will follow up with dedicated PRs for the other improvements.
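
The description above does not spell out what the metrics-tracking change is. As an illustration only, one common source of overhead on an error-heavy path is resolving a labelled counter once per rejected sample. The sketch below contrasts that with counting locally and updating the counter once per request. `counterVec`, `trackPerSample`, `trackBatched` and the `sample-out-of-order` label value are hypothetical names for this example, not the ingester's code, and the counter is a plain map rather than a real metrics library.

```go
package main

import (
	"fmt"
	"sync"
)

// counterVec is a hypothetical stand-in for a labelled counter. Each Add
// builds a lookup key from the label values and takes a lock, which is the
// per-call overhead this sketch is about.
type counterVec struct {
	mu      sync.Mutex
	values  map[string]float64
	lookups int // how many labelled lookups were performed
}

func newCounterVec() *counterVec {
	return &counterVec{values: make(map[string]float64)}
}

// Add resolves the counter for (reason, user) and increases it by n.
func (c *counterVec) Add(reason, user string, n float64) {
	key := reason + "\x00" + user // builds a new string on every call
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] += n
	c.lookups++
}

// Get returns the current value of the counter for (reason, user).
func (c *counterVec) Get(reason, user string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[reason+"\x00"+user]
}

const reasonOutOfOrder = "sample-out-of-order" // hypothetical label value

// trackPerSample updates the labelled counter once per rejected sample.
func trackPerSample(c *counterVec, user string, rejected []bool) {
	for _, r := range rejected {
		if r {
			c.Add(reasonOutOfOrder, user, 1)
		}
	}
}

// trackBatched counts rejections locally and updates the labelled counter
// at most once per request.
func trackBatched(c *counterVec, user string, rejected []bool) {
	count := 0
	for _, r := range rejected {
		if r {
			count++
		}
	}
	if count > 0 {
		c.Add(reasonOutOfOrder, user, float64(count))
	}
}

func main() {
	rejected := make([]bool, 1000)
	for i := range rejected {
		rejected[i] = true // every sample in the request is rejected
	}

	perSample, batched := newCounterVec(), newCounterVec()
	trackPerSample(perSample, "user-1", rejected)
	trackBatched(batched, "user-1", rejected)

	fmt.Println("per-sample:", perSample.Get(reasonOutOfOrder, "user-1"), "in", perSample.lookups, "lookups")
	fmt.Println("batched:   ", batched.Get(reasonOutOfOrder, "user-1"), "in", batched.lookups, "lookups")
}
```

Both variants end with the same counter value. The batched one performs a single labelled lookup per request instead of one per rejected sample, which is the kind of saving that matters most when nearly every sample fails.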

Benchmark:

name                                                                                             old time/op    new time/op    delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  435µs ± 1%     297µs ± 3%  -31.65%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                12.6ms ± 5%     8.1ms ±15%  -35.29%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                144ms ±14%      81ms ±12%  -44.10%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                  466µs ± 4%     300µs ± 2%  -35.72%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                12.0ms ± 4%     6.6ms ±15%  -45.38%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12                134ms ± 8%      73ms ± 5%  -45.54%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       39.1ms ± 2%    31.1ms ± 6%  -20.48%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       393ms ± 7%     359ms ±10%   -8.75%  (p=0.032 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12        1.40ms ± 9%    1.22ms ± 3%  -13.02%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12      1.53ms ±13%    1.48ms ± 9%     ~     (p=0.310 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12      107ms ± 4%     105ms ±11%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     1.15s ± 7%     1.04s ± 7%   -9.40%  (p=0.032 n=5+5)

name                                                                                             old alloc/op   new alloc/op   delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  131kB ± 0%      99kB ± 0%  -24.54%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                13.9MB ± 1%    10.4MB ± 0%  -25.30%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                198MB ±24%     141MB ±13%  -28.54%  (p=0.032 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                 34.4kB ± 0%     2.1kB ± 1%  -93.79%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                3.90MB ± 4%    0.35MB ±27%  -91.00%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12               96.5MB ±47%    43.0MB ± 2%  -55.42%  (p=0.016 n=5+4)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12       71.1MB ± 0%    67.6MB ± 0%   -4.96%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       841MB ±17%     824MB ±16%     ~     (p=1.000 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         687kB ± 0%     655kB ± 0%   -4.71%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       688kB ± 0%     656kB ± 0%   -4.71%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12     71.2MB ± 7%    67.6MB ± 6%     ~     (p=0.056 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12    1.09GB ±38%    0.68GB ± 2%  -37.50%  (p=0.016 n=5+4)

name                                                                                             old allocs/op  new allocs/op  delta
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_no_concurrency-12                  3.04k ± 0%     2.04k ± 0%  -32.89%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_low_concurrency-12                  307k ± 0%      206k ± 0%  -32.91%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_bound_samples,_scenario:_high_concurrency-12                3.27M ± 5%     2.18M ± 3%  -33.12%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_no_concurrency-12                  1.04k ± 0%     0.04k ± 0%  -96.15%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_low_concurrency-12                  105k ± 1%        4k ± 8%  -95.82%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_out_of_order_samples,_scenario:_high_concurrency-12                1.24M ±12%     0.17M ± 3%  -85.95%  (p=0.016 n=5+4)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_low_concurrency-12        1.01M ± 0%     0.91M ± 0%   -9.97%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_high_concurrency-12       10.5M ± 5%      9.6M ± 5%   -8.92%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-user_series_limit_reached,_scenario:_no_concurrency-12         10.0k ± 0%      9.0k ± 0%   -9.96%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_no_concurrency-12       10.0k ± 0%      9.0k ± 0%   -9.95%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_low_concurrency-12      1.01M ± 2%     0.91M ± 2%   -9.97%  (p=0.008 n=5+5)
_Ingester_v2PushOnError/failure:_per-metric_series_limit_reached,_scenario:_high_concurrency-12     11.3M ±13%      9.0M ± 0%  -20.14%  (p=0.016 n=5+4)
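
For reading the tables above: the delta column is the relative change of the new mean against the old mean, and `~` marks rows where the p-value is not below the significance threshold (0.05 by default in benchstat). A minimal sketch of that arithmetic follows. `deltaPercent` and `formatDelta` are hypothetical helpers written for this example, not benchstat's code, and the exact comparison against the threshold is an assumption.

```go
package main

import "fmt"

// deltaPercent returns the relative change from oldMean to newMean, in percent.
func deltaPercent(oldMean, newMean float64) float64 {
	return (newMean/oldMean - 1) * 100
}

// formatDelta mimics the delta column: "~" when the change is not
// statistically significant at the given alpha, a signed percentage otherwise.
func formatDelta(oldMean, newMean, p, alpha float64) string {
	if p >= alpha {
		return "~"
	}
	return fmt.Sprintf("%+.2f%%", deltaPercent(oldMean, newMean))
}

func main() {
	const alpha = 0.05

	// out of bound samples, no concurrency: 435µs -> 297µs, p=0.008
	fmt.Println(formatDelta(435, 297, 0.008, alpha))

	// per-metric series limit reached, no concurrency: 1.53ms -> 1.48ms, p=0.310
	fmt.Println(formatDelta(1.53, 1.48, 0.310, alpha))
}
```

Feeding in the rounded means from the first row gives -31.72% rather than the -31.65% in the table. Small differences like this are expected because the table prints rounded means while the delta is computed from the unrounded ones.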

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <[email protected]>
Contributor

@pstibrany pstibrany left a comment

Great find!

@pracucci pracucci merged commit 22f6690 into cortexproject:master Mar 18, 2021
@pracucci pracucci deleted the benchmark-ingester-on-errors branch March 18, 2021 10:26