Skip to content

Conversation

cijothomas
Copy link
Member

@cijothomas cijothomas commented May 25, 2024

Part 1 of #1740

Modifies Sum aggregation (used by Counter/UpDownCounters), to have less contention by using RwLock instead of Mutex to access HashMap of values. Updates (the hot path) now only need read() lock as it leverages interior mutability to update the underlying value. This effectively makes the HashMap read-heavy, and only need read() locks in hot path, significantly reducing contention, and thereby boosting throughput.

Perf numbers from metrics stress test confirms the above: we jump from 3 M/sec to 35 M/sec, i.e 10X jump!
Criterion based benchmarks uses single thread and hence, won't show contention. They are not expected to change at all with the changes in this PR - and results show no change.

This PR focused on throughput only, and the next set of PRs (which require more refactoring), will boost the benchmarks as well significantly.

@cijothomas cijothomas requested a review from a team May 25, 2024 17:49
Copy link

codecov bot commented May 25, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 74.0%. Comparing base (6ee5579) to head (a33461c).

Files Patch % Lines
opentelemetry-sdk/src/metrics/internal/sum.rs 93.3% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1833   +/-   ##
=====================================
  Coverage   74.0%   74.0%           
=====================================
  Files        122     122           
  Lines      19570   19577    +7     
=====================================
+ Hits       14493   14499    +6     
- Misses      5077    5078    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lalitb
Copy link
Member

lalitb commented May 25, 2024

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

@cijothomas
Copy link
Member Author

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

Once contention is avoided (this PR), we are unlikely to gain hot path perf by just sharding alone. But that'll surely help ease the spikes when collect() thread runs, as we can do locks on smaller section instead of whole. We can revisit the sharding logics from 1564.

@cijothomas
Copy link
Member Author

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

Once contention is avoided (this PR), we are unlikely to gain hot path perf by just sharding alone. But that'll surely help ease the spikes when collect() thread runs, as we can do locks on smaller section instead of whole. We can revisit the sharding logics from 1564.

Correction. Sharding still helps updates() compete less with other updates() that need to insert a new KVP combination.

@cijothomas cijothomas merged commit 0f6de5a into open-telemetry:main May 29, 2024
@cijothomas cijothomas deleted the cijothomas/metrics branch May 29, 2024 01:26
bantonsson pushed a commit to bantonsson/opentelemetry-rust that referenced this pull request Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants