Skip to content

improve performance and memory usage in HashUtils.djb2 #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

skimbrel-figma
Copy link
Contributor

Runtime profiling indicates this method generates several many memory allocations.

Comparing to the JS implementation, we saw the intent of the hash &= hash line was to force the JS runtime to keep the number as a 32-bit integer. This is indeed the correct way to do it in JS, but not in Ruby; as a result, the hash local will grow ever larger, requiring more and more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit 0xFFFFFFF constant used at the end instead.

Runtime profiling indicates this method generates several many memory allocations.

Comparing to the JS implementation, we saw the intent of the `hash &= hash` line was to force the JS runtime to keep the number as a 32-bit integer. This is indeed the correct way to do it in JS, but not in Ruby; as a result, the `hash` local will grow ever larger, requiring more and more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit `0xFFFFFFF` constant used at the end instead.
@skimbrel-figma
Copy link
Contributor Author

Benchmarking before/after with a 256-char string on my laptop:

irb(main):199:0> puts Benchmark.measure { n.times { djb2_fixed(short_str) }}
  0.003365   0.000043   0.003408 (  0.003415)
=> nil                                                                                       
irb(main):200:0> puts Benchmark.measure { n.times { djb2(short_str) }}
  0.024926   0.001164   0.026090 (  0.026234)
=> nil    

A quick, imprecise comparison by inspecting GC.stat[:total_allocated_objects] before/after also shows 60% reduction in memory allocations.

I spot-checked a handful of input values to ensure the output hash value did not change.

@tore-statsig
Copy link
Contributor

Nice find! Pulling this in to run tests on it now

@skimbrel-figma
Copy link
Contributor Author

@tore-statsig actually i just realized we can do even better — the only thing inside the each loop is .ord, which is available as each_codepoint! i'll update.

statsig-kong bot pushed a commit that referenced this pull request May 14, 2025
#28

"""
Runtime profiling indicates this method generates several many memory
allocations.

Comparing to the JS implementation, we saw the intent of the `hash &=
hash` line was to force the JS runtime to keep the number as a 32-bit
integer. This is indeed the correct way to do it in JS, but not in Ruby;
as a result, the `hash` local will grow ever larger, requiring more and
more memory since Ruby supports unbounded integers.

Fix: truncate the hash value on each iteration with the same 32-bit
`0xFFFFFFF` constant used at the end instead.
"""

Co-authored-by: Sam Kimbrel <[email protected]>
@tore-statsig
Copy link
Contributor

The original version of this is released

https://github.com/statsig-io/ruby-sdk/releases/tag/2.4.2

@skimbrel-figma
Copy link
Contributor Author

great! we've been running the second commit (switching to each_codepoint) as a monkeypatch for most of a week now and it seems totally fine + reduces memory still further 👍

statsig-kong bot pushed a commit that referenced this pull request May 22, 2025
the only thing inside the each loop is .ord, which is available as
each_codepoint

#28

---------

Co-authored-by: Sam Kimbrel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants