Investigate FxHash low-order bit quality when hashing aligned addresses. #58249
Comments
Cc @nnethercote as well, who I believe pulled FxHash from Firefox originally...
See also #69153
I instrumented hashbrown's ProbeSeq with a manual Drop impl that logged the number of times we moved to a new group. If I'm understanding the code right, that should correspond to the number of times we had a collision bad enough to have to go to the next group (16 buckets per group with SSE2 hashbrown). When compiling libcore, that gives us these numbers:
I think it's fair to say based on this data that even if we do have low quality in the bottom bits, it's not a huge problem: we probe only once in the table (i.e., moved=0) a significant majority of the time. We also relatively rarely call eq() many times; I think these are the right counts. Note that some probes look for an empty bucket without checking for equality, which I think is why the total count here doesn't line up with the count above. eq_calls=0 and eq_calls=1 mean we don't have any collisions in the lower + upper (h2) bits on lookup, which is the good case:
I spent some time trying to figure out if there's anything interesting about where we have longer probe lengths (e.g., correlation with bad hashes or whatever), but at least naively I'm not seeing anything all that interesting. The hashes aren't necessarily amazing, but there are no consistent patterns or other flaws that I can quickly spot. I think it's not fully implausible that we're just really unlucky when we need to probe lots of times?

My sense is that there's no good case for investigating changes further based on this data collection, so I'm going to go ahead and close.
We might be producing hashes that always have lowest N bits zeroed.
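To make the concern concrete, here is a minimal sketch (not the actual rustc-hash or hashbrown code) of why this can happen. It assumes the 64-bit FxHash step of rotate, xor, multiply by an odd constant, starting from a zero state, and a simplified hashbrown-style split where the low bits pick the starting bucket and the top 7 bits form the h2 tag. The helper names `fx_hash_word`, `h1_index`, and `h2_tag`, the addresses, and the table size are made up for illustration.

```rust
/// Multiplicative constant used by 64-bit FxHash. It is odd, which matters below.
const K: u64 = 0x517c_c1b7_2722_0a95;

/// One FxHash mixing step: rotate the state, xor in the word, multiply by K.
fn fx_add(state: u64, word: u64) -> u64 {
    (state.rotate_left(5) ^ word).wrapping_mul(K)
}

/// Hash a single word (e.g. an address) starting from the initial state 0.
/// With state == 0 this reduces to `word * K`.
fn fx_hash_word(word: u64) -> u64 {
    fx_add(0, word)
}

/// Simplified stand-in for hashbrown's h1: low bits select the start bucket.
/// (Assumption: power-of-two table, 64-bit target.)
fn h1_index(hash: u64, bucket_mask: u64) -> u64 {
    hash & bucket_mask
}

/// Simplified stand-in for hashbrown's h2: the top 7 bits of the hash.
fn h2_tag(hash: u64) -> u8 {
    ((hash >> 57) & 0x7f) as u8
}

fn main() {
    let bucket_mask = 63; // hypothetical 64-bucket table
    for i in 0..8u64 {
        // Hypothetical 16-byte-aligned addresses: the low 4 bits are always zero.
        let addr = 0x7f00_0000_1000 + i * 16;
        let h = fx_hash_word(addr);
        // Multiplying by an odd constant preserves the number of trailing zero
        // bits, so the hash has at least 4 zeroed low bits too.
        println!(
            "addr={:#x} hash={:#018x} trailing_zeros={} h1={} h2={:#04x}",
            addr,
            h,
            h.trailing_zeros(),
            h1_index(h, bucket_mask),
            h2_tag(h)
        );
    }
}
```

In this sketch every starting index is a multiple of 16, because a single multiplication by an odd constant cannot fill in the zeroed low bits of an aligned input. The h2 tag comes from the top bits, which the multiplication does mix, so it is not affected in the same way.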
cc @Amanieu @Zoxc