
Implement the 64-bit variant of xxHash. #17656


Closed · wants to merge 1 commit

Conversation

@ghost commented Sep 30, 2014

Matches C in speed, but it looks like SipHash was not the only bottleneck for #11783.

Tests include the official data and the relevant ones from SipHash.
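
For reference, a hedged sketch of what such a test can look like (`xxh64` as a one-shot helper is hypothetical, not necessarily this PR's API; the expected value is the widely published xxHash64 digest of the empty input with seed 0):

#[test]
fn matches_official_empty_vector() {
    // XXH64("", seed = 0) per the reference implementation's published vectors
    assert_eq!(xxh64(b"", 0), 0xEF46_DB37_51D8_E999);
}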

@rust-highfive (Contributor)

Warning

  • These commits modify unsafe code. Please review it carefully!

@thestinger (Contributor)

> For security, I'd say it could replace SipHash wholesale, as per this post.

That post doesn't go into any in-depth cryptanalysis. Security against DoS attacks requires a lot more than just good results on statistical tests and avalanche effect resistance.

@ghost (Author) commented Sep 30, 2014

Oh well. At least it's fast :)

@thestinger (Contributor)

Can this replace the FNV implementation that's currently used in rustc?

@ghost (Author) commented Sep 30, 2014

@thestinger: If used with large enough chunks (>32 bytes), it can beat anything. With small chunks it degrades the same way SipHash does, but I think that's inevitable for a decent hash.

    state.digest()
}

pub struct XXState {
Contributor (review comment):

nit: I think the convention is XxState

@pczarn (Contributor) commented Sep 30, 2014

This implementation has quite a bit of unsafe code. SipHash's doesn't have any.

Have you checked the C implementation's speed? I think it should be faster than FNV for 8-byte or larger chunks when inlined. I've measured my implementation:

test xxhash::rust::chunks_8_xxh64   ... bench:     76874 ns/iter (+/- 603) = 852 MB/s
test xxhash::rust::chunks_15_xxh64  ... bench:     56387 ns/iter (+/- 681) = 1162 MB/s
test xxhash::rust::chunks_32_xxh64  ... bench:     26776 ns/iter (+/- 121) = 2447 MB/s
test xxhash::rust::chunks_128_xxh64 ... bench:     14792 ns/iter (+/- 263) = 4430 MB/s
test xxhash::rust::chunks_256_xxh64 ... bench:     12603 ns/iter (+/- 354) = 5200 MB/s

test xxhash::c::chunks_8_xxh64      ... bench:     62287 ns/iter (+/- 441) = 1052 MB/s
test xxhash::c::chunks_15_xxh64     ... bench:     43615 ns/iter (+/- 459) = 1502 MB/s
test xxhash::c::chunks_64_xxh64     ... bench:     11531 ns/iter (+/- 109) = 5683 MB/s
test xxhash::c::chunks_128_xxh64    ... bench:      9555 ns/iter (+/- 98) = 6858 MB/s
test xxhash::c::chunks_256_xxh64    ... bench:      8516 ns/iter (+/- 57) = 7695 MB/s
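
For context, here is a hedged sketch of how benchmarks like these are typically written with Rust's nightly test crate; setting b.bytes is what turns ns/iter into the MB/s figures above. The xxh64 body below is a throwaway stand-in, not this PR's implementation:

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Stand-in for the hash under test; swap in the real xxh64.
fn xxh64(data: &[u8], seed: u64) -> u64 {
    data.iter().fold(seed, |h, &b| (h ^ b as u64).wrapping_mul(0x100_0000_01b3))
}

#[bench]
fn chunks_64_xxh64(b: &mut Bencher) {
    let buf = vec![0u8; 64 * 1024];
    b.bytes = buf.len() as u64; // lets libtest report throughput in MB/s
    b.iter(|| {
        // hash the buffer in fixed-size chunks, as the benchmark names suggest
        for chunk in buf.chunks(64) {
            black_box(xxh64(chunk, 0));
        }
    });
}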

What makes you think a hash can't be fast? xxHash has two parts at its core:

  • the inner loop that consumes 32 bytes at a time (sketched after this list)
  • the finalizer
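
A hedged sketch of that inner-loop round, assuming the standard xxHash64 constants and round function (the names here are illustrative, not this PR's):

const PRIME1: u64 = 0x9E37_79B1_85EB_CA87;
const PRIME2: u64 = 0xC2B2_AE3D_27D4_EB4F;

// One round: fold an 8-byte lane into an accumulator.
fn round(acc: u64, lane: u64) -> u64 {
    acc.wrapping_add(lane.wrapping_mul(PRIME2))
        .rotate_left(31)
        .wrapping_mul(PRIME1)
}

// The inner loop keeps four independent accumulators, one per 8-byte lane
// of each 32-byte stripe, so the multiplies can pipeline.
fn consume_stripe(v: &mut [u64; 4], stripe: &[u8; 32]) {
    for (i, lane) in stripe.chunks_exact(8).enumerate() {
        v[i] = round(v[i], u64::from_le_bytes(lane.try_into().unwrap()));
    }
}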

For 8-byte chunks, it does only the following:

// small-input path for an 8-byte chunk: one mixing round, then the finalizer
let total_len: u64 = 8;
let mut h64 = self.seed.wrapping_add(PRIME5).wrapping_add(total_len);

let mut k1: u64 = source.wrapping_mul(PRIME2);
k1 = rotl64(k1, 31);
k1 = k1.wrapping_mul(PRIME1);
h64 ^= k1;
h64 = rotl64(h64, 27).wrapping_mul(PRIME1).wrapping_add(PRIME4);

// the avalanche finalizer
h64 ^= h64 >> 33;
h64 = h64.wrapping_mul(PRIME2);
h64 ^= h64 >> 29;
h64 = h64.wrapping_mul(PRIME3);
h64 ^= h64 >> 32;
return h64;

However, the finalizer alone is responsible for avalanche properties and can be used with chunks that have exactly 8 bytes:

let mut hash = source;
hash ^= hash >> 33;
hash = hash.wrapping_mul(PRIME2);
hash ^= hash >> 29;
hash = hash.wrapping_mul(PRIME3);
hash ^= hash >> 32;
return hash;
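
For concreteness, here is a self-contained version of that finalizer with the standard xxHash64 prime constants filled in, plus a quick avalanche check (a rough sketch, not part of this PR):

const PRIME2: u64 = 0xC2B2_AE3D_27D4_EB4F;
const PRIME3: u64 = 0x1656_67B1_9E37_79F9;

fn avalanche(mut hash: u64) -> u64 {
    hash ^= hash >> 33;
    hash = hash.wrapping_mul(PRIME2);
    hash ^= hash >> 29;
    hash = hash.wrapping_mul(PRIME3);
    hash ^= hash >> 32;
    hash
}

fn main() {
    // inputs differing in one bit should differ in ~32 of 64 output bits
    let (a, b) = (avalanche(0x1), avalanche(0x3));
    println!("{} output bits differ", (a ^ b).count_ones());
}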


impl XXHasher {
    pub fn new() -> XXHasher { #![inline]
        XXHasher::new_with_seed(18446744073709551557u64)
Contributor (review comment):

Where did this seed come from?

Author (reply):

It's a large prime. This should be #[cfg(test)] I think, but I wanted to be consistent with SipHash which has new_with_keys(0, 0) here.

For real uses, the seeds should be randomized, which is done with RandomSipHasher in libstd.
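
A hypothetical sketch of that randomized seeding, mirroring what RandomSipHasher does for SipHash: draw the seed once from a thread-local RNG instead of hard-coding it. XXHasher and new_with_seed are from this PR; the rand API shown is the modern crate, not 2014's std::rand:

use rand::Rng;

pub struct RandomXxHasher {
    seed: u64,
}

impl RandomXxHasher {
    pub fn new() -> RandomXxHasher {
        // one random seed per hasher factory
        RandomXxHasher { seed: rand::thread_rng().gen() }
    }

    pub fn hasher(&self) -> XXHasher {
        XXHasher::new_with_seed(self.seed)
    }
}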

@ghost (Author) commented Oct 1, 2014

@pczarn: It's not the hash that's slow, but Rust's handling of it. Hash is inefficient because it needs to be stable (i.e. a.hash() == a.hash(), regardless of where it was computed), and that forces slow paths far too often. I've opened a topic on this.
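
To illustrate the slow-path point (my sketch, not from the thread): a derived Hash impl feeds the hasher one small write per field, so a stripe-oriented hash like xxHash64 rarely sees a full 32-byte chunk and keeps taking its small-input tail:

use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct Key {
    id: u32,    // the derived impl emits one 4-byte write for this field
    stamp: u64, // and a separate 8-byte write for this one
}

// A hasher that just counts writes makes the fragmentation visible.
struct CountingHasher(usize);

impl Hasher for CountingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.0 += 1;
        println!("write of {} bytes", bytes.len());
    }
    fn finish(&self) -> u64 { 0 }
}

fn main() {
    let mut h = CountingHasher(0);
    Key { id: 1, stamp: 2 }.hash(&mut h);
    println!("{} separate writes, never a 32-byte stripe", h.0);
}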

@arthurprs (Contributor)

I've expressed this concern a couple of times already. I think we're taking security too far in the hashmap implementation. Most programming languages use a per-process seed (for SipHash or otherwise); Rust has a seed per HashMap AND a slow, overly-secure hash as the default.
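
A sketch of the per-process alternative described here: compute one random seed at first use and share it across all maps, instead of drawing a fresh seed per HashMap. OnceLock is the modern std tool for this (a 2014 equivalent would have been a lazy static), and the time-based value is only an illustrative stand-in for an OS entropy source:

use std::sync::OnceLock;
use std::time::{SystemTime, UNIX_EPOCH};

fn process_seed() -> u64 {
    static SEED: OnceLock<u64> = OnceLock::new();
    *SEED.get_or_init(|| {
        // illustrative stand-in for an OS entropy source
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().subsec_nanos() as u64
    })
}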

@vks (Contributor) commented Oct 3, 2014

> a slow overly-secure hash as the default

This is a better default than an insecure but fast hash. If you care about performance and not about security, you can opt in (see the sketch below). If you don't care about performance, you should use the secure version.
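
A sketch of that opt-in: std's HashMap takes a pluggable BuildHasher, so a program that doesn't need DoS resistance can swap in a faster hash. The FnvHasher here is a minimal FNV-1a written inline for illustration, not a published crate API:

use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

struct FnvHasher(u64);

impl Default for FnvHasher {
    // FNV-1a 64-bit offset basis
    fn default() -> FnvHasher { FnvHasher(0xcbf2_9ce4_8422_2325) }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100_0000_01b3);
        }
    }
    fn finish(&self) -> u64 { self.0 }
}

fn main() {
    // opt in by naming a different BuildHasher in the map's type
    let mut map: HashMap<&str, u32, BuildHasherDefault<FnvHasher>> = HashMap::default();
    map.insert("fast", 1);
    assert_eq!(map["fast"], 1);
}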

@arthurprs (Contributor)

Did you guys see http://discuss.rust-lang.org/t/unstable-hash-architecture/578 ? It's something worth discussing.

@alexcrichton (Member)

Unfortunately, deleting the source repository causes bors to get stuck. @Jurily, would you mind reopening? Thanks!

lnicola pushed a commit to lnicola/rust that referenced this pull request Jul 28, 2024
fix: Allow flyimport to import primitive shadowing modules

Fixes rust-lang/rust-analyzer#16371
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Aug 1, 2024
fix: Allow flyimport to import primitive shadowing modules

Fixes rust-lang/rust-analyzer#16371