Skip to content

Avoid rehashing Fingerprint as a map key #76233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 3, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 29 additions & 1 deletion compiler/rustc_data_structures/src/fingerprint.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,10 @@ use rustc_serialize::{
opaque::{self, EncodeResult},
Decodable, Encodable,
};
use std::hash::{Hash, Hasher};
use std::mem;

#[derive(Eq, PartialEq, Ord, PartialOrd, Hash, Debug, Clone, Copy)]
#[derive(Eq, PartialEq, Ord, PartialOrd, Debug, Clone, Copy)]
pub struct Fingerprint(u64, u64);

impl Fingerprint {
Expand Down Expand Up @@ -76,6 +77,33 @@ impl ::std::fmt::Display for Fingerprint {
}
}

impl Hash for Fingerprint {
#[inline]
fn hash<H: Hasher>(&self, state: &mut H) {
state.write_fingerprint(self);
}
}

trait FingerprintHasher {
fn write_fingerprint(&mut self, fingerprint: &Fingerprint);
}

impl<H: Hasher> FingerprintHasher for H {
#[inline]
default fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
self.write_u64(fingerprint.0);
self.write_u64(fingerprint.1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try a panic! here instead to try and track down all the cases where we are hashing fingerprints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't, but I tried just now and it ran into a StableHasher for the parent DefPathHash:

fn compute_stable_hash(&self, parent_hash: DefPathHash) -> DefPathHash {
let mut hasher = StableHasher::new();
// We hash a `0u8` here to disambiguate between regular `DefPath` hashes,
// and the special "root_parent" below.
0u8.hash(&mut hasher);
parent_hash.hash(&mut hasher);

I suppose we could have that hash without the parent at first, and then Fingerprint::combine them for the final value. I'll give that a shot and see if anything else comes up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next one that comes up is item_ids_hash.hash_stable(...):

// Combining the `DefPathHash`s directly is faster than feeding them
// into the hasher. Because we use a commutative combine, we also don't
// have to sort the array.
let item_ids_hash = item_ids
.iter()
.map(|id| {
let (def_path_hash, local_id) = id.id.to_stable_hash_key(hcx);
debug_assert_eq!(local_id, hir::ItemLocalId::from_u32(0));
def_path_hash.0
})
.fold(Fingerprint::ZERO, |a, b| a.combine_commutative(b));
item_ids.len().hash_stable(hcx, hasher);
item_ids_hash.hash_stable(hcx, hasher);

I wonder if we could instead add a combine operation directly in the StableHasher?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could specialize to allow Fingerprints to be used with StableHasher too, and unimplement the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, supporting combine directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.

I'm not sure if combine is "as good" as hashing though, from a "hash quality" perspective. I would sort of assume no because then we wouldn't get any wins from using it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another hit:

node_to_node_index: Sharded<FxHashMap<DepNode<K>, DepNodeIndex>>,

... where DepNode<K> is a K and a Fingerprint.

I'm inclined to let all of these hash normally for now. Unhasher is already a bit hacky itself...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted though, DepNode<K> could ignore its K for hashing purposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does sound like there's sort of a lot of potential here but the current Hash/Hasher API doesn't readily allow specializing like we want to.

I was looking at the Hasher API, and it feels like it might be worth adding something like a write_hash or similar, where the Hasher can expect the incoming value to already be "hashed" or at least nicely distributed across the range. To start we could just take a u64 since that's what Hasher::finish() returns, though e.g. for Fingerprint we really want u128. Maybe fn write_hash(impl Into<u128>) makes sense, not sure.

I am leaning towards saying that we should just merge this PR as-is: it seems like a clear, if small, win, and while there may be more hidden through careful hash-skipping it's probably better to evaluate each in a standalone manner, particularly given the relative complexity of the Unhasher design. We can consider other improvements, like the one I suggested in the previous paragraph, later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, supporting combine directly with StableHasher seems like a good idea -- we presumably want that (or similar) a lot.

FWIW, I was just wanting something like this for span's hash implementation, which currently re-hashes a hash of the file name.

}
}

impl FingerprintHasher for crate::unhash::Unhasher {
#[inline]
fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
// `Unhasher` only wants a single `u64`
self.write_u64(fingerprint.0);
}
}

impl stable_hasher::StableHasherResult for Fingerprint {
#[inline]
fn finish(hasher: stable_hasher::StableHasher) -> Self {
Expand Down
1 change: 1 addition & 0 deletions compiler/rustc_data_structures/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@ pub use atomic_ref::AtomicRef;
pub mod frozen;
pub mod tagged_ptr;
pub mod temp_dir;
pub mod unhash;

pub struct OnDrop<F: Fn()>(pub F);

Expand Down
29 changes: 29 additions & 0 deletions compiler/rustc_data_structures/src/unhash.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
use std::collections::{HashMap, HashSet};
use std::hash::{BuildHasherDefault, Hasher};

pub type UnhashMap<K, V> = HashMap<K, V, BuildHasherDefault<Unhasher>>;
pub type UnhashSet<V> = HashSet<V, BuildHasherDefault<Unhasher>>;

/// This no-op hasher expects only a single `write_u64` call. It's intended for
/// map keys that already have hash-like quality, like `Fingerprint`.
#[derive(Default)]
pub struct Unhasher {
value: u64,
}

impl Hasher for Unhasher {
#[inline]
fn finish(&self) -> u64 {
self.value
}

fn write(&mut self, _bytes: &[u8]) {
unimplemented!("use write_u64");
}

#[inline]
fn write_u64(&mut self, value: u64) {
debug_assert_eq!(0, self.value, "Unhasher doesn't mix values!");
self.value = value;
}
}
5 changes: 3 additions & 2 deletions compiler/rustc_middle/src/ty/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ use rustc_data_structures::stable_hasher::{
hash_stable_hashmap, HashStable, StableHasher, StableVec,
};
use rustc_data_structures::sync::{self, Lock, Lrc, WorkerLocal};
use rustc_data_structures::unhash::UnhashMap;
use rustc_errors::ErrorReported;
use rustc_hir as hir;
use rustc_hir::def::{DefKind, Res};
Expand Down Expand Up @@ -935,7 +936,7 @@ pub struct GlobalCtxt<'tcx> {

/// A map from `DefPathHash` -> `DefId`. Includes `DefId`s from the local crate
/// as well as all upstream crates. Only populated in incremental mode.
pub def_path_hash_to_def_id: Option<FxHashMap<DefPathHash, DefId>>,
pub def_path_hash_to_def_id: Option<UnhashMap<DefPathHash, DefId>>,

pub queries: query::Queries<'tcx>,

Expand Down Expand Up @@ -1104,7 +1105,7 @@ impl<'tcx> TyCtxt<'tcx> {
let def_path_hash_to_def_id = if s.opts.build_dep_graph() {
let capacity = definitions.def_path_table().num_def_ids()
+ crates.iter().map(|cnum| cstore.num_def_ids(*cnum)).sum::<usize>();
let mut map = FxHashMap::with_capacity_and_hasher(capacity, Default::default());
let mut map = UnhashMap::with_capacity_and_hasher(capacity, Default::default());

map.extend(definitions.def_path_table().all_def_path_hashes_and_def_ids(LOCAL_CRATE));
for cnum in &crates {
Expand Down