-
Notifications
You must be signed in to change notification settings - Fork 7
Description
All string types compared here are worst case O(n) (comparing two equal strings with nonequal address), best case O(1) (unequal length). The reason for the ArcStr outlier is that it has an extra O(1) case: comparing strings cloned from the same Arc origin. It's good to know what string types have this fast accept (any reference counted heap repr should imo) but the eq
benchmark misrepresents how much of an impact this will have by only measuring this one special case; generally the bulk of string comparisons come from different sources, not between clones of one source.
(The only way to get that kind of difference is interning, so all equal (interned) strings are ptr_eq. Then you can always have O(1) comparison.)
Suggested simple resolution:
- Benchmark comparing two strings which aren't
ptr_eq
instead. Basically,new
from the fixture twice. - Separate to the
eq
benchmark, note which string types provide aptr_eq
fast accept and which preserveptr_eq
when cloned.
There's no good general comparison benchmark to use; that requires getting a decent estimate of the distribution of string comparisons to base the benchmark on. A tempting proxy is two random-printable-ASCII strings of length N. A reasonable one for cargo might be to use two random package names chosen based on download frequency.
criterion doesn't particularly like running a benchmark like that — deterministic input to the iterated benchmark routine is preferred — but I believe it can be convinced to do so with some prodding, e.g. I think the proper shape might be:
random ASCII
group.throughput(Throughput::bytes(len as u64));
let rng = FastRng::from_entropy();
// for each string type
group.bench_with_input(BenchmarkId::new("ArcStr", len), &len, {
let mut rng = rng.clone();
let mut bytes = iter::repeat_with(|| rng.gen_range(' '..='~'));
|b, &len| b.iter_batched_ref(
|| {
let mut strings = iter::repeat_with(|| bytes.by_ref().take(len))
.map(String::from_iter)
.map(ArcStr::from);
(strings.next().unwrap(), strings.next().unwrap())
},
|(lhs, rhs)| lhs == rhs,
BatchSize::SmallInput, // maybe; see docs
),
});
random crate names
let rng = FastRng::from_entropy();
let dist = WeightedAliasIndex::new(CRATE_DOWNLOADS);
// for each string type
group.bench_function("ArcStr", {
let mut rng = rng.clone();
let mut names = iter::repeat_with(|| CRATE_NAMES[dist.sample(&mut rng)])
.map(ArcStr::from);
|b| b.iter_batched_ref(
|| (names.next().unwrap(), names.next().unwrap()),
|(lhs, rhs)| lhs == rhs,
BatchSize::SmallInput, // maybe; see docs
),
});