Skip to content

eq benchmark potentially misleading #30

@CAD97

Description

@CAD97

All string types compared here are worst case O(n) (comparing two equal strings with nonequal address), best case O(1) (unequal length). The reason for the ArcStr outlier is that it has an extra O(1) case: comparing strings cloned from the same Arc origin. It's good to know what string types have this fast accept (any reference counted heap repr should imo) but the eq benchmark misrepresents how much of an impact this will have by only measuring this one special case; generally the bulk of string comparisons come from different sources, not between clones of one source.

(The only way to get that kind of difference is interning, so all equal (interned) strings are ptr_eq. Then you can always have O(1) comparison.)

Suggested simple resolution:

  • Benchmark comparing two strings which aren't ptr_eq instead. Basically, new from the fixture twice.
  • Separate to the eq benchmark, note which string types provide a ptr_eq fast accept and which preserve ptr_eq when cloned.

There's no good general comparison benchmark to use; that requires getting a decent estimate of the distribution of string comparisons to base the benchmark on. A tempting proxy is two random-printable-ASCII strings of length N. A reasonable one for cargo might be to use two random package names chosen based on download frequency.

criterion doesn't particularly like running a benchmark like that — deterministic input to the iterated benchmark routine is preferred — but I believe it can be convinced to do so with some prodding, e.g. I think the proper shape might be:

random ASCII
group.throughput(Throughput::bytes(len as u64));
let rng = FastRng::from_entropy();

// for each string type
group.bench_with_input(BenchmarkId::new("ArcStr", len), &len, {
    let mut rng = rng.clone();
    let mut bytes = iter::repeat_with(|| rng.gen_range(' '..='~'));
    |b, &len| b.iter_batched_ref(
        || {
            let mut strings = iter::repeat_with(|| bytes.by_ref().take(len))
                .map(String::from_iter)
                .map(ArcStr::from);
            (strings.next().unwrap(), strings.next().unwrap())
        },
        |(lhs, rhs)| lhs == rhs,
        BatchSize::SmallInput, // maybe; see docs
    ),
});
random crate names
let rng = FastRng::from_entropy();
let dist = WeightedAliasIndex::new(CRATE_DOWNLOADS);

// for each string type
group.bench_function("ArcStr", {
    let mut rng = rng.clone();
    let mut names = iter::repeat_with(|| CRATE_NAMES[dist.sample(&mut rng)])
        .map(ArcStr::from);
    |b| b.iter_batched_ref(
        || (names.next().unwrap(), names.next().unwrap()),
        |(lhs, rhs)| lhs == rhs,
        BatchSize::SmallInput, // maybe; see docs
    ),
});

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions