eq benchmark potentially misleading

All string types compared here are worst case O(n) (comparing two equal strings with nonequal address), best case O(1) (unequal length). The reason for the ArcStr outlier is that it has an extra O(1) case: comparing strings cloned from the same Arc origin. It's good to know what string types have this fast accept (any reference counted heap repr *should* imo) but the `eq` benchmark misrepresents how much of an impact this will have by only measuring this one special case; generally the bulk of string comparisons come from different sources, not between clones of one source.

(The only way to get that kind of difference is interning, so all equal (interned) strings are ptr_eq. Then you can always have O(1) comparison.)

Suggested simple resolution:
- Benchmark comparing two strings which *aren't* `ptr_eq` instead. Basically, `new` from the fixture twice.
- Separate to the `eq` benchmark, note which string types provide a `ptr_eq` fast accept and which preserve `ptr_eq` when cloned.

There's no good general comparison benchmark to use; that requires getting a decent estimate of the distribution of string comparisons to base the benchmark on. A tempting proxy is two random-printable-ASCII strings of length N. A reasonable one for cargo might be to use two random package names chosen based on download frequency.

criterion doesn't particularly like running a benchmark like that — deterministic input to the iterated benchmark routine is preferred — but I believe it can be convinced to do so with some prodding, e.g. I think the proper shape might be:

<details><summary>random ASCII</summary>

```rust
group.throughput(Throughput::bytes(len as u64));
let rng = FastRng::from_entropy();

// for each string type
group.bench_with_input(BenchmarkId::new("ArcStr", len), &len, {
    let mut rng = rng.clone();
    let mut bytes = iter::repeat_with(|| rng.gen_range(' '..='~'));
    |b, &len| b.iter_batched_ref(
        || {
            let mut strings = iter::repeat_with(|| bytes.by_ref().take(len))
                .map(String::from_iter)
                .map(ArcStr::from);
            (strings.next().unwrap(), strings.next().unwrap())
        },
        |(lhs, rhs)| lhs == rhs,
        BatchSize::SmallInput, // maybe; see docs
    ),
});
```

</details>

<details><summary>random crate names</summary>

```rust
let rng = FastRng::from_entropy();
let dist = WeightedAliasIndex::new(CRATE_DOWNLOADS);

// for each string type
group.bench_function("ArcStr", {
    let mut rng = rng.clone();
    let mut names = iter::repeat_with(|| CRATE_NAMES[dist.sample(&mut rng)])
        .map(ArcStr::from);
    |b| b.iter_batched_ref(
        || (names.next().unwrap(), names.next().unwrap()),
        |(lhs, rhs)| lhs == rhs,
        BatchSize::SmallInput, // maybe; see docs
    ),
});
```

</details>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eq benchmark potentially misleading #30

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

eq benchmark potentially misleading #30

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions