
Conversation

@ChayimFriedman2 (Contributor) commented Apr 20, 2025

There is an assert that `last_interned_at >= last_changed_revision`, and it can fail without this; see the added test.

CC @ibraheemdev, you introduced this assert in #602.

codspeed-hq bot commented Apr 20, 2025

CodSpeed Performance Report

Merging #804 will degrade performance by 5.79%.

Comparing ChayimFriedman2:outside-intern (97a04e2) with master (4ab2d27)

Summary

❌ 1 (👁 1) regression
✅ 11 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| 👁 `amortized[SupertypeInput]` | 3.8 µs | 4 µs | -5.79% |

@ChayimFriedman2 (Author) commented:

Nonsense benchmark.

@ibraheemdev (Member) commented:

I think we discussed this before, but couldn't come up with a test case that would make it fail due to the db lifetime. The idea was to set `last_interned_at` to `Revision::MAX` for structs that are interned outside of revisions. Would that work for the case you are describing?

@ChayimFriedman2 (Author) commented:

> I think we discussed this before, but couldn't come up with a test case that would make it fail due to the db lifetime.

Well, rust-analyzer needs `interned(no_lifetime)`, at least currently.

> The idea was to set `last_interned_at` to `Revision::MAX` for structs that are interned outside of revisions. Would that work for the case you are describing?

I think yes. I thought about that, but removing the assert seemed easier than starting to mess with the query stack in the struct interning. Is there a specific reason you want to keep this assert?

@MichaReiser (Contributor) commented:

I'd prefer to set the revision to `Revision::MAX` (the asserted condition is probably going to be important when implementing garbage collection of interned values), and we already look at the active query on line 223, so it shouldn't increase complexity by a lot.
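A minimal sketch of the proposed stamping rule, assuming the interner can see whether a query is active; `Revision`, `ActiveQuery`, and `last_interned_at` here are illustrative stand-ins, not salsa's actual types:

```rust
// Hypothetical sketch of the proposed rule, not salsa's real code.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Revision(usize);

impl Revision {
    const MAX: Revision = Revision(usize::MAX);
}

struct ActiveQuery; // stand-in for a frame on the query stack

// Inside a query, stamp the interned value with the current revision;
// outside any query, use Revision::MAX so that the assert
// `last_interned_at >= last_changed_revision` holds trivially.
fn last_interned_at(active_query: Option<&ActiveQuery>, current: Revision) -> Revision {
    if active_query.is_some() {
        current
    } else {
        Revision::MAX
    }
}

fn main() {
    assert_eq!(last_interned_at(Some(&ActiveQuery), Revision(3)), Revision(3));
    assert_eq!(last_interned_at(None, Revision(3)), Revision::MAX);
}
```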

@ChayimFriedman2 force-pushed the outside-intern branch 2 times, most recently from 6629b7c to d97ed2c on April 21, 2025 17:45
@ChayimFriedman2 (Author) commented:

@ibraheemdev I edited per your suggestion.

@ChayimFriedman2 changed the title from "fix: Remove the assert for interned structs that last_interned_at >= last_changed_revision" to "Make interned's last_interned_at equal Revision::MAX if they are interned outside a query" on Apr 21, 2025
Review thread on src/interned.rs, lines 395 to 400 (outdated):

```rust
if value.last_interned_at.load() < current_revision {
    value.last_interned_at.store(current_revision);
}
```

Contributor commented:

You could use `fetch_max` here to store the maximum of the current revision and `last_interned_at`, avoiding two separate atomic operations.

@ChayimFriedman2 (Author) replied:

This is `AtomicRevision`, not `AtomicUsize`. I will need to define `fetch_max()`, and it's not worth it. It's not like a race condition is problematic here.

@MichaReiser (Contributor) commented Apr 22, 2025:

`AtomicRevision` is just a small wrapper around `AtomicUsize`; see `OptionalAtomicRevision` for how we exposed other atomic methods.

This isn't just about races; it's also about avoiding unnecessary atomic operations in a very hot method.
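For reference, such a wrapper method could look roughly like the sketch below; this is an assumption about the shape of `AtomicRevision`, not salsa's actual definition:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: an AtomicRevision exposing fetch_max by delegating to the
// underlying AtomicUsize, analogous to how other atomic methods are exposed.
struct AtomicRevision(AtomicUsize);

impl AtomicRevision {
    // Atomically stores the maximum of the current value and `rev` and
    // returns the previous value, as a single read-modify-write.
    fn fetch_max(&self, rev: usize, ordering: Ordering) -> usize {
        self.0.fetch_max(rev, ordering)
    }
}

fn main() {
    let rev = AtomicRevision(AtomicUsize::new(3));
    assert_eq!(rev.fetch_max(5, Ordering::Relaxed), 3); // returns the old value
    assert_eq!(rev.fetch_max(4, Ordering::Relaxed), 5); // 5 stays: max wins
}
```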

@ChayimFriedman2 (Author) replied:

`fetch_max()` won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a cmpxchg loop, compared to a load+store that compiles to normal instructions.
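To make the comparison concrete, here is a standalone sketch of the two patterns, using a plain `AtomicUsize` rather than salsa's wrapper; on x86-64 the RMW version lowers to a `lock cmpxchg` loop, while the load+store version lowers to plain `mov` instructions plus a branch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Single atomic read-modify-write. x86-64 has no native atomic max
// instruction, so this compiles to a cmpxchg loop.
fn update_rmw(last: &AtomicUsize, current: usize) {
    last.fetch_max(current, Ordering::Relaxed);
}

// Separate relaxed load and store, each an ordinary mov on x86-64. Two
// racing writers can lose an update, which the thread above deems
// acceptable here.
fn update_load_store(last: &AtomicUsize, current: usize) {
    if last.load(Ordering::Relaxed) < current {
        last.store(current, Ordering::Relaxed);
    }
}

fn main() {
    let last = AtomicUsize::new(3);
    update_load_store(&last, 5);
    update_rmw(&last, 4); // keeps 5: fetch_max never moves the value backwards
    assert_eq!(last.load(Ordering::Relaxed), 5);
}
```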

@ChayimFriedman2 (Author) added:

We can be faster with a branchless version, though; I will change to that.

@ChayimFriedman2 (Author) commented:

@MichaReiser Addressed comments.

@Veykril Veykril added this pull request to the merge queue Apr 22, 2025
Merged via the queue into salsa-rs:master with commit cf9efae Apr 22, 2025
11 checks passed
@github-actions bot mentioned this pull request Apr 22, 2025
@ChayimFriedman2 deleted the outside-intern branch April 22, 2025 10:52
Review thread on src/interned.rs, lines +396 to +399:

```rust
value.last_interned_at.store(std::cmp::max(
    current_revision,
    value.last_interned_at.load(),
));
```
@MichaReiser (Contributor) commented Apr 22, 2025:

Hmm, that was not the idea. The idea was to use `AtomicUsize::fetch_max` to combine the load and store instructions.

Something like

```rust
value.last_interned_at.fetch_max(current_revision, Ordering::XXX)
```

where `AtomicRevision::fetch_max` internally calls `fetch_max`.

Would you mind making this change in a follow-up PR?

Member replied:

Interestingly enough, it seems the `fetch_max` version is worse? https://godbolt.org/z/9efcq7cnh

Contributor replied:

Interesting. It sort of makes sense, because both operations are now atomic. It'd be interesting to see whether arm64 produces more efficient instructions.

@MichaReiser (Contributor) commented Apr 22, 2025:

> When these operations affect more than one bit, they cannot be represented by a single x86-64 instruction. Similarly, the fetch_max and fetch_min operations also have no corresponding x86-64 instruction. For these operations, we need a different strategy than a simple lock prefix.
>
> A later version of ARM64, part of ARMv8.1, also includes new CISC style instructions for common atomic operations. For example, the new ldadd (load and add) instruction is equivalent to an atomic fetch_add operation, without the need for an LL/SC loop. It even includes instructions for operations like fetch_max, which don’t exist on x86-64.
>
> It also includes a cas (compare and swap) instruction corresponding to compare_exchange. When this instruction is used, there’s no difference between compare_exchange and compare_exchange_weak, just like on x86-64.
>
> While the LL/SC pattern is quite flexible and nicely fits the general RISC pattern, these new instructions can be more performant, as they can be easier to optimize for with specialized hardware.

https://marabos.nl/atomics/hardware.html

`fetch_max` should be more efficient on ARM64.

@ChayimFriedman2 (Author) commented Apr 22, 2025:

That's exactly what I said:

> `fetch_max()` won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a cmpxchg loop, compared to a load+store that compiles to normal instructions.

And ARM is the same in this regard.

Member replied:

Generally, RMW operations are expensive compared to regular (non-SeqCst) loads/stores. On x86 these will compile to regular (same as non-atomic) load/store instructions, while RMWs entail a strong barrier (a pipeline stall). If the branch can avoid performing a store, the load may be worth it (a contended store is much more expensive than a branch/load), but I would stay away from the RMW.
