Make interned's last_interned_at equal Revision::MAX if they are interned outside a query #804
Conversation
CodSpeed Performance Report: merging #804 will degrade performance by 5.79%.
Force-pushed fea2e4b to fafa4ca
Nonsense benchmark.
I think we discussed this before, but couldn't come up with a test case that would make it fail due to the db lifetime. The idea was to set
Well rust-analyzer needs
I think yes. I thought about that, but removing the assert seemed easier than starting to mess with the query stack in the struct interning. Is there a specific reason you want to keep this assert?
I'd prefer to set the revision to `Revision::MAX`.
Force-pushed 6629b7c to d97ed2c
@ibraheemdev I edited per your suggestion.
Force-pushed d97ed2c to 88c7b9d
src/interned.rs (outdated)
if value.last_interned_at.load() < current_revision {
    value.last_interned_at.store(current_revision);
}
You could use `fetch_max` here to store the maximum of the current revision and `last_interned_at`, to avoid two separate atomic operations.
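A minimal sketch of that suggestion, using a bare `AtomicUsize` as a stand-in (in the real code the field is salsa's `AtomicRevision` wrapper, which comes up below):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative stand-in: `last_interned_at` is really an `AtomicRevision` in
// salsa, not a bare `AtomicUsize`; this only shows the shape of the suggestion.
fn bump_last_interned_at(last_interned_at: &AtomicUsize, current_revision: usize) {
    // One atomic read-modify-write instead of a separate load followed by a store.
    last_interned_at.fetch_max(current_revision, Ordering::AcqRel);
}
```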
This is `AtomicRevision`, not `AtomicUsize`. I would need to define `fetch_max()`, and it's not worth it. It's not like a race condition is problematic here.
`AtomicRevision` is just a small wrapper around `AtomicUsize`. You can see in `OptionalAtomicRevision` how we exposed other atomic methods. This isn't just about races; it's also about avoiding unnecessary atomic operations in a very hot method.
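As a rough sketch only (not salsa's actual code), exposing the method on the wrapper could look like this; the type layout and names below are assumptions made up for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for salsa's `Revision`/`AtomicRevision`; the real
// types may differ in representation and naming.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Revision(usize);

struct AtomicRevision(AtomicUsize);

impl AtomicRevision {
    fn load(&self) -> Revision {
        Revision(self.0.load(Ordering::Acquire))
    }

    // The method under discussion: a single atomic RMW that keeps the maximum
    // of the stored revision and `rev`, returning the previous value.
    fn fetch_max(&self, rev: Revision) -> Revision {
        Revision(self.0.fetch_max(rev.0, Ordering::AcqRel))
    }
}
```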
`fetch_max()` won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a `cmpxchg` loop, whereas the load+store compiles to plain instructions.
We can be faster by making it branchless, though; I will change to that.
Force-pushed 88c7b9d to 22f0dc2
@MichaReiser Addressed comments.
…interned outside a query

There is an assert that `last_interned_at >= last_changed_revision`, and it can fail without this; see the added test.
Force-pushed 22f0dc2 to 97a04e2
value.last_interned_at.store(std::cmp::max(
    current_revision,
    value.last_interned_at.load(),
));
Hmm, that was not the idea. The idea was to use `AtomicUsize::fetch_max` to combine the load and store instructions. Something like

value.last_interned_at.fetch_max(current_revision, Ordering::XXX)

where `AtomicRevision::fetch_max` internally calls `fetch_max`.

Would you mind making this change in a follow-up PR?
Interestingly enough, it seems the `fetch_max` version is worse? https://godbolt.org/z/9efcq7cnh
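For reference, the kind of side-by-side one might put into Godbolt (a sketch on a bare `AtomicUsize`, not the actual salsa code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Branchless load + store: two separate atomic operations, but no RMW.
// On x86-64 both lower to ordinary mov instructions.
pub fn store_max(last: &AtomicUsize, current: usize) {
    last.store(current.max(last.load(Ordering::Acquire)), Ordering::Release);
}

// Single RMW: x86-64 has no native atomic max, so this lowers to a
// `lock cmpxchg` retry loop.
pub fn rmw_max(last: &AtomicUsize, current: usize) {
    last.fetch_max(current, Ordering::AcqRel);
}
```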
Interesting. It sort of makes sense, because both operations are now atomic. It'd be interesting to see whether arm64 produces more efficient instructions.
When these operations affect more than one bit, they cannot be represented by a single x86-64 instruction. Similarly, the fetch_max and fetch_min operations also have no corresponding x86-64 instruction. For these operations, we need a different strategy than a simple lock prefix.
A later version of ARM64, part of ARMv8.1, also includes new CISC style instructions for common atomic operations. For example, the new ldadd (load and add) instruction is equivalent to an atomic fetch_add operation, without the need for an LL/SC loop. It even includes instructions for operations like fetch_max, which don’t exist on x86-64.
It also includes a cas (compare and swap) instruction corresponding to compare_exchange. When this instruction is used, there’s no difference between compare_exchange and compare_exchange_weak, just like on x86-64.
While the LL/SC pattern is quite flexible and nicely fits the general RISC pattern, these new instructions can be more performant, as they can be easier to optimize for with specialized hardware.
https://marabos.nl/atomics/hardware.html
`fetch_max` should be more efficient on ARM64.
That's exactly what I said:
fetch_max() won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a cmpxchg loop, compared to load+store that compiles to normal instructions.
And ARM is the same in this regard.
Generally, RMW operations are expensive compared to regular (non-SeqCst) loads/stores. On x86 these will compile to regular (same as non-atomic) load/store instructions, while RMWs entail a strong barrier (a pipeline stall). If the branch can avoid performing a store, the load may be worth it (as a contended store is much more expensive than a branch/load), but I would stay away from the RMW.
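A sketch of the branch-guarded variant described above, again on a bare `AtomicUsize` for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Skip the store entirely when the stored revision is already up to date,
// so the common case touches the cache line with a load only.
pub fn store_if_newer(last: &AtomicUsize, current: usize) {
    if last.load(Ordering::Acquire) < current {
        last.store(current, Ordering::Release);
    }
}
```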
There is an assert that `last_interned_at >= last_changed_revision`, and it can fail without this; see the added test.

CC @ibraheemdev, you introduced this assert in #602.