Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 10 additions & 5 deletions src/interned.rs
Original file line number Diff line number Diff line change
Expand Up @@ -282,11 +282,12 @@ where
let table = zalsa.table();

// Record the durability of the current query on the interned value.
let durability = zalsa_local
let (durability, last_interned_at) = zalsa_local
.active_query()
.map(|(_, stamp)| stamp.durability)
.map(|(_, stamp)| (stamp.durability, current_revision))
// If there is no active query this durability does not actually matter.
.unwrap_or(Durability::MAX);
// `last_interned_at` needs to be `Revision::MAX`, see the intern_access_in_different_revision test.
.unwrap_or((Durability::MAX, Revision::max()));

let id = zalsa_local.allocate(table, self.ingredient_index, |id| Value::<C> {
fields: unsafe { self.to_internal_data(assemble(id, key)) },
Expand All @@ -295,7 +296,7 @@ where
durability: AtomicU8::new(durability.as_u8()),
// Record the revision we are interning in.
first_interned_at: current_revision,
last_interned_at: AtomicRevision::from(current_revision),
last_interned_at: AtomicRevision::from(last_interned_at),
});

let value = table.get::<Value<C>>(id);
Expand Down Expand Up @@ -391,7 +392,11 @@ where

// The slot is valid in this revision but we have to sync the value's revision.
let current_revision = zalsa.current_revision();
value.last_interned_at.store(current_revision);
// No `if` to be branchless.
value.last_interned_at.store(std::cmp::max(
current_revision,
value.last_interned_at.load(),
));
Comment on lines +396 to +399
Copy link
Contributor

@MichaReiser MichaReiser Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that was not the idea. The idea was to use AtomicUsize::fetch_max to combine the load and store instructions

Something like

value.last_interned_at.fetch_max(current_revision, Ordering::XXX)

where AtomicRevision::fetch_max internally calls fetch_max

Would you mind making this change in a follow up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestngly enough it seems the fetch_max version is worse? https://godbolt.org/z/9efcq7cnh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. It sort of make sense because both operations now are atomic. It'd be interesting to see if arm64 produces more efficient instructions

Copy link
Contributor

@MichaReiser MichaReiser Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When these operations affect more than one bit, they cannot be represented by a single x86-64 instruction. Similarly, the fetch_max and fetch_min operations also have no corresponding x86-64 instruction. For these operations, we need a different strategy than a simple lock prefix.

A later version of ARM64, part of ARMv8.1, also includes new CISC style instructions for common atomic operations. For example, the new ldadd (load and add) instruction is equivalent to an atomic fetch_add operation, without the need for an LL/SC loop. It even includes instructions for operations like fetch_max, which don’t exist on x86-64.

It also includes a cas (compare and swap) instruction corresponding to com⁠pare_​exchange. When this instruction is used, there’s no difference between compare_exchange and compare_exchange_weak, just like on x86-64.

While the LL/SC pattern is quite flexible and nicely fits the general RISC pattern, these new instructions can be more performant, as they can be easier to optimize for with specialized hardware.

https://marabos.nl/atomics/hardware.html

fetch_max should be more efficient on ARM64.

Copy link
Contributor Author

@ChayimFriedman2 ChayimFriedman2 Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I said:

fetch_max() won't be any faster; it needs to be an atomic RMW. Even on x86, it compiles to a cmpxchg loop, compared to load+store that compiles to normal instructions.

And ARM is the same in this regard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally RMW operations are expensive compared to regular (non-seqcst) load/stores. On x86 these will compile to regular (same as non-atomic) load/store instructions, while RMWs entail a strong barrier (a pipeline stall). If the branch can avoid performing a store the load may be worth it (as a contended store is much more expensive than a branch/load), but I would stay away from the RMW.


db.salsa_event(&|| {
Event::new(EventKind::DidReinternValue {
Expand Down
10 changes: 10 additions & 0 deletions src/revision.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,24 +17,34 @@ pub struct Revision {
}

impl Revision {
#[inline]
pub(crate) fn max() -> Self {
Self::from(usize::MAX)
}

#[inline]
pub(crate) fn start() -> Self {
Self::from(START)
}

#[inline]
pub(crate) fn from(g: usize) -> Self {
Self {
generation: NonZeroUsize::new(g).unwrap(),
}
}

#[inline]
pub(crate) fn from_opt(g: usize) -> Option<Self> {
NonZeroUsize::new(g).map(|generation| Self { generation })
}

#[inline]
pub(crate) fn next(self) -> Revision {
Self::from(self.generation.get() + 1)
}

#[inline]
fn as_usize(self) -> usize {
self.generation.get()
}
Expand Down
28 changes: 28 additions & 0 deletions tests/intern_access_in_different_revision.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
use salsa::{Durability, Setter};

#[salsa::interned(no_lifetime)]
struct Interned {
field: u32,
}

#[salsa::input]
struct Input {
field: i32,
}

#[test]
fn the_test() {
let mut db = salsa::DatabaseImpl::default();
let input = Input::builder(-123456)
.field_durability(Durability::HIGH)
.new(&db);
// Create an intern in an early revision.
let interned = Interned::new(&db, 0xDEADBEEF);
// Trigger a new revision.
input
.set_field(&mut db)
.with_durability(Durability::HIGH)
.to(123456);
// Read the interned value
let _ = interned.field(&db);
}