Capability-based Aliasing Model #28
Description
This is a variant of the ACA model that has some advantages over it and is preferred by @ubsan.
Capabilities
The heart of the model is memory capabilities. In this model, each pointer or reference has a capability-set that describes which borrows it can alias with.
In more detail:
data AccessKind = Read | Write
data BorrowKey = BorrowKey {
bkId :: Unique,
bkLifetime :: Lifetime,
}
data Capability = Capability {
root :: BorrowKey,
memoryRange :: MemoryRange,
accessKinds :: Set AccessKind
capLifetime :: Lifetime,
}
Creation
Each borrow creates a BorrowKey
with the appropriate lifetime.
The capability-set of the newly-created borrow is the union of the capability keyed off the new BorrowKey
and the guarantor's capability set, restricted elementwise to the borrow's lifetime, memory range, and access kind.
Propagation
Capability sets are preserved when values are stored, loaded, or operated on in ways other than reborrows. I could not find a satisfying way of specifying that when InstCombine
-able integer operations and/or partial writes are allowed (and to define what an InstCombine
-able integer operation is), but that does not matter much in practice (e.g. the C spec is handwavy there).
Access Capability Set
The capability set of an access from Rust code is the accessing pointer/reference's capability set, plus a new freshly-created borrow key and capability that lasts only for the duration of the access.
Adding that borrow key has the effect of prohibiting data races.
The capability set of an access from foreign code (including intrinsics) is implementation-defined.
Garbage Collection
Borrow keys and capabilities whose lifetime has expired have no effect and can be ignored.
The aliasing rule
The actual aliasing rule is pretty similar to the ACA rule. UB through an aliasing violation occurs if there are 2 accesses (in no particular order) - the asserting access and the conflicting access, and a capability - the asserted capability - such that:
- The memory locations accessed by the 2 accesses, and the asserted capability's memory range overlap.
- At least one of the 2 accesses is a write.
- The asserted capability is within the asserting access's capability set.
- Both accesses are within the asserted capabilities' borrow key's lifetime, and after these capabilities were created.
- There is no access capability within the conflicting access's capability set
Where an access capability is a capability within the conflicting access's capability set such that
- The conflicting access's lifetime is within that capability's lifetime
- The access kind is within capability's access kinds
- The access capability's memory range contains the conflicting access's memory range.
- The access capability has the same borrow key as the asserted capability
Activity
arielb1 commentedon Sep 15, 2016
cc @nikomatsakis
arielb1 commentedon Sep 15, 2016
That's the basic model. We still need some way of formalizing RVO shields. Writing an undef to function return values at function calls would do.
nikomatsakis commentedon Sep 21, 2016
Thanks, I like this writeup. It seems to be close to something that I was separately toying around with. I had a few questions:
This seems like a very broad statement. I guess this means that e.g. if I were to transmute a
&T
to aBox<T>
, thatBox<T>
would have no more capabilities than the&T
(and hence if its dtor ran, that would be bad), right?Presuming the same is true if I round-tripped a pointer through a usize?
I'm not quite sure how raw pointers interact here. For example, if I write:
what happens with the
x
, is it still usable?arielb1 commentedon Sep 21, 2016
That would totally be bad - same as if you mutated the content through that
&T
pointer.Raw pointers act like references, except they also keep the capability set when you reborrow them. I don't have a satisfying way of handling other types (usizes, control-flow dependencies, etc.), but I don't think that matters much.
By the model, sure.
x
andy
have the same capability-set, so that they can be alternatively accessed.arielb1 commentedon Sep 21, 2016
One problem with this model, shared with the ACA model, is that it does not adequately protect things behind nested references, as this code is perfectly valid:
Because
PTR
andptr
share the same capability-set, the access incollaborator
is perfectly valid. The 2 reads fromdata
create only short-lived capabilites, which do not encompass the call tocollaborator
. This means that the compiler can't merge the reads inquestion
.This does not conflict with LLVM or anything, because
noalias
only applies to direct arguments, but might be annoying sometimes.