Description
The "inbounds" semantics of `offset` are notoriously tricky and confusing. From what I hear from @nikic, the "inbounds" part of them is also not nearly as useful as one might think, and the main payoff is being sure that the pointer is not wrapped around either end of the address space.
So... is there a chance that we could significantly simplify the language at acceptable cost for analyses by changing the rules of `offset` (and all other "inbounds" offsets that the language does implicitly, like when applying place projections) such that the only case of UB here is overflow wrapping around the address space (both below `0` and above `usize::MAX`)? I think that would be great, but of course we have to be careful not to give up too much information here. (That said, we do have a ton of information of the form "this pointer is dereferenceable for size N", which conveys bounds information much more directly than `getelementptr inbounds`.)
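To make the proposed rule concrete, here is a minimal sketch that models it on plain addresses. The function name `offset_nowrap` and its signature are hypothetical and not part of any existing API; it only illustrates "the only UB is wrapping around the address space", under the assumption that the byte distance must still fit in an `isize` as `offset` requires today.

```rust
/// Hypothetical model of the proposed rule: an offset is allowed exactly when
/// the resulting address stays within `0..=usize::MAX` without wrapping.
/// Returns the new address, or `None` where the proposal would declare UB.
fn offset_nowrap(addr: usize, count: isize, elem_size: usize) -> Option<usize> {
    // Assumption: the byte distance itself must fit in an `isize`.
    let elem_size = isize::try_from(elem_size).ok()?;
    let bytes = count.checked_mul(elem_size)?;
    // Fails when the result would go below 0 or above `usize::MAX`.
    addr.checked_add_signed(bytes)
}
```

Note that the model never asks which allocation `addr` belongs to or how large it is; that is exactly the information the "inbounds" rules currently require and this proposal would drop.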
However, we'd probably need LLVM support for this, adding some sort of `getelementptr nowrap`. (There is the possible alternative of using plain `getelementptr`, and upgrading that to `inbounds` whenever we can derive from other information that the pointer is indeed dereferenceable for a sufficiently large memory range. I am not sure how tricky that would be to implement though.)
So I wonder, @nikic, do you think that would be a reasonable and realistic option? And everyone, do you think that would be a reasonable semantics to shoot for?
In particular, this would resolve #299.
Activity
Lokathor commented on Jul 11, 2022
Just saying "you can't wrap the address space" is extremely teachable.
scottmcm commented on Jul 11, 2022
I always thought this was as much about aliasing information as anything. Is `GEPi` not actually important for that because we get all the information we need about it from provenance anyway?
Lokathor commented on Jul 11, 2022
Yeah two pointers that each point to a separate stack object still can't index oob and access each other because they'd have separate provenance.
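A small sketch of that point, using `wrapping_offset` (which is allowed to leave the bounds of its allocation) so that nothing here is UB. The function name is made up for illustration, and the code deliberately never dereferences the out-of-bounds pointer.

```rust
/// Returns (address of `b`, address reached by offsetting a pointer to `a`).
fn same_address_different_provenance() -> (usize, usize) {
    let a = 1u8;
    let b = 2u8;
    let pa: *const u8 = &a;
    let pb: *const u8 = &b;
    // Byte distance between the two locals; wrapping integer arithmetic, never UB.
    let diff = (pb as usize).wrapping_sub(pa as usize) as isize;
    // Computing this pointer is fine even though it leaves the bounds of `a`.
    let q = pa.wrapping_offset(diff);
    // `q` has the address of `b` but still the provenance of `a`,
    // so reading through `q` would be UB. We only compare addresses.
    (pb as usize, q as usize)
}
```

The two returned addresses are equal, yet `q` still may not be used to access `b`. That restriction comes from provenance alone, independent of whether the offset was "inbounds".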
eddyb commented on Jul 17, 2022
If GEP was replaced by an untyped "pointer offset with integer dot product" operation (i.e. a sequence of constant strides and runtime indices, to cover the "nested arrays" case, if they do want to keep it as one instruction), it would be great to reuse `nuw`/`nsw` flags from arithmetic operations.
(NUW/NSW standing for "No Unsigned/Signed Wrap", i.e. opting into unsigned/signed overflow being UB)
Alternatively, if LLVM had `add nuw`/`sub nuw` between `ptr` and an integer, that could allow for `add`/`sub` methods on Rust pointers that aren't limited by `isize` like `offset` is (though I'm not sure we want to even dream about such things, given how much legacy there is around the whole "`ptrdiff_t` is one bit too small" issue).
RalfJung commented on Feb 10, 2023
Even on the LLVM side there seems to be some desire for a `getelementptr nowrap`, though it seems like that proposal has not moved in a while.
`offset` (and potentially place projections) rust-lang/opsem-team#10
RalfJung commented on Jun 14, 2023
I learned in the meantime that `getelementptr inbounds` in LLVM doesn't actually require the allocation to be live. In my interpretation of Rust's rules (and in Miri's implementation), we do require liveness. So we could weaken our `offset` rules a bit by dropping the liveness requirement. But honestly this doesn't solve the main pain points here so I don't think it's worth it.
[-]Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?[/-] [+]Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?)[/+]
RalfJung commented on Jun 28, 2023
We had a t-opsem meeting on this subject; see rust-lang/opsem-team#10 for a summary: