-
Notifications
You must be signed in to change notification settings - Fork 61
Closed
Labels
A-validityTopic: Related to validity invariantsTopic: Related to validity invariants
Description
@RalfJung introduced the idea of validity invariants in their blog post "Two kinds of invariants". Presuming we agree with this framing (I do), we need to define these validity invariants.
These invariants need to justify the various sorts of optimizations that we currently do:
- For example,
Option<&T>
layout optimization - Marking pointers as deferenceable
We need to discuss also the role of uninitialized memory and how it fits in. For example, when are invariants required to hold? Only when "compiler thinks memory is initialized" -- can/should we make that more precise? Also, what about loads of uninitialized integral values (a sometimes useful hack) -- are those valid? What is the effect?
Metadata
Metadata
Assignees
Labels
A-validityTopic: Related to validity invariantsTopic: Related to validity invariants
Type
Projects
Milestone
Relationships
Development
Select code repository
Activity
RalfJung commentedon Oct 9, 2018
Noting it here so we don't forget about it:
extern type
where we don't know the alignment?RalfJung commentedon Nov 14, 2018
As but two of many examples for people using
&mut
to uninitialized data:(Collecting this here so I will find the links when the discussion starts.)
RalfJung commentedon Nov 17, 2018
An example of uninitialized raw pointers and integers: https://github.com/carllerche/bytes/blob/456221d16521cf54cea0e6569669e47120a1b738/src/bytes.rs#L1810
!
exprs is too light rust-lang/rust#56190RalfJung commentedon Nov 24, 2018
Related discussion about which invariants we require to hold on
&*x
: rust-lang/rust#56190RalfJung commentedon Nov 25, 2018
str
is an interesting case: It usually only exists behind a ptr indirection, so an&str
to non-UTF8 data would not be UB if references do not have to be transitively valid.Maybe the UTF-8 thing is just a safety invariant, never a validity invariant?
gnzlbg commentedon Nov 26, 2018
@RalfJung I think that, ideally, for DSTs, validity should just be specified for custom DSTs. That is, the validity of trait objects, slices, and
str
, should follow from that.We don't have custom DSTs yet, so I think it is ok to also consider validity of trait objects and slices, and in the future, when custom DSTs are available, try to reconcile those into a more general framework.
Currently,
str
is an special DST in Rust that's part of the language, but I think it might be worth it to consider it as a custom DST that's just defined asstruct str([u8])
so that its validity would follow from the validity of slices. This is how I expectAsciiStr
to be implemented as well once we have custom DSTs, and I think it would be weird forstr
to be more special than it. Also,CStr
would be a custom DST as well.Thinking more into the future, if
str
were to be implemented in core as a custom DST (@rkruppe discovered: rust-lang/rust#19612), we would still need to make it a lang item, which would still enable the compiler to do specificstr
optimizations. I don't know if the compiler is already doing some, and it is unclear to me how much value these add. But the requirement that "constructing a&str
to a non-UTF8 string is undefined behavior" appears to me to be a requirement that, while checkable on miri, I don't think is checkable in general. For example, with custom DSTs I could have a slice containing only prime numbers:&PrimeNumberSlice
. If constructing one that points to non-prime numbers is undefined behavior, thenmiri
would have to detect it at construction point "somehow".So I like to think that the UTF8 requirement on
&str
is more like an user-defined pre-condition on some of the&str
methods. For example,str::len(&self)
returns the length of the&str
in bytes, so this method does not necessarily have to require the contents ofstr
to be valid UTF-8 to work "correctly". Creating astr
that does not point to an UTF8 string, and then calling some methods whose pre-conditions are violated, could lead to undefined behavior down the road (e.g. a null pointer dereference, read out-of-bounds, etc.) but I don't think it should be a new special type of undefined behavior per se.RalfJung commentedon Nov 27, 2018
Sure, custom DSTs come with custom invariants that rustc and miri do not know. These are safety invariants, not validity invariants.
What you are saying is that we could make UTF-8 a mere safety invariant for
str
. I agree. The open question is whether that reflects reality, and what it is that we want to do.RalfJung commentedon Dec 3, 2018
ManuallyDrop
is an interesting case. It certainly has to be at least "bitstring-valid" (using @asajeffrey's term), does it have to be anything more? Maybe we should, after all, separate the things references say about their pointee from the rest of validity?gnzlbg commentedon Dec 3, 2018
RalfJung commentedon Dec 3, 2018
We want (and have)
ManuallyDrop<T: ?Sized>
, while we are unlikely to get unsized unions any time soon.RalfJung commentedon Dec 3, 2018
Basically,
ManuallyDrop
must still be valid because we must still be able to determine its size.gnzlbg commentedon Dec 3, 2018
JakobDegen commentedon May 23, 2023
Closing. There are lots of open questions here, almost all of which are in other issues. This one doesn't seem useful