-
Notifications
You must be signed in to change notification settings - Fork 13.8k
Document the current MIR semantics that are clear from existing code #95320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've left a bunch of FIXME's in here. I may try and get to some of them before merging, but they should all be non-critical and can just be added later. The test change has a relevant Zulip conversation Also, I didn't add anything for |
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A place stores an address and an optional meta value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably the right thing to have. That being said, I'm going to leave this open until I get a chance to more thoroughly go through all of the documentation and deal with DSTs (or probably someone else should do that because I know very little about it)
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be left undefined in which order theybare evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the ordering question to a "needs clarification" section in an effort to only document things which are already decided.
That being said, I don't think this is a good option. First of all, I'm not sure "order is left indeterminate" implies "any attempt on depending on the order is UB" - it might be the case that we instead get two distinct well-defined possibilities. More importantly though, this doesn't feel like a thing that MIRI can implement, which is concerning.
I also can't produce an example, but I would not be surprised if this is actually required for current Rust, since the evaluation order in Rust is well-defined. I can't get MIR building to produce such MIR, but it would at least imply that the temporaries produced by this are required:
pub fn foo(a: &mut i32) -> i32 {
let derived = &mut *a;
*derived + *a
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intrinsics may be codegened in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I think at that point this check should be refined to verify those bodies that do get codegened only. Even better would be if code elsewhere got changed to stop passing intrinsics that don't get codegened through the MIR validator. cc @oli-obk
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Miri should be made to match whatever is decided. It is currently very relaxed.
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think order should be left indetermimate so any attempt on depending on the order is UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see the comment above - basically the same concerns apply here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think leaving evaluation order indeterminate is a bad idea. We should specify some order eventually, and probably it should be left-to-right. (That's what Miri does.)
The disjointness thing can be enforced without leaving the order indeterminate, using the aliasing model instead. That's better because it can be checked by Miri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must not alias the place given to any Operand::Move passed as argument. All codegen backends depend on this property to allow passing Operand::Move by reference in case of a type not fitting inside registers without having to male a copy on the stack or having to restrict the callee to not mutate the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added in a mention of that in the "needs clarification" section above. Based on the status of the linked issue, it seems that it is still unclear how we are justifying that optimization
cc @RalfJung - want to check that everything written here (of specific concern is the place and operand stuff) is at least compatible with your model of MIR |
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think "place to value conversion" is a terrible term, though I know it is standard in C. This operation accesses memory! It shouldn't be described as if it was a simple cast.
So I'd say something more along the lines of: it evaluates the LHS and RHS to places, loads a value from the place denoted by the RHS, and then stores that value into the place denoted by the LHS.
But also, this doesn't even do place-to-value conversion? The RHS is evaluated to a value, and that is stored to the place denoted by the LHS.
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CFG paths are relevant. What's relevant is the actual sequence of statements that are executed when the function runs. There's basically a boolean flag dead-or-alive for each local that starts out 'dead' for locals with annotations, and that is set appropriately by these annotations. Accessing (evaluating the place expression) of a local is UB if it is currently 'dead'.
Actually it's more than that, each StorageLive allows the local to be put at a different location in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CFG paths are relevant
Well, because of the MaybeStorageLiveLocals
in the validator, this is already the case. Certainly CFG paths should not be relevant in the spec/miri, but the docs here have to also mention what is required by the validator, which might have a stronger opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the storage check currently present in the validator should be considered to be merely a debugging aid. It is possible to write unsafe code that uses local when it does not have storage. With additional optimizations present such code would fail the check. When we get to that point the check should be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a good point. With the current setup, we can't turn *r
into x
even if we know that r = &mut x;
, because that might turn UB into an ICE. I'll mention this in the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, we should probably mention that we often don't want to "remove" UB in MIR optimizations in general, since keeping it visible to the codegen backend is important.
For example, we don't want to lower switchInt [0 => bb3] otherwise: bbUnreachable
to goto bb3
, since that keeps LLVM from knowing about the impossible condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these kinds of things should definitely be documented, but imo not here. I do want to start an organized way of defining "canonicalized MIR" at some point, because I think that will be increasingly important, and this seems like the kind of thing to include there
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this operation can cause UB: the crated place must be dereferencable and suitably aligned, as determined by its type.
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what Miri does, and I think it is the right model.
compiler/rustc_middle/src/mir/mod.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah now this is the right place for the rant that I put earlier. ;)
I don't think we should all this a "conversion". That sounds like an as
cast, like just putting something into a different representation. That's not at all what happens though. This is a load from memory, and should be described as that.
@bors r+ |
📌 Commit 8732bf5 has been approved by |
Document the current MIR semantics that are clear from existing code This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out. The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as `//` instead of `///` comments? If this is too big to review at once, I can split this up. r? rust-lang/mir-opt
…askrgr Rollup of 7 pull requests Successful merges: - rust-lang#95320 (Document the current MIR semantics that are clear from existing code) - rust-lang#95722 (pre-push.sh: Use python3 if python is not found) - rust-lang#95881 (Use `to_string` instead of `format!`) - rust-lang#95909 (rustdoc: Reduce allocations in a `theme` function) - rust-lang#95910 (Fix crate_type attribute to not warn on duplicates) - rust-lang#95920 (use `Span::find_ancestor_inside` to get right span in CastCheck) - rust-lang#95936 (Fix a bad error message for `relative paths are not supported in visibilities` error) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
This PR adds documentation to places, operands, rvalues, statementkinds, and terminatorkinds that describes their existing semantics and requirements. In many places the semantics depend on the Rust memory model or other T-Lang decisions - when this is the case, it is just noted as such with links to UCG issues where possible. I'm hopeful that none of the documentation added here can be used to justify optimizations that depend on the memory model. The documentation for places and operands probably comes closest to running afoul of this - if people think that it cannot be merged as is, it can definitely also be taken out.
The goal here is to only document parts of MIR that seem to be decided already, or are at least depended on by existing code. That leaves quite a number of open questions - those are marked as "needs clarification." I'm not sure what to do with those in this PR - we obviously can't decide all these questions here. Should I just leave them in as is? Take them out? Keep them in but as
//
instead of///
comments?If this is too big to review at once, I can split this up.
r? rust-lang/mir-opt