Skip to content

EndRegion and StorageDead are emitted in the wrong order #43481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RalfJung opened this issue Jul 25, 2017 · 7 comments
Closed

EndRegion and StorageDead are emitted in the wrong order #43481

RalfJung opened this issue Jul 25, 2017 · 7 comments
Assignees
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

This code

fn slice_index() -> u8 {
    let arr: &[_] = &[101, 102, 103, 104, 105, 106];
    arr[5]
}

results in the following MIR (before region erasure):

_3 = &ReScope(Remainder(BlockRemainder { block: NodeId(92), first_statement_index: 0 })) _4;
...
StorageDead(_4);
EndRegion(ReScope(Remainder(BlockRemainder { block: NodeId(92), first_statement_index: 0 })));

So, _4 is borrowed for this scope, but actually it is marked dead before the scope ends. That seems wrong to me, shouldn't _4 have to be live at least until the region ends?

Cc @pnkfelix @nikomatsakis

@Mark-Simulacrum Mark-Simulacrum added A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 26, 2017
@pnkfelix pnkfelix self-assigned this Aug 22, 2017
@pnkfelix
Copy link
Member

I think I have a fix for this. (It was yet another place where I got the order wrong for emitting the EndRegion vs the scope drop code)

@pnkfelix
Copy link
Member

Hmm no maybe I spoke too soon.

The reason I'm hestiating is that my "fix" yields code like this:

        _3 = &'7_0rce _4;                // scope 0 at issue-43481.rs:2:21: 2:52
        _2 = &'7_0rce (*_3);             // scope 0 at issue-43481.rs:2:21: 2:52
        _1 = _2 as &'7_0rce [u8] (Unsize); // scope 0 at issue-43481.rs:2:21: 2:52
           ...
        EndRegion('7_0rce);              // scope 0 at issue-43481.rs:1:24: 4:2
        StorageDead(_1);                 // scope 0 at issue-43481.rs:4:2: 4:2
        StorageDead(_4);                 // scope 0 at issue-43481.rs:4:2: 4:2

We probably want the EndRegion to happen after the StorageDead(_1) but before the StorageDead(_4), right?

@pnkfelix
Copy link
Member

pnkfelix commented Aug 23, 2017

After looking into the code a bit, it seems like trying to change things so that we emit (hypothetical):

        StorageDead(_1);                 // scope 0 at issue-43481.rs:4:2: 4:2
        EndRegion('7_0rce);              // scope 0 at issue-43481.rs:1:24: 4:2
        StorageDead(_4);                 // scope 0 at issue-43481.rs:4:2: 4:2

is likely to be more trouble than its worth.

My thinking had been that it would be good to emit the above mir sequence in order to work toward a pair of flow graph properties like:

  1. For all 'r, whereever EndRegion('r) occurs, there is no more live storage holding references of type &'r _).
  2. For all 'rgn, and all lval such that the code has borrow(s) of form &'rgn lval, whereever StorageDead(lval) occurs, the region 'rgn has already ended (via EndRegion('rgn)).

But I don't know that we really get a benefit out of such a strong pair of properties.

From this description, it seems like @RalfJung wants to enforce property 2.

But it seems reasonable to loosen property 1 to a weaker flow graph property, such as "1b. For all 'r, after EndRegion('r) occurs, no more dereferences into &'r _ can occur." This would mean the newly emitted code from #43481 (comment) would be considered legal; even though EndRegion('7_0rce) occurs before StorageDead(_1) (where _1: &'7_0rce [u8]), there are no derefs of _1, so no actual accesses of '7_0rce occur after its EndRegion.

        EndRegion('7_0rce);              // scope 0 at issue-43481.rs:1:24: 4:2
        StorageDead(_1);                 // scope 0 at issue-43481.rs:4:2: 4:2
        StorageDead(_4);                 // scope 0 at issue-43481.rs:4:2: 4:2

(However, to play devil's advocate: Maybe one can make an analogous argument against trying to enforce even property 2? I.e., if one is already going to make the invariant be about operations and not the relative locations of EndRegion vs StorageDead for lvalues of particular types, then why not weaken both properties and allow the original code that was written in the description ...)

@RalfJung
Copy link
Member Author

RalfJung commented Aug 23, 2017

We probably want the EndRegion to happen after the StorageDead(_1) but before the StorageDead(_4), right?

I guess that would make sense; for my purposes, a "too late" StorageDead (like the one for _1 here) is not a problem. I suppose for borrowck you know better than me.

From this description, it seems like @RalfJung wants to enforce property 2.

Yes, I think that's what I want. Really what I care about is that at the point of EndRegion, all locals with that region in their (unerased) type are still considered live. I also thought that 6d6280e would imply borrowck needs the same to be true?


to play devil's advocate: Maybe one can make an analogous argument against trying to enforce even property 2? I.e., if one is already going to make the invariant be about operations and not the relative locations of EndRegion vs StorageDead for lvalues of particular types, then why not weaken both properties and allow the original code that was written in the description

I can only answer this in the context of my proposed "Types as Contracts", which is still so much in the air that it may or may not be good guideline. ;) But in this context, EndRegion actually is an operation -- it walks the memory and releases locks that expire now because the region has ended. StorageDead here makes it so that if there are still pointers somewhere that refer to this memory, they can invalid before EndRegion, so trying to release the locks ends up following a dangling pointer.

@pnkfelix
Copy link
Member

After discussion with @nikomatsakis , I think the path forward here is to just go ahead and make the change so that we enforce property 2 but do not attempt to enforce property 1.

In other words, regions that have already ended will be allowed to occur in the types of lvals in a StorageDead(lval), because they do not have a semantic effect of dereferencing such an lval.

Just repeating for clarity: The property that we will strive to enforce, is this:

  • For all 'rgn, and all lval such that the code has borrow(s) of form &'rgn lval, whereever StorageDead(lval) occurs, the region 'rgn has already ended (via EndRegion('rgn)).

@arielb1
Copy link
Contributor

arielb1 commented Aug 27, 2017

Sure. Borrowck also requires property (2), while nothing that I know of requires property (1).

@pnkfelix
Copy link
Member

After further reflection, I realized this morning that in fact the two properties I listed are incompatible!

You cannot have both. Therefore, it is a good thing that we decided to just enforce property 2.

Why can't you have both? Because (aha) of cyclic references. E.g. examples like the following:

use std::cell::Cell;

struct S<'a> {
    r: Cell<Option<&'a S<'a>>>,
}

fn main() {
    let x = S { r: Cell::new(None) };
    x.r.set(Some(&x));
}

Here we have an lval that holds a reference &'rgn lval, and thus the choice is forced of whether the EndRegion('rgn) comes before or after the StorageDead(lval)

pnkfelix added a commit to pnkfelix/rust that referenced this issue Sep 13, 2017
…or a scope.

(The idea is that the StorageDead marks the point where the memory can
be deallocated, and the EndRegion is marking where borrows of that
memory can no longer legally exist.)
bors added a commit that referenced this issue Sep 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants