Skip to content

continue expression should have the scope of the while body #1403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andrewrk opened this issue Aug 23, 2018 · 21 comments
Closed

continue expression should have the scope of the while body #1403

andrewrk opened this issue Aug 23, 2018 · 21 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

Here's a use case:

    const symtab = while (ncmd != 0) : ({ncmd -= 1; ptr += lc.cmdsize;}) {
        const lc = @ptrCast(*std.c.load_command, ptr);
        switch (lc.cmd) {
            std.c.LC_SYMTAB => break @ptrCast(*std.c.symtab_command, ptr),
        }
    } else {
        return error.MissingDebugInfo;
    };

This code doesn't work, but it should:

/Users/andy/dev/zig/build/lib/zig/std/debug/index.zig:380:60: error: use of undeclared identifier 'lc'
    const symtab = while (ncmd != 0) : ({ncmd -= 1; ptr += lc.cmdsize;}) {
                                                           ^

This is after fixing a bug where I forgot the ptr += ... part, and ability to put it in the continue expression would have prevented the bug.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 23, 2018
@andrewrk andrewrk added this to the 0.4.0 milestone Aug 23, 2018
kristate added a commit to kristate/zig that referenced this issue Aug 26, 2018
…>scope` and not just the immediate parent;
kristate added a commit to kristate/zig that referenced this issue Aug 26, 2018
@raulgrell
Copy link
Contributor

My only issue is a symbol appearing before it is defined. I mean, it's not too bad since it should be pretty obvious where it's coming from, but it still feels weird...

@BarabasGitHub
Copy link
Contributor

BarabasGitHub commented Aug 26, 2018

I agree, it feels weird and possibly a bit confusing. Especially with larger loops it might be hard to see where that variable is coming from.

Then again I find the whole concept a bit weird, because it's basically a second loop body that appears before the other but gets executed after.

@kristate
Copy link
Contributor

This doesn't seem weird to me at all and that's why I produced a PR to fix it.
Think of the continue expression at the end of the while body and it all clears up.

@thejoshwolfe
Copy link
Contributor

I think this is a bad idea. Consider this code

fn readSections(buffer: []const u8) {
    var index: usize = 0;
    while (index + header_size <= buffer.len) : (index += section_size) {
        const header: *const [header_size]u8 = arrayPointer(buffer, index, header_size);
        index += header_size;
        if (!hasSectionBody(header)) continue;
        const section_size = readInt32(buffer, index);
        // some complicated stuff here
        if (something) {
            somethingElse();
            continue;
        }
        // some other stuff
    }
}

This has to be a compile error, because the first continue comes before the section_size declaration.

A better way to structure this is with a labeled break.

fn readSections(buffer: []const u8) {
    var index: usize = 0;
    while (index + header_size <= buffer.len) {
        const header: *const [header_size]u8 = arrayPointer(buffer, index, header_size);
        index += header_size;
        if (!hasSectionBody(header)) continue;
        const section_size = readInt32(buffer, index);
        continue_: {
            // some complicated stuff here
            if (something) {
                somethingElse();
                break :continue_;
            }
            // some other stuff
        }
        index += section_size;
    }
}

@andrewrk
Copy link
Member Author

@thejoshwolfe makes a strong point, and trying to recover the proposal taking this into account starts to get into Too Complicated territory, so I'm going to reject this proposal.

@andrewrk
Copy link
Member Author

andrewrk commented Aug 27, 2018

Re-opening since @kristate made a working PR. There's no explicit argument addressing my point above, but if I may guess, it would be:

Here, look, it's not so complicated after all. Please look at the following test cases and reconsider.

I'll make it 0.3.0 since we have a working PR.

@andrewrk andrewrk modified the milestones: 0.4.0, 0.3.0 Aug 27, 2018
@andrewrk
Copy link
Member Author

Here are my concerns, why I don't just accept and merge right away:

  • This would be the first and only instance in Zig where you see a variable name and the scope for it is after the name.
  • It's a little bit arbitrary which variables would get to be inside the scope and which wouldn't. For example, this wouldn't work (and probably shouldn't):
var i: usize = 0;
while (true) : (i += x) {
    if (something) {
        const x = 1;
        continue;
    }
}
  • The alternative - status quo - is just not really that bad. Language simplicity is really, really important in Zig. ziglang.org brags "Focus on debugging your application rather than debugging your knowledge of your programming language." and the language takes this very seriously. I don't want a single person to spend 10 minutes trying to figure out how while continue expression variable scopes work, when they could be spending 0 minutes already knowing how they work because they work like every other variable reference in the language.

@ghost
Copy link

ghost commented Aug 27, 2018

If I may add my two cents, I find the block expression harder to read than the while (foo) {}, so I'd actually probably favor this proposal, it would take my some minutes to understand blocks/ usage in each context but maybe I would just need to use them day in day out and it would be natural, I'm not sure.
For writing the code I sure think this proposal would make it easier but in that case zig prefers reading over writing so this can hardly be an argument.

This would be the first and only instance in Zig where you see a variable name and the scope for it is after the name.

we do see function usage before function definition all the time though so maybe this is analogous

@kristate
Copy link
Contributor

@andrewrk I want this feature and I think it is intuitive to the language.

My most simple response to all of this is that you mentioned in your original post:

const symtab = while (ncmd != 0) : ({ncmd -= 1; ptr += lc.cmdsize;}) {
This code doesn't work, but it should:

This is after fixing a bug where I forgot the ptr += ... part, and ability to put it in the continue expression would have prevented the bug.

I agree with this. This should be a thing and it can be!


I don't want a single person to spend 10 minutes trying to figure out how while continue expression variable scopes work,

They already know how they would work -- and if they are wrong, the compiler will tell them. No offense to @thejoshwolfe , they will (and I have) spent in the past 10 minutes trying to understand the kind of break/continue mechanics in his post above.


It's a little bit arbitrary which variables would get to be inside the scope and which wouldn't. For example, this wouldn't work (and probably shouldn't):

It shouldn't and doesn't compile with the patch:

/Users/cfkk/Source/zig2/build/1403.zig:4:22: error: use of undeclared identifier 'x'
while (true) : (i += x) {
                     ^

Cross posing from the PR, but in the event that unreachable code is referenced, we get an error message. @thejoshwolfe did a good job working through this and I thank him for pointing it out.

/Users/cfkk/Source/zig2/build/test.zig:5:9: error: unreachable code
        const b = 42;
        ^
/Users/cfkk/Source/zig2/build/test.zig:3:31: error: cannot reference variable from unreachable code region
    while (i <= 10) : ({ i += b; }) {
                              ^

where the following code was used to test:

test "reference variable from unreachable code region" {
    var i: usize = 0;
    while (i <= 10) : ({ i += b; }) {
        continue;
        const b = 42;
    }
}

kristate added a commit to kristate/zig that referenced this issue Aug 28, 2018
…>scope` and not just the immediate parent;
kristate added a commit to kristate/zig that referenced this issue Aug 28, 2018
kristate added a commit to kristate/zig that referenced this issue Aug 28, 2018
kristate added a commit to kristate/zig that referenced this issue Aug 28, 2018
@rohlem
Copy link
Contributor

rohlem commented Aug 28, 2018

(edit2: I realized some error in my understanding/reasoning. See 2 comments down.)
(edit: Sry this became so long, you can just skim over it and ignore my rambling.)
IMHO, deciding between the two alternatives presented currently, I'd prefer the non - special-casing status quo, for language simplicity and consistency. I see the real-world problem this enhancement would solve however, so I almost feel until it is solved, or some idiomatic workaround like "declaring the variables with initial value undefined before the loop" is agreed upon, it would be better to leave this issue open, to reflect this weak point / - of contention.


To productively add to the discussion, I just thought of a counter-proposal not yet on the table:

Introducing a new kind of "continue defer", specifically for while loops.

I'm not 100% on this, but afaik the current defer and errdefer trigger when leaving a scope, so they do not affect the behaviour of continue.** This new defer will trigger in exactly that instance.
To illustrate its behaviour, see this contrived example in replacement status quo pseudo-code (untested):

var i = 10;
while(i > 0) : (i -= 1) {
 { //anonymous block so defer-s cling onto this and not our while loop body
  if(i > 5) continue;
  var additional_countdown = f1(i); // some computation
  defer i -= additional_countdown; //using `contdefer` we could get rid of the anonymous block
 } //note that without labelling the anonymous block, we can even still break from the loop without labelling it.
}

Now that I look at it in clear light, this seems a perfectly fine "idiomatic workaround" that works in the language status quo. Neat.
So, introducing a new "continue defer" keyword (contdefer, or maybe loopdefer? iteratedefer, itdefer, iterdefer?) would remove the need for that anonymous block and still enable the use of defer and errdefer within the loop with status quo "execute when leaving the loop" semantics**.

Pros:

  • Already pretty much doable in status quo (if I didn't make some mistakes on the semantics). Therefore probably very simple to implement (could be done with C-style macros in userspace even), because all the building blocks are already implemented.
  • Consistent: Similar pattern to existing defer family. I think a specific addition for a specific use case is a valid proposition.

Cons:

  • Already pretty much doable in status quo (see example). It might be seen as too specific to implement, given that it's so simple (litterally adding {, } and whitespace). Then again, it would seem, nobody has thought of it until now, lol.

Taking it a step further (too far?): Removing continue expressions.

I don't think this will actually be considered, but hear me out maybe.
The whole point of the continue expression is to place iteration advancement next to the loop condition. From a language-grammatical standpoint it isn't reflected at any other point in the language, and looks... well, you need to get used to it.
Or not. Practically, they do exactly the same thing as a defer inside a nested block, or as the proposed contdefer. So writing a while loop without them isn't really a challenge:

var i = 10;
while(i > 0) { { defer i -= 1;
   //(...)
 } }

Looking at it from that perspective, I honestly propose considering to get rid of them altogether.


** Question/Issue?: Defer/Errdefer in loop bodies

I didn't check this, but I cannot think of a non-surprising behaviour: What happens with defer and errdefer in loop bodies?

  • Are they appended to a list on every iteration they are executed, dispatched when leaving the loop?
  • Are they flagged as "passed" the first time they are executed, to then dispatch in order of first passing?
  • Is it unchecked/undefined/untested behaviour, so nobody knows? (my working hypothesis)
  • Are defer/errdefer prohibited inside loop bodies?
  • Some other behaviour?

I didn't check which of the answers is correct, and I suspect a discussion on which _should be correct is imminent.
Ooh, another question, do they trigger before entering the else branch of a loop? Or after exiting both branches? ... Maybe I should stop.

The argument _could be made that when executing continue, the loop body is "re-entered", and so defer should trigger every time the loop is restarted. That would effectively mean "defer in loops becomes loopdefer".
Would be an elegant implementation/solution, now that I think of it. [Proposal]? But only if everyone is on board...
I'm expecting some counter-arguments on that 300+ word essay just now.

@rohlem
Copy link
Contributor

rohlem commented Aug 28, 2018

re /me - the last message makes some valid analogies but maybe doesn't quite hit the point of this thread. So to sum it up more usefully (and quite opinionated):

  • Continue expressions can be seen as a specialized form of defer. Differentiating nomenclature of "continue expression" and "continue statement" shouldn't be this hard.
  • Using variables in a continue expression before declaration is similar to using variables in a defer statement before declaration.

But then again, that's kind of the whole point of this thread. So maybe I'm still missing the point a bit.
Joking proposal: Runtime-composable block and goto to supersede defer.

@rohlem
Copy link
Contributor

rohlem commented Aug 28, 2018

I just realized my mistake. defer and errdefer only work on the most immediate scope because that solves branching ambiguities. So that means half of my proposal, "just use the semantics of defer", doesn't work because status quo defer also cannot properly handle branching. Now I feel stupid.
Maybe those analogies will still help finding a consensus. Maybe not. The argument could be made that defer should be extended to those usages, but now I understand how it's clearly out of its intended scope. Sry I took so long.

@kristate
Copy link
Contributor

@rohlem Thanks for putting in the thought. from my perspective, continue expression should share the same scope as the while body because it's execution happens at all end branches of the while body loop. If in such case that variables are referenced that do not exist in all branches, the compiler catches this and I return the following error: error: cannot reference variable from unreachable code region . Not sharing the same scope as the while body is weird.

@rohlem
Copy link
Contributor

rohlem commented Aug 28, 2018

Update: I've now also realized the difference in behaviour between continue expressions and defer statements in an inner block: defer will be triggered by break.

var i = 0;
while(i < 10) : (i = advance_it(i)) {
 if(should_break(i)) break; // advance_it won't be executed
}

vs

var i = 0;
while(i < 10) { { defer i = advance_it(i);
  if(should_break(i)) break; //advance_it will still be executed
} }

In many cases the difference will be negligible, but it can lead to subtle mistakes (the worst kind of mistakes). So normal defer doesn't quite cut it.

@thejoshwolfe
Copy link
Contributor

continue expression should share the same scope as the while body

I don't think "scope" is very well defined here. Back when we had goto in the language, the definition of a scope was actually a little different from what appears as curly braces and indentation. Every local variable declaration introduced a new nested scope. This model made it clear that you can't use goto to jump past a variable declaration, because that would effectively be jumping down into an inner scope without entering it properly. We already know that's not allowed for things like jumping into a loop or switch statement, so everything made sense.

As far as i know, variable declarations still create scopes in the compiler, which means the "while body scope" is actually not well defined. The while body is a series of nesting scopes corresponding to each variable declaration. If you define the continue expression to exist in the last scope, it would forbid you from doing any continue statement before any other variable declaration, regardless of whether that variable was referenced in the continue expression or not. This is definitely not what we want.

I haven't studied your code, but you must have put in a check for referencing a variable without properly declaring it. That's not how local variable references work elsewhere in the compiler, which suggests that this is a bad idea.

If we run with this proposal, we're going to need an answer for this question: can you jump to a continue expression before a variable is declared as long as the variable is not used in the continue expression as determined by comptime analysis? Here's an example:

while (foo()) : (if (comptime is_windows) print(handle)) {
    if (conptime !is_windows) continue;
    var handle = bar();
}

Writing a specification for the language's behavior in this case would not be pleasant. This feature seems too weird.

@kristate
Copy link
Contributor

if foo() returns true it loops forever (regardless if is_windows is true or false)
if foo() returns false it passes (regardless if is_windows is true or false)

I guess I need to know what variables need to or are expected to be set.

const builtin = @import("builtin");
const std = @import("std");
const is_windows = builtin.os == builtin.Os.windows;

fn foo() bool {
    return false;
}

fn bar() u32 {
    return 42;
}

test "josh" {
    while (foo()) : (if (comptime is_windows) std.debug.warn("{}\n", handle)) {
        if (comptime !is_windows) continue;
        var handle = bar();
    }
}

@andrewrk andrewrk reopened this Aug 28, 2018
@andrewrk andrewrk removed the rejected label Aug 28, 2018
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Aug 28, 2018
@kristate
Copy link
Contributor

Here is another test:

const builtin = @import("builtin");
const std = @import("std");
const is_windows = false; //builtin.os == builtin.Os.windows;

fn foo(i: usize) bool {
    return i < 10;
}

fn bar() u32 {
    return 42;
}

test "josh" {
    var i: usize = 0;
    while (foo(i)) : (if (comptime is_windows) std.debug.warn("{}\n", handle)) {
        i += 1;
        if (comptime !is_windows) continue;
        var handle = bar();
    }
}

returns

Test 1/1 josh...OK
All tests passed.

const builtin = @import("builtin");
const std = @import("std");
const is_windows = true; //builtin.os == builtin.Os.windows;

fn foo(i: usize) bool {
    return i < 10;
}

fn bar() u32 {
    return 42;
}

test "josh" {
    var i: usize = 0;
    while (foo(i)) : (if (comptime is_windows) std.debug.warn("{}\n", handle)) {
        i += 1;
        if (comptime !is_windows) continue;
        var handle = bar();
    }
}

returns

Test 1/1 josh...42
42
42
42
42
42
42
42
42
42
OK
All tests passed.

@andrewrk
Copy link
Member Author

Here is an example of surprising code snippets if we accept this proposal:

test "while cont" {
    var i: usize = 0;
    while (i < 10) : (i += x) {
        if (true) {
            const x = 1;
            continue;
        }
    }
}
/Users/andy/dev/zig/build/test.zig:3:28: error: use of undeclared identifier 'x'

Meanwhile these both work:

test "while cont" {
    var i: usize = 0;
    while (i < 10) : (i += x) {
        const x = 1;
        continue;
    }
}
test "while cont 2" {
    var i: usize = 0;
    while (i < 10) : (i += x) {
        {
            const x = 1;
            continue;
        }
    }
}

And here's an example of how the proposed implementation is unsound:

test "while cont" {
    var i: usize = 0;
    while (i < 10) : (i += x) {
        var runtime_true = true;
        if (runtime_true) {
            continue;
        }
        var x: u32 = 1;
    }
}

With the proposed implementation, this test case uses the value x before it is declared, causing undefined behavior (which is an infinite loop on my computer right now).

Now I want to point something important out. I didn't know about this code example until I fortunately thought of it right now. But I sensed that something was amiss early on:

trying to recover the proposal taking this into account starts to get into Too Complicated territory, so I'm going to reject this proposal.

In a world of Unknown Unknowns, Zig's saving grace is resisting the urge to make things more complicated, and keeping things very simple, even if it's inconvenient sometimes.

I think this sense of avoiding things that start to get Too Complicated is really important, and possibly the main attraction of Zig.

I know it's been frustrating for you, since you've made multiple pull requests and had them rejected. But we absolutely must keep the language simple.

@thejoshwolfe
Copy link
Contributor

Without this proposal, it's possible to define the semantics of the continue expression through syntactic desugaring. In general, the following is always true, where COND, EXPR, and BODY represent any syntactic construct:

while (COND) : (EXPR) {
    BODY
}
// is equivalent to:
while (true) {
    if (!(COND)) break;
    body: {
        BODY // and any `continue` in BODY is replaced by `break :body`
    }
    EXPR
}
// or with only if and goto:
entry:
    {
        if (!(COND)) {
            goto :break_;
        }
        {
            BODY // and replace `continue` with `goto :continue_`
        }
continue_:
        EXPR
        goto :entry;
    }
break_:

(goto isn't in the language, but it illustrates how the high-level concepts are precisely definable in low-level terms.) Importantly, this desugaring still works with respect to variable declaration scopes.

The proposal in this issue makes this desugaring not work. I'm not sure it would even be possible to define the continue expression with desugaring. You can't take away the {} around BODY, or else too many continue statements will jump past some variable declaration and be invalid.

I'm not sure this helps explain my objection to the semantics proposed here. It's good for language design when high-level control flow constructs are precisely definable in low-level terms.

@kristate
Copy link
Contributor

22:39 <kristate> hmm, if we change var runtime_true = true; to const runtime_true = true; 
22:39 <kristate> we get the reference 
22:40 <kristate> cannot reference variable from unreachable code region 
22:40 <kristate> andrewrk: I think there maybe something more sinister happening 
22:41 <kristate> let me work on your test case -- it should be returning an error 
22:41 <kristate> the fact that zig isn't catching that is frightening. I am guessing that it is initing to 0 in the case of var 
22:42 <kristate> might be interesting to print out x 
22:43 <kristate> andrewrk: yeah, just tried to print it out and its 0 
22:43 <andrewrk> that's uninitialized memory 
22:44 <kristate> yeah, but why isn't zig throwing an error? 
22:44 <andrewrk> zig does catch that. if you run master branch zig on this example it's a compile error 
22:44 <kristate> no no, that is only because of a scope difference 
22:44 <andrewrk> scopes is how zig figures out if variables are declared or not 
22:44 <andrewrk> each variable declaration starts a new scope 
22:47 <kristate> hmm, let me play with it for 24 hours and if I can't do something about this, we can reject 
22:47 <andrewrk> sounds good, let's revisit in 24 hours 
22:47 <kristate> I am worried that down the line another problem may pop up 

@andrewrk
Copy link
Member Author

Small, simple language. Focus on debugging your application rather than debugging your knowledge of your programming language.

@andrewrk andrewrk modified the milestones: 0.4.0, 0.3.0 Sep 28, 2018
daurnimator added a commit to daurnimator/zig that referenced this issue Aug 5, 2019
I also moved `ncmd` into a shorter scope.
daurnimator added a commit to daurnimator/zig that referenced this issue Oct 26, 2019
I also moved `ncmd` into a shorter scope.
daurnimator added a commit to daurnimator/zig that referenced this issue Nov 26, 2019
I also moved `ncmd` into a shorter scope.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants