Skip to content

while + labeled switch = foot-gun #23482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TaylanUB opened this issue Apr 6, 2025 · 1 comment
Closed

while + labeled switch = foot-gun #23482

TaylanUB opened this issue Apr 6, 2025 · 1 comment

Comments

@TaylanUB
Copy link

TaylanUB commented Apr 6, 2025

I lost my mind trying to figure out this bug today. Code has been simplified to get to the point; hope I didn't add any new bugs to it that distract from what I'm trying to demonstrate:

/// Called after we consumed the opening double-quote of a string.
fn parseQuotedString() !StringLiteral {
    while (readChar()) |c| sw: switch (c) {
        '"' => return getAccumulatedString(),
        '\\' => switch (readChar()) { // Handle backslash-escapes
            'n' => addToString('\n'),
            'r' => addToString('\r'),
            // (and so on)

            // Backslash-escaped literal newline causes that newline, and any
            // following indentation, to be swallowed and ignored.
            '\n' => {
                while (readChar()) |c2| switch (c2) {
                    '\t', ' ' => {}, // ignoring indentation...
                    else => {
                        // We just read the next non-indent char; continue with
                        // it by going right back to the outer switch, jumping
                        // OVER the readChar() so we don't over-consume.
                        continue :sw c2;
                    },
                }
            },
            else => return error.InvalidBackslashEscape,
        },
        // Not the closing double-quote and also not a backslash-escape
        else => addToString(c),
    };
    return error.UnclosedStringLiteral;
}

In case the comments don't make it clear, this is supposed to be able to parse a string literal, allowing it to be broken up into multiple lines by backslash-escaping the newline, so the newline and any indentation that follows are swallowed and ignored.

Example input in an imaginary programming language using this string literal parser:

print("foo \
       bar");

That is supposed to be equivalent to:

print("foo bar");

Can you spot the bug in the implementation?

To give you a hint, the parsed string ends up being foo \ar instead of foo bar.

Here's the explanation:

In the final clause of the labeled switch, in the expression addToString(c), the variable c does not refer to the value that was passed to the switch via continue :sw c2. It refers to the |c| capture from the while loop, which still has its old value when the labeled switch continues with a new value.

While this is technically correct behavior, since there is no special interaction between the while loop variable capture and the subsequent labeled switch (they are working, independently, as intended), I think it's extremely easy to misread the code and fail to see the bug. Which is why I'm calling this a foot-gun. One first has to encounter it and suffer from it at least once, before one becomes primed against it and doesn't repeat the mistake.

Could the language do something against this?

A list of possible improvements:

  1. This proposal, though rejected, would improve on the issue by allowing to eliminate the outer capture, encouraging putting it after else => instead: Bare Switch Capture #20249 Maybe it can be reconsidered in light of this new foot-gun that labeled switch brought into the language.

However, it should be noted that it doesn't eliminate the foot-gun, because it still allows the outer capture if that variable is then used within the body, which it is in the bugged code.

  1. Another option would be to force labeled switch to always use a capture for the value it switches on, like so:
label: switch (<EXPR>) |<ID>| { ... }

This means you can't accidentally ignore the value passed from continue :sw expr, but it could be annoying if you don't need that because you prefer to capture it within any prongs that are multi-valued.

  1. It could be made an error when the <EXPR> in label: switch (<EXPR>) <BODY> is an identifier expression AND that identifier is also referred to somewhere within <BODY>, because you never really need that anyway: You can always capture it with => |<ID>| in the prongs. But this would be kind of a special-case error just to prevent possible bugs. Then again, the language already does that for some things, like unused variables and mutable variables that are never mutated.

Thoughts? Big enough of a foot-gun to warrant a change to the language? Or just a skill issue? 😄

@mlugg
Copy link
Member

mlugg commented Apr 6, 2025

Please do not file a proposal to change the language.

It's not justifiable to completely change up the switch syntax just to handle a misuse of an advanced control flow construct.

The simple fix here would just be to capture the byte in the else case. However, the real problem here is that you've ended up writing code with incredibly confusing control flow. There's only so much Zig can do to stop you from writing confusing code!

A better way to write this would probably be with a FSA (/ FSM / DFA / whatever you want to call it); maybe something like this:

const State = enum { normal, escape, skip_whitespace };
state: switch (State.normal) {
    .normal, .skip_whitespace => |state| switch (readChar() orelse return error.UnterminatedStringLiteral) {
        '"' => return getAccumulatedString(),
        '\\' => continue :state .escape,
        '\t', ' ' => |c| {
            if (state != .skip_whitespace) addToString(c);
            continue :state state;
        },
        else => |c| {
            addToString(c);
            continue :state .normal;
        },
    },
    .escape => {
        const char = readChar() orelse break :state;
        switch (char) {
            'n' => addToString('\n'),
            'r' => addToString('\r'),
            '\n' => continue :state .skip_whitespace,
            else => return error.InvalidBackslashEscape,
        }
        continue :state .normal;
    },
}

@mlugg mlugg closed this as not planned Won't fix, can't repro, duplicate, stale Apr 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants