Skip to content

Factor block_cascade out of tableswitch #480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AndrewScheidecker opened this issue Nov 25, 2015 · 13 comments
Closed

Factor block_cascade out of tableswitch #480

AndrewScheidecker opened this issue Nov 25, 2015 · 13 comments
Milestone

Comments

@AndrewScheidecker
Copy link

This idea isn't fully baked, but I thought I'd kick it out here and see what people think.

There are two components to tableswitch: a branch to a target taken from a table+index, and a set of cases that introduce special targets that may be referenced by the table. As I understand it, the purpose for the second component is to avoid highly nested cascades of AST nodes that would otherwise be necessary to represent the a C-style switch with many cases.

I think the second component could be split into a separate AST node: block_cascade. It would look kind of like this:

(block_cascade
    (tableswitch Index... (table #case0 #case1 #case2) #default)
    #case0
    A...
    #case1
    B...
    #case2
    C...
    #default
    D...
)

Which could be desugared to this:

(block #default
    (block #case2
        (block #case1
            (block #case0
                (tableswitch Index... (table #case0 #case1 #case2) #default)
            )
            A...
        )
        B...
    )
    C...
)
D...

One benefit to this is that it would be much easier to produce block_cascade by pattern matching than a tableswitch with cases: it can ignore the contents of the A, B, C, D syntactic variables above. However, some of that benefit could be captured by making the existing tableswitch cases desugar into the block cascade so it doesn't need to worry about whether those blocks-to-be-case-ified are used as generate branch targets.

@rossberg
Copy link
Member

Regular labels are only in scope inside the expression they name. Case labels are very different beasts, they are only known to the respective tableswitch statement enclosing them. Your suggestion would require reconciling these two notions somehow, and allow referencing expression labels from the outside. It is not clear what the rules for this should be without losing well-structuredness of control flow. Even if we found such rules, I'm pretty sure they'd be more complicated and costly to check than the current design, which guarantees well-structuredness by construction.

@AndrewScheidecker
Copy link
Author

That difference between a tableswitch's cases and a cascade of blocks is what makes it harder to transform the latter to the former.

The goal here is that the block_cascade has the same semantics as a cascade of nested blocks in terms of label visibility, so it maintains the structured control flow property. The difference is that it doesn't reflect that nesting in the syntax structure.

@sunfishcode
Copy link
Member

@rossberg-chromium The rule for this to preserve well-structuredness of control flow is: these labels can only be referenced from within the block_cascade above their definition. This can be proven by examining the effective rules that would be imposed by the equivalent stack of nested blocks.

I can confirm that CFG-based producers would find block_cascade convenient to use.

On the consumer side, I imagine we could design the binary enoding such that the opening of a block_cascade declares how many labels it has, conceptually pushing that many entries on the control flow stack, and each label definition pops its entry. That way, references to labels can use the same "depth in the control flow stack" concept that we already have, so this seems like it should be easy to implement.

My main reservation with this feature is that while it's less convenient for producers to run an algorithm to do block placement, it can sometimes produce code with lower maximum stack depths (eg. { { } { } } has a lower max depth than { { { } } }), but I don't yet know how much weight to place on this.

@AndrewScheidecker
Copy link
Author

My main reservation with this feature is that while it's less convenient for producers to run an algorithm to do block placement, it can sometimes produce code with lower maximum stack depths

I think this is complementary to block_cascade if you produce blocks the same as you would otherwise, and then do an optimization pass that collapses the cascading block pattern to a single node.

@kg
Copy link
Contributor

kg commented Nov 25, 2015

I'd like to strongly encourage people to think carefully about the impact that significant tree restructurings like this will have on executable size.

@rossberg
Copy link
Member

@sunfishcode, right, makes sense.

But it seems you could break this up into two independent proposals:

  1. Simplify tableswitch to become just an indexed break. I kind of like that from a semantic perspective, as it is a simplification and eliminates the special status of case labels.
  2. Introduce a shorthand for nested blocks. The benefit of that is less clear to me. AFAICT, the difference shouldn't matter much for either producers or consumers (@sunfishcode, wouldn't you need to manage some form of stack of similar size either way?). Is the impact on binary code size significant?

@sunfishcode
Copy link
Member

@rossberg-chromium When I was proposing a pure indexed break earlier, one of the complications that was brought up is that tableswitches can sometimes have thousands or more targets, and stacking up blocks to that level of nesting might create stack overflow trouble for recursive parsers. Defining a block_cascade at the same time would alleviate this.

(A related concern is the asm.js polyfill; existing JS parsers don't expect super-deep nesting, however I wonder if in the polyfill timeframe, we can make a convention that if you want to use the polyfill, you only using block_cascade with tableswitch, and then always putting the tableswitch at the very top of it. That might make it reasonable for the polyfill to detect the pattern and emit a switch. I'm confident I can make LLVM emit code like this, I'll talk with others about polyfill side.)

Concerning code size, tableswitch is very infrequent relative to other operators, so it doesn't contribute significantly to code size no matter what we do here.

Finally, @AndrewScheidecker or anyone else, what do you think of the name multiblock for this construct?

@kripken
Copy link
Member

kripken commented Nov 28, 2015

I like the name multiblock.

But regarding code size, if block_cascade/multiblock is factored out of tableswitch, then it might be used more often than tableswitch, and the impact on code size is unclear. Or if the intention is that the factored-out construct never be used outside of a tableswitch, then obviously code size is not impacted exactly as you said (since tableswitch is rare), but then I'm not sure I see the benefit of factoring it out.

@sunfishcode
Copy link
Member

block_cascade/multiblock could easily be smaller than an equivalent stack of nested blocks. It obviously depends on encoding strategy, but block_cascade/multiblock might use single-byte markers to mark where the labels go, making it be just 1 byte per label, while the per-label size of nested blocks might often be greater than 1, because they need an opcode plus a count of how many statements there are, or a marker for where the block ends.

@kripken
Copy link
Member

kripken commented Nov 29, 2015

Agreed. It might also be faster to parse if the multiblock is large (less branching).

@AndrewScheidecker
Copy link
Author

multiblock is ok, but could be ambiguous if there is also a more general CFG node (which I imagine being very similar but with less constrained label scoping).

@sunfishcode
Copy link
Member

I suggest if we introduce a general CFG node, we call it cfg or graph or similar, to emphasize the arbitrary graph nature. To me, multiblock suggests multiple blocks grouped together, which is essentially what it is.

I've now submitted #490 to make an initial attempt at wording for multiblock.

@titzer
Copy link

titzer commented Feb 25, 2016

I think we're going to significantly simply tableswitch to br_table, which obsoletes this issue.

@titzer titzer closed this as completed Feb 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants