-
Notifications
You must be signed in to change notification settings - Fork 472
Type-check dead code and introduce assert_soft_invalid #345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In typecheck.wast, in $type-br-operand-missing-in-loop, the end of the function is unreachable, so a type error should not be required. |
It would be useful to have a public explanation for the design decision in this PR. |
@sunfishcode, removed offending test, which was a left-over of the 2-label loop. Extended the PR description with a bit of motivation. |
@rossberg-chromium As the maintainer of a code generator, I can report that the new rules require my code generator to reason about reachability more than the previous rules did. There is a practical cost here, to my codebase and likely to others like it. Is there a practical advantage in having engines enforce this particular type-checking system for unreachable code which justifies this cost? I'm not objecting the decision here. I wish to understand it. |
@sunfishcode, by previous rules, you mean not type-checking dead code? IIRC there were concerns about having to accept unvalidated code (you were present in the discussions, I think). Also, it leads to the conceptual complication that we need to define a specific notion of reachability in the spec (with some subtle design space). The current resolution gets rid of that, while engines are free to continue doing what they did. Code generators are free to ignore reachability entirely. AFAICS, they only need to care, naturally, if they want to do dead code elimination. But even in that case they should be able to use whatever approximation they see fit, instead of having to match one made up by the spec. Or so I'd hope. Can you elaborate what specifically got more complicated on your end? |
Would be curious how the following is not invalid?
Is this valid only because the What about:
Above the fall-through might not have a correctly typed value on the stack at the function block fall-through. Does the producer need to insert a dummy fall-through value. Also another challenge, does
|
@JSStats, your first and third example are valid because The second example is invalid, because the loop does not have type |
@rossberg-chromium How does "push as many as the number of values consumed" work when we add What about |
@JSStats, The |
@rossberg-chromium Thank you for engaging. If the If so then the The |
@JSStats, It's the same as in the JVM, for example (though newer JVMs will require giving the assumed stack type explicitly in form of a stack map entry). |
@rossberg-chromium So if "type-checking pretends" does not appear to be a well defined concept? This makes it difficult to understand where this is going. What about The |
@JSStats, if there is a The technical concept here is polymorphism, which is well-defined and well-understood. In a nutshell, after a control flow transfer instruction any suitable type can be assumed for the stack, and typing picks what makes the rest of the instructions type-check. Wasm is simple enough that such inference is straightforward. In the I added a sketch of a type checking algorithm to the original comment. Hope that clarifies. Adding pick is left as an exercise to the reader. :) |
@rossberg-chromium Thank you for the detailed explanation. However seeing this I don't think it is workable, does not meet the use case of reconciling with the AST and its data-flow typing, and thus probably not better than simply ignoring dead code apart from skipping over the operators. For example Could you please consider the limits of WebAssembly/design#778 ? Can you see any cases that slip through that proposal, or any show stoppers - it looks like the better solution but it would be good to hear from some experience. |
@JSStats, for better or worse, Wasm is a stack machine now, which is more liberal, so arguing that typing is too permissive for an AST is besides the point. The basic scheme of not restricting stack types after branches has precedence in both major stack machines, JVM and CLR (the latter with some quirks). |
@rossberg-chromium Sorry, 'stack-machine' is a very general term, not a basis for technical decision making. If 'stack-machine' is going to be used like that to make decisions then all mention of this term needs to be remove from this project. This PR does not even respect the structure constraints of this stack machine which is the cause of the problems, so it's not the 'stack-machine' rather just the approach in this PR. There is a better proposal on the table that does appear to better solve the technical use case of consistency with the AST model, it can be done and is consistent with the 'stack-machine', and this is the wasm way. JVM and CLR have different approaches, whereas wasm is to be presentable in a much more familar text format. If you want to work on CLR then that is a separate project. I have made many appeals to consider the alternative path, and no show stoppers have been presented, it stands as the best path forward. Thus I must object to this PR's path at this point. |
LGTM @JSStats There has been a lot of discussion about typechecking dead code both in the context of the AST and (now) in the context of a stack machine. In general, a stack machine is more general than an AST, so it needs to have typing rules that capture that. As @rossberg-chromium notes, this PR is a compromise between not checking dead code at all, which was considered by many to be a security risk, and requiring that all code typecheck as if it were an AST, which would require the insertion of even more dead code in the general case. IMO the rules here are pretty easy to follow and since they are easily formalized, easier to reason about. I have an older patch that implements them for V8 and will probably land that to match the spec exactly, even though V8 actually has a more sophisticated dead code analysis built in due to its tracking of SSA environments. Other proposed approaches like many dead code syntactically impossible don't serve simple producers well (without solving the actual problem anyway). |
@titzer Lets see your example of wasm 'stack-machine' code that can not be presented in an AST? I have been following the discussions and your claim that 'a stack machine is more general than an AST' is surprising to me, but it's always good to learn more. If you have an example where 'making dead code syntactically impossible don't serve simple producers well' then could you please post it to WebAssembly/design#778 In the absence of such example, it seems only fair that that stands as the best solution. I would also like to hear the binaryen experience. |
E.g. you can have |
@titzer Yes, such examples were discussed and it seemed settled that they have natural familiar text presentations as:
So we can keep this structure stack-machine and the familiar text format, and that seem very worthy of considering in relation to this dead-code matter. Wouldn't it be great if wasm could address this in the dead code too. Some of the rules in the PR might be expressed as in the following example, but perhaps not all, but it requires more complexity in the text format code, looking ahead to consumers of the values.
|
Here's a counter proposal, adopting some of the rules in this PR but also moving us closer to WebAssembly/design#778 and also fully validating the types in unreachable code with one exception.
For example
For example: Where the producer wants to fill a typed operator, such as
This proposal would eliminate the This proposal avoids introducing the Open to other suggestions, problems, etc. |
@JSStats, we want to have a single logical representation that byte code encodes/decodes into, the text format prints/parses into, validation is defined on, and evaluation is defined on. That is not possible with the syntax you sketch: converting between this and the rest represents an actual translation that in one direction even is type-dependent. Conceptually, making that work would require defining two languages and a compile/decompile logic between them. There is no point in doing that rather than just consistently defining a stack machine. And as Ben mentions, proposals like WebAssembly/design#778 would penalise any compiler that does not use a control-flow graph because they require all producers to track reachability. No external code format (as opposed to compiler ILs) that I'm aware of does that, and for good reason. It's of course fine for anybody to invent other languages that Wasm compiles/decompiles outof/into, but that's outside the scope of the Wasm specification. |
lgtm per previous discussion |
Landing based on above LGTMs and earlier discussions. |
Fix handling of recursive types in parser context
As discussed off-line, we want to require valid code to be type-correct even in dead code, but allow engines to skip its validation. That is analogous to the liberty we give engines for validating individual functions lazily (which may include not at all if they are never called).
Edit: The motivation for this design choice is that typing rules should be simple but permissive enough and not require programmers or code generators to consider reachability. At the same time, it can be beneficial for implementations to be able to ignore dead code. The design represents a compromise: we only specify code to be valid if all instructions type-check, whether reachable or not. At the same time, we do not require (but still allow) engines to actually check the validity of code they can prove is never executed. The spec implements full checking, but the test suite distinguishes code that has to be identified as invalid by all engines (
assert_invalid
) and invalid code that they don't need to diagnose because it is unreachable (assert_soft_invalid
).This patch is going back to type-checking of dead code (with unconstrained initial stack), but respective tests use the new
assert_soft_invalid
assertion, which can be turned off for implementations that choose to skip dead code validation.Edit 2: Here is a sketch of a type-checking algorithm that engines can use if they wish to check all unreachable code:
;; Data structures
type value_type = i32 | i64 | f32 | f64
type operand_stack = stack(value_type | unknown)
type control_stack = stack(control_frame)
type control_frame = {
. out_types : array(value_type)
. label_types : array(value_type)
. height : int
. polymorphic : bool
}
;; Stack manipulation
var operands : operand_stack
var controls : control_stack
push_operand(type : value_type | unknown) =
. operands.push(type)
pop_operand(expect : value_type | unknown) : value_type | unknown =
. error_if(operands.size = controls[0].height && not(controls[0].polymorphic))
. if (operands.size = controls[0].height) return expect ;; polymorphic
. let actual = operands.pop()
. if (actual = expect || expect = unknown) return actual
. if (actual = unknown) return expect
. error()
push_operands(types : array(value_type)) =
. foreach t in types
. . push_operand(t)
pop_operands(types : array(value_type)) =
. foreach t in reverse(types)
. . pop_operand(t)
poly_operands() =
. operands.pop(operands.size - control[0].height)
. control[0].polymorphic := true
push_control(out : array(value_type), label : array(value_type)) =
. let frame = new {out_types = out; label_types = label; height = operands.size; polymorphic = false}
. controls.push(frame)
pop_control() : array(value_type) =
. error_if(controls.is_empty())
. let frame = controls.pop()
. pop_operands(frame.out_types)
. error_if(operands.size <> frame.height)
. return frame.out_types
;; Validation of representative cases
validate(
i32.add
) =. pop_operand(i32)
. pop_operand(i32)
. push_operand(i32)
validate(
drop
) =. pop_operand(unknown)
validate(
select
) =. pop_operand(i32)
. let t = pop_operand(unknown)
. pop_operand(t)
. push_operand(t)
validate(
block
t1..tN) =. push_control([t1..tN], [t1..tN])
validate(
loop
t1..tN) =. push_control([t1..tN], [])
validate(
if
t1..tN) =. pop_operand(i32)
. push_control([t1..tN], [t1..tN])
validate(
else
) =. let out_types = pop_control()
. push_control(out_types, out_types)
validate(
end
) =. let out_types = pop_control()
. push_operands(out_types)
validate(
br
N) =. error_if(controls.size() < N)
. pop_operands(controls[N].label_types)
. poly_operands()
validate(
unreachable
) =. poly_operands()