This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Change try ... catch + br_on_exn to br_on_catch{,_else}? #80

Closed
dead-claudia opened this issue Jun 12, 2019 · 10 comments

Comments

@dead-claudia

dead-claudia commented Jun 12, 2019

Edit: Redid a couple of things.

After reading #58, which was eventually merged in #71, I feel you could remove try ... catch ... end altogether and greatly simplify branch generation by unifying try ... catch and br_on_exn into a generic br_on_catch len ($lbl $id)+ and br_on_catch_else len ($lbl $id)* $default_lbl, which operate more like a br_table for exceptions. The block targeted by $default_lbl must have type (result except_ref), while each of the other labels just needs to match the type of its corresponding exception. An except_ref can only be passed to rethrow, passed around opaquely as a parameter, local, or result, or dropped.

br_on_catch is sugar for br_on_catch_else with a default branch that always rethrows. (This is a very common case, so I felt it was worth including.)
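The dispatch described above can be modeled in a few lines of Python. This is a minimal sketch, not code from any engine or spec: it assumes a caught exception is an (exception index, payload) pair and that labels are plain strings, and the names Caught, ExceptRef, Rethrow, and dispatch are hypothetical. Passing no default label models br_on_catch; passing one models br_on_catch_else.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Caught:
    """Hypothetical model of a caught exception: its index plus payload values."""
    except_index: int
    payload: tuple


@dataclass(frozen=True)
class ExceptRef:
    """Opaque handle passed to the default label; it can only be rethrown or dropped."""
    exn: Caught


class Rethrow(Exception):
    """Signals that no label matched, so br_on_catch's implicit default rethrows."""

    def __init__(self, exn: Caught):
        super().__init__(exn)
        self.exn = exn


def dispatch(exn: Caught,
             table: Sequence[Tuple[str, int]],
             default_label: Optional[str] = None):
    """Pick a branch target like a br_table keyed on exception indices.

    default_label=None models br_on_catch; a string models br_on_catch_else.
    Returns (label, operands) for the block being branched to.
    """
    for label, except_index in table:
        if except_index == exn.except_index:
            return label, exn.payload  # block receives the exception's values
    if default_label is None:
        raise Rethrow(exn)
    return default_label, (ExceptRef(exn),)  # block type is (result except_ref)


table = [("$on_oom", 0), ("$on_io", 1)]
print(dispatch(Caught(1, (42,)), table))            # ('$on_io', (42,))
print(dispatch(Caught(7, ()), table, "$other")[0])  # $other
```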

This would make the corresponding grammar look like this:

br_on_catch len (label except_index)+ |
br_on_catch_else len (label except_index)* label |
throw except_index |
rethrow

And of course, this makes the code a lot smaller with no loss in expressive power. (You can always arrange your blocks to share handler logic where appropriate; it's roughly the same amount of code, and you'd likely need to do that anyway for C++ exception handling.)

@dead-claudia dead-claudia changed the title Change try ... catch to br_on_throw? Change try ... catch + br_on_exn to br_on_throw{,_else}? Jun 12, 2019
@dead-claudia dead-claudia changed the title Change try ... catch + br_on_exn to br_on_throw{,_else}? Change try ... catch + br_on_exn to br_on_catch{,_else}? Jun 12, 2019
@dead-claudia
Author

dead-claudia commented Jun 12, 2019

Edit: forgot a backtick.

BTW, this also mirrors the code that compilers generate for lightweight exception handling much more closely, so they wouldn't need to alter their code generation as much, and it'd be a little easier to stream-compile. (In practice, nearly all of the setup code is at the try and the throw, with little if any at the catch.) It also mirrors setjmp + longjmp behavior more closely, and here's how those could be implemented:

  • jmp_buf env would be a pointer to a unique ID.
  • setjmp would set env's value to a globally unique ID and invoke br_on_catch $label $setjmp, where $label refers to the subsequent block. All subsequent statements would be wrapped in a loop that is continued only if longjmp was called with that ID, exited if longjmp wasn't called, and rethrown if longjmp was called with a different ID.
  • longjmp would just execute (throw $setjmp (i64.load align=0 $env))
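A rough Python model of the setjmp/longjmp scheme in these bullets. It is only a sketch under stated assumptions: a Python exception stands in for throw $setjmp, a counter stands in for the globally unique ID, and JmpBuf, SetjmpTag, longjmp, and setjmp_region are hypothetical names. A matching ID re-enters the loop with longjmp's value, while a different ID is rethrown to an outer setjmp.

```python
import itertools

# Hypothetical model only: a Python exception stands in for `throw $setjmp`.
_unique_ids = itertools.count(1)


class SetjmpTag(Exception):
    """Carries the ID of the targeted jmp_buf and the value passed to longjmp."""

    def __init__(self, env_id: int, value: int):
        super().__init__(env_id, value)
        self.env_id = env_id
        self.value = value


class JmpBuf:
    """Stands in for `jmp_buf env`: it just holds a unique ID."""

    def __init__(self):
        self.env_id = 0


def longjmp(env: JmpBuf, value: int):
    # As in C, longjmp(env, 0) makes setjmp return 1.
    raise SetjmpTag(env.env_id, value or 1)


def setjmp_region(env: JmpBuf, body):
    """Run body(env, r), where r plays the role of setjmp's return value."""
    env.env_id = next(_unique_ids)  # setjmp stores a globally unique ID
    result = 0                      # the first pass through setjmp returns 0
    while True:                     # the loop around the subsequent statements
        try:
            return body(env, result)  # normal completion exits the loop
        except SetjmpTag as exn:
            if exn.env_id != env.env_id:
                raise               # aimed at another setjmp: rethrow
            result = exn.value      # our ID: continue the loop


def body(env, r):
    if r < 3:
        longjmp(env, r + 1)
    return r


print(setjmp_region(JmpBuf(), body))  # 3
```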

That setjmp would be compiled to something like this, corresponding to at least 24 bytes of overhead, assuming br_on_catch is assigned opcode 0x06 and br_on_catch_else 0x07 (the opcodes currently used by try and catch, respectively):

0x03 0xRR ;; loop $loop (result ...)
  0x02 0x7E ;; block $target (result i64)
    0x06 0x01 0x00 0xSS ;; br_on_catch ($target $setjmp)
    ... ;; subsequent instructions
    0x0C 0x01 ;; br $loop
  0x0B ;; end
  0x22 0xTT ;; local.tee $temp_id
  0x42 0xNN ;; i64.const $label_id
  0x52 ;; i64.ne
  0x04 0x40 ;; if (result)
    0x20 0xTT ;; local.get $temp_id
    0x08 0xSS ;; throw $setjmp
  0x0B ;; end
0x0B ;; end

For comparison, with try ... catch ... end it might be compiled like this, corresponding to at least 27 bytes of overhead that is also harder to compress:

0x03 0xRR ;; loop $loop (result ...)
  0x02 0x7E ;; block $target (result i64)
    0x06 ;; try
      ... ;; subsequent instructions
      0x0C 0x02 ;; br $loop
    0x07 ;; catch
      0x0A 0x01 0xSS ;; br_on_exn $target $setjmp
      0x09 ;; rethrow
    0x0B ;; end
  0x0B ;; end
  0x22 0xTT ;; local.tee $temp_id
  0x42 0xNN ;; i64.const $label_id
  0x52 ;; i64.ne
  0x04 0x40 ;; if (result)
    0x20 0xTT ;; local.get $temp_id
    0x08 0xSS ;; throw $setjmp
  0x0B ;; end
0x0B ;; end
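The overhead figures for the two listings can be checked by tallying each instruction's encoded size. A small Python sketch, assuming every placeholder immediate (0xRR, 0xSS, 0xTT, 0xNN) fits in a single LEB128 byte, using the hypothetical opcode assignments from the listings (0x06 for br_on_catch, 0x0A for br_on_exn), and excluding the "subsequent instructions" that both versions share:

```python
# Encoded size of each instruction in the two listings. Placeholder immediates
# (0xRR, 0xSS, 0xTT, 0xNN) are assumed to be single-byte LEB128 values, and the
# "subsequent instructions" are excluded since both versions share them.
BR_ON_CATCH = [
    ("loop $loop", 2), ("block $target (result i64)", 2),
    ("br_on_catch 1 $target $setjmp", 4), ("br $loop", 2), ("end", 1),
    ("local.tee $temp_id", 2), ("i64.const $label_id", 2), ("i64.ne", 1),
    ("if", 2), ("local.get $temp_id", 2), ("throw $setjmp", 2),
    ("end", 1), ("end", 1),
]

TRY_CATCH = [
    ("loop $loop", 2), ("block $target (result i64)", 2), ("try", 1),
    ("br $loop", 2), ("catch", 1), ("br_on_exn $target $setjmp", 3),
    ("rethrow", 1), ("end", 1), ("end", 1),
    ("local.tee $temp_id", 2), ("i64.const $label_id", 2), ("i64.ne", 1),
    ("if", 2), ("local.get $temp_id", 2), ("throw $setjmp", 2),
    ("end", 1), ("end", 1),
]


def overhead(listing):
    """Total bytes contributed by the listing's fixed instructions."""
    return sum(size for _, size in listing)


print(overhead(BR_ON_CATCH), overhead(TRY_CATCH))  # 24 27
```

One caveat: the try listing encodes try as the bare byte 0x06. If try carries a block type immediate the way block does, the try version comes to 28 bytes rather than 27.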

One other thing: it allows something like throw new Error() written inline inside a try/catch to be compiled into a simple allocation (if necessary) plus a br, with very little ceremony and at virtually zero cost.

@rossberg
Member

I don't understand. How would exceptional control flow reach br_on_catch? What is the dynamic scope of this handler?

@dead-claudia
Author

@rossberg It wouldn't, and it wouldn't need to. Think of it as setting up traps for exceptions within a block: the labels just say which block to break out of when a given ID is caught. The handler's scope is limited to the block containing the instruction, and exceptions are only trapped after that instruction executes.


As an alternative, I considered using br_on_catch $label $except_id, repeated for each exception ID you want to trap, plus br_on_catch_all $label for the catch-all. In this scheme, the last br_on_catch* executed wins, so you'd want your br_on_catch_all to be the first instruction. This would probably simplify the spec and implementation a bit, at the cost of slightly larger uncompressed code when multiple IDs are involved. But 1. I suspect that case will be reasonably rare, and 2. I expect exception-handling code to be consistent enough that compressors would cancel out most of the size difference, so this is probably the better route.
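The "last br_on_catch* wins" rule can be sketched in Python as a per-block handler list that is searched newest-first. Block, br_on_catch, br_on_catch_all, and target_for are hypothetical names for this illustration only; a result of None means the exception propagates out of the block.

```python
from typing import List, Optional, Tuple

CATCH_ALL = None  # hypothetical sentinel: an except_id of None means "catch anything"


class Block:
    """Hypothetical model of the br_on_catch* handlers installed in one block."""

    def __init__(self):
        self.handlers: List[Tuple[str, Optional[int]]] = []

    def br_on_catch(self, label: str, except_id: int):
        self.handlers.append((label, except_id))

    def br_on_catch_all(self, label: str):
        self.handlers.append((label, CATCH_ALL))

    def target_for(self, except_id: int) -> Optional[str]:
        """The most recently executed matching br_on_catch* wins; None propagates."""
        for label, handled in reversed(self.handlers):
            if handled is CATCH_ALL or handled == except_id:
                return label
        return None


good = Block()
print(good.target_for(1))        # None: nothing is trapped before the instructions run
good.br_on_catch_all("$any")     # catch-all first...
good.br_on_catch("$io", 1)       # ...so later, specific handlers take precedence
print(good.target_for(1), good.target_for(2))  # $io $any

bad = Block()
bad.br_on_catch("$io", 1)
bad.br_on_catch_all("$any")      # executed last, so it shadows $io
print(bad.target_for(1))         # $any
```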

@titzer
Contributor

titzer commented Jun 12, 2019

@isiahmeadows What you are describing is basically try...catch, where the try declares a block that is the lexical scope of the catch. It's nice to have blocks with catch handlers denoted up front, which is exactly what a try does. In a previous iteration of this proposal, the catches were labeled with exception ids. We found it difficult to support factoring out this exception-type dispatch code, so it was much simpler to add just the basics (i.e. catch and first-class exceptions), which makes "user level" factoring of shared exception handlers easier.

@dead-claudia
Author

@titzer Yeah, true. I was just thinking it could be a little more explicit and lower-level, but yes, it's entirely isomorphic.

We found it difficult to support factoring this exception-type dispatch code out, so we found it much simpler to add the basics (i.e. just catch and first-class exceptions), so that it was easier for "user level" factoring of shared exception handlers.

Have you considered making exception IDs entirely dynamic and putting the onus on the user to sort them out, i.e. a generic, mostly untyped catch (not unlike JS)? My proposal would then be just a simple br_on_catch $label, and the target block would receive i64 i32, where the top of the stack (the i32) is the conceptual ID and the item below it (the i64) is the conceptual value. rethrow could disappear, as could the "catch this ID" instructions, and throw would just take two operands: an ID and a value. Compilers already know how to implement the requisite glue code for native compilation, and the stack unwinding mechanism itself is the only primitive that can't already be implemented in userland with similar performance.
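A minimal Python sketch of that untyped model, assuming the engine provides only unwinding with a (value, ID) pair and that everything else (ID matching, rethrowing) is ordinary user code. WasmThrow, throw, and try_catch are hypothetical names, and DIV_BY_ZERO is an arbitrary example ID.

```python
# Hypothetical model: the engine only unwinds the stack with a (value, id) pair;
# matching IDs and rethrowing are ordinary user code.
class WasmThrow(Exception):
    def __init__(self, value: int, exn_id: int):
        super().__init__(value, exn_id)
        self.value = value    # the conceptual i64 value
        self.exn_id = exn_id  # the conceptual i32 ID (top of the stack)


def throw(value: int, exn_id: int):
    raise WasmThrow(value, exn_id)


def try_catch(body, handlers):
    """Run body(); if it throws, look up the ID in handlers (a dict of ID -> function)."""
    try:
        return body()
    except WasmThrow as exn:
        handler = handlers.get(exn.exn_id)
        if handler is None:
            throw(exn.value, exn.exn_id)  # no separate rethrow: just throw again
        return handler(exn.value)


DIV_BY_ZERO = 1  # arbitrary, user-chosen ID


def divide(a, b):
    if b == 0:
        throw(a, DIV_BY_ZERO)
    return a // b


print(try_catch(lambda: divide(7, 0), {DIV_BY_ZERO: lambda v: -v}))  # -7
```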

When compiled, a use of setjmp might look like this (39 bytes of overhead: 5 in the type section plus 34 in the code):

;; In the type section, at type index 0xRR
0x60 0x00 0x02 0x7E 0x7F ;; type $t (params) (result i64 i32)

;; In the code
0x03 0xLL ;; loop $loop (result ...)
  0x02 0xRR ;; block $target $t
    0x06 0x00 ;; br_on_catch $target
    ... ;; subsequent instructions
    0x0C 0x01 ;; br $loop
  0x0B ;; end
  0x22 0xTT ;; local.tee $tmp_id
  0x41 0xSS ;; i32.const $setjmp
  0x47 ;; i32.ne
  0x04 0x40 ;; if (result)
    0x20 0xTT ;; local.get $tmp_id
    0x08 ;; throw
  0x0B ;; end
  0x22 0xUU ;; local.tee $tmp_val
  0x42 0xNN ;; i64.const $label_id
  0x52 ;; i64.ne
  0x04 0x40 ;; if (result)
    0x20 0xUU ;; local.get $tmp_val
    0x41 0xSS ;; i32.const $setjmp
    0x08 ;; throw
  0x0B ;; end
0x0B ;; end

And of course, longjmp would be rather simple, consisting of 3 bytes (my original snippet was wrong here):

... ;; `$env` is at top of stack.
0x41 0xSS ;; i32.const $setjmp
0x08 ;; throw

Alternatively, you could stick with try ... catch ... end and just abandon br_on_exn + rethrow. With that approach, a use of setjmp might look like this (32 bytes of overhead):

0x03 0xLL ;; loop $loop (result ...)
  0x06 ;; try
    ... ;; subsequent instructions
    0x0C 0x01 ;; br $loop
  0x07 ;; catch
    0x22 0xTT ;; local.tee $tmp_id
    0x41 0xSS ;; i32.const $setjmp
    0x47 ;; i32.ne
    0x04 0x40 ;; if (result)
      0x20 0xTT ;; local.get $tmp_id
      0x08 ;; throw
    0x0B ;; end
    0x22 0xUU ;; local.tee $tmp_val
    0x42 0xNN ;; i64.const $label_id
    0x52 ;; i64.ne
    0x04 0x40 ;; if (result)
      0x20 0xUU ;; local.get $tmp_val
      0x41 0xSS ;; i32.const $setjmp
      0x08 ;; throw
    0x0B ;; end
  0x0B ;; end
0x0B ;; end

This approach, with dynamic IDs entirely specified by userland, would also allow very efficient exceptions for OCaml and other languages with lightweight exceptions: they could set the "ID" to a value identifying the exception type, the "value" to the instance data, and throw it with near-zero overhead. The engine could support that with a custom calling convention that makes exceptions sufficiently cheap (like what OCaml's native compiler does), since the calling convention of JIT-compiled WebAssembly code is normally an implementation detail.

@binji
Member

binji commented Jun 16, 2019

@isiahmeadows How would you be able to catch exceptions from the embedder in this scheme? And also hold and release references to these exception objects?

@dead-claudia
Author

@binji Embedders could just throw and catch exceptions with special IDs and values of their own, much the same way userland code would. Of course, these could conflict with existing user IDs, but that also allows easy integration with them, even from embedder code. (Compilers could also offer a mechanism to remap native exceptions with particular IDs to userland exception types, if necessary; this would also tell the compiler which IDs it has to avoid internally.) As for holding and releasing references, embedders could provide an API to free a value pointer if it requires allocation of some kind, and userland can just store it (or its contents) in linear memory, locals, or similar.
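A small Python sketch of the ID bookkeeping described in that reply, under the assumption that the embedder publishes the set of IDs it reserves. IdAllocator and remap are hypothetical helpers a compiler might keep, not part of any proposal.

```python
# Hypothetical compiler-side bookkeeping, assuming the embedder publishes the
# set of exception IDs it reserves for itself.
class IdAllocator:
    """Hands out user exception IDs while skipping IDs reserved by the embedder."""

    def __init__(self, embedder_ids):
        self.reserved = set(embedder_ids)
        self.next_id = 0

    def fresh(self) -> int:
        while self.next_id in self.reserved:
            self.next_id += 1
        allocated = self.next_id
        self.next_id += 1
        return allocated


def remap(exn_id: int, native_to_user: dict) -> int:
    """Translate an embedder exception ID to the language's own ID, if one is mapped."""
    return native_to_user.get(exn_id, exn_id)


alloc = IdAllocator(embedder_ids={0, 2})
user_ids = [alloc.fresh() for _ in range(3)]
print(user_ids)                    # [1, 3, 4]
print(remap(2, {2: user_ids[0]}))  # 1
```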

The exception value may need to be changed to an exn_value_ref <: any_ref wrapping a possible exception value, with appropriate optional casts to/from i64 (with an else block invoked if it isn't a direct value, such as an embedder-provided pointer), plus a way to free embedder references in general. But that's mostly isomorphic and just provides some built-in functionality that makes safe code easier and cheaper to write.

@rossberg
Member

The problem that generative exception IDs solve is that a language cannot confuse one of its exceptions with that from another language runtime (or another instance of the same runtime). Keep in mind that Wasm is an open, heterogeneous system, such that execution may mix multiple different languages compiled to Wasm.
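To illustrate the point with a Python sketch: fixed integer IDs chosen independently by two runtimes can collide, while generative tags (fresh objects compared by identity) cannot. Tag, Thrown, and catches are hypothetical names used only for this illustration.

```python
# Hypothetical illustration of fixed integer IDs versus generative tags.
class Tag:
    """A generative exception tag: every Tag() is a distinct identity."""

    def __init__(self, name: str):
        self.name = name  # for debugging only; never used for matching


class Thrown(Exception):
    def __init__(self, key, value):
        super().__init__(key, value)
        self.key = key
        self.value = value


def catches(handler_key, exn: Thrown) -> bool:
    # Ints compare by value. Tag defines no __eq__, so Tags compare by identity.
    return handler_key == exn.key


# Two independently compiled runtimes both happen to pick ID 1.
print(catches(1, Thrown(1, "runtime B's error")))          # True: A catches B's exception

# Each runtime (or each instance of the same runtime) creates its own tag.
tag_a, tag_b = Tag("Error"), Tag("Error")
print(catches(tag_a, Thrown(tag_b, "runtime B's error")))  # False
print(catches(tag_b, Thrown(tag_b, "runtime B's error")))  # True
```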

@dead-claudia
Author

Fair.

@dead-claudia
Author

I'll close this for now, and I might revisit it later if I come up with a better suggestion that's a little less radical.

ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020
The data count section has a count that must match the number of data segments. If the data count section isn't present, then `memory.init` and `data.drop` cannot be used.

Fixes issue WebAssembly#73.

4 participants