Skip to content

[WebAssembly] Add unreachable before catch destinations #123915

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2025

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Jan 22, 2025

When try_table's catch clause's destination has a return type, as in the case of catch with a concrete tag, catch_ref, and catch_all_ref. For example:

block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...

This code is not valid because the block's body type is not exnref. So we add an unreachable after the 'end_try_table' to make the code valid here:

block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block

Because 'unreachable' is a terminator we also need to split the BB.


We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested try_table~end_try_table added to fix a unwind mismatch:

try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table

While the block added for the trampoline BB has the return type exnref, its body, which contains the nested try_table and other code, wouldn't have the exnref return type. Most times it didn't become a problem because the block's body ended with something like br or return, but that may not always be the case, especially when there is a loop. So we add an unreachable to make the code valid here too:

try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table

In this case we just append the unreachable at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there end_try_table and end_block were added in the beginning of an EH pad in placeTryTableMarker and moving end_try_table and the new unreachable to the previous BB caused other problems.)


This adds many unreaachables to the output, but this adds unreachable to only a few places to see if this is working. The FileCheck lines in exception.ll and cfg-stackify-eh.ll are already heavily redacted to only leave important control-flow instructions, so I don't think it's worth adding unreachables everywhere.

When `try_table`'s catch clause's destination has a return type, as in
the case of catch with a concrete tag, catch_ref, and catch_all_ref.
For example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
```
... use exnref ...

This code is not valid because the block's body type is not exnref. So
we add an unreachable after the 'end_try_table' to make the code valid
here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the
code below, we create a "trampoline BB" that will be the destination for
the nested `try_table`~`end_try_table` added to fix a unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type
`exnref`, its body, which contains the nested `try_table` and other
code, wouldn't have the `exnref` return type. Most times it didn't
become a problem because the block's body ended with something like `br`
or `return`, but that may not always be the case, especially when there
is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout
predecessor BB. (This was tricky to do in the first (non-mismatch) case
because there `end_try_table` and `end_block` were added in the
beginning of an EH pad in `placeTryTableMarker` and moving
`end_try_table` and the new `unreachable` to the previous BB caused
other problems.)

---

This adds many `unreaachable`s to the output, but this adds
`unreachable` to only a few places to see if this is working. The
FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already
heavily redacted to only leave important control-flow instructions, so I
don't think it's worth adding `unreachable`s everywhere.
@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2025

@llvm/pr-subscribers-backend-webassembly

Author: Heejin Ahn (aheejin)

Changes

When try_table's catch clause's destination has a return type, as in the case of catch with a concrete tag, catch_ref, and catch_all_ref. For example:

block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block

... use exnref ...

This code is not valid because the block's body type is not exnref. So we add an unreachable after the 'end_try_table' to make the code valid here:

block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block

Because 'unreachable' is a terminator we also need to split the BB.


We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested try_table~end_try_table added to fix a unwind mismatch:

try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table

While the block added for the trampoline BB has the return type exnref, its body, which contains the nested try_table and other code, wouldn't have the exnref return type. Most times it didn't become a problem because the block's body ended with something like br or return, but that may not always be the case, especially when there is a loop. So we add an unreachable to make the code valid here too:

try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table

In this case we just append the unreachable at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there end_try_table and end_block were added in the beginning of an EH pad in placeTryTableMarker and moving end_try_table and the new unreachable to the previous BB caused other problems.)


This adds many unreaachables to the output, but this adds unreachable to only a few places to see if this is working. The FileCheck lines in exception.ll and cfg-stackify-eh.ll are already heavily redacted to only leave important control-flow instructions, so I don't think it's worth adding unreachables everywhere.


Full diff: https://github.com/llvm/llvm-project/pull/123915.diff

5 Files Affected:

  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (+88-19)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td (+1-1)
  • (modified) llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll (+1)
  • (modified) llvm/test/CodeGen/WebAssembly/exception-legacy.mir (+2-1)
  • (modified) llvm/test/CodeGen/WebAssembly/exception.ll (+1)
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
index 6cae0e766dbc02..da4b8a453632bc 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
@@ -1297,6 +1297,7 @@ void WebAssemblyCFGStackify::addNestedTryDelegate(
 //       some code
 //     end_try_table
 //     ...
+//     unreachable
 //   end_block                      ;; Trampoline BB
 //   throw_ref
 // end_try_table
@@ -1358,6 +1359,13 @@ WebAssemblyCFGStackify::getTrampolineBlock(MachineBasicBlock *UnwindDest) {
   BuildMI(TrampolineBB, EndDebugLoc, TII.get(WebAssembly::THROW_REF))
       .addReg(ExnReg);
 
+  // The trampoline BB's return type is exnref because it is a target of
+  // catch_all_ref. But the body type of the blodk we just created is not. We
+  // add an 'unreachable' right before the 'end_block' to make the code valid.
+  MachineBasicBlock *TrampolineLayoutPred = TrampolineBB->getPrevNode();
+  BuildMI(TrampolineLayoutPred, TrampolineLayoutPred->findBranchDebugLoc(),
+          TII.get(WebAssembly::UNREACHABLE));
+
   registerScope(Block, EndBlock);
   UnwindDestToTrampoline[UnwindDest] = TrampolineBB;
   return TrampolineBB;
@@ -1465,7 +1473,7 @@ void WebAssemblyCFGStackify::addNestedTryTable(MachineInstr *RangeBegin,
       // - After:
       // pre_bb: (new)
       //   range_end
-      // end_try_table: (new)
+      // end_try_table_bb: (new)
       //   end_try_table
       // post_bb: (previous 'ehpad')
       //   catch
@@ -1523,9 +1531,9 @@ void WebAssemblyCFGStackify::addNestedTryTable(MachineInstr *RangeBegin,
 //   end_loop
 //   end_try_table
 //
-// So if the unwind dest BB has a end_loop before an end_try_table, we split the
-// BB with the end_loop as a separate BB before the end_try_table BB, so that
-// after we fix the unwind mismatch, the code will be like:
+// So if an end_try_table BB has an end_loop before the end_try_table, we split
+// the BB with the end_loop as a separate BB before the end_try_table BB, so
+// that after we fix the unwind mismatch, the code will be like:
 // bb0:
 //   try_table
 //   block exnref
@@ -1538,10 +1546,10 @@ void WebAssemblyCFGStackify::addNestedTryTable(MachineInstr *RangeBegin,
 //   end_block
 // end_try_table_bb:
 //   end_try_table
-static void splitEndLoopBB(MachineBasicBlock *UnwindDest) {
-  auto &MF = *UnwindDest->getParent();
+static void splitEndLoopBB(MachineBasicBlock *EndTryTableBB) {
+  auto &MF = *EndTryTableBB->getParent();
   MachineInstr *EndTryTable = nullptr, *EndLoop = nullptr;
-  for (auto &MI : reverse(*UnwindDest)) {
+  for (auto &MI : reverse(*EndTryTableBB)) {
     if (MI.getOpcode() == WebAssembly::END_TRY_TABLE) {
       EndTryTable = &MI;
       continue;
@@ -1555,11 +1563,11 @@ static void splitEndLoopBB(MachineBasicBlock *UnwindDest) {
     return;
 
   auto *EndLoopBB = MF.CreateMachineBasicBlock();
-  MF.insert(UnwindDest->getIterator(), EndLoopBB);
+  MF.insert(EndTryTableBB->getIterator(), EndLoopBB);
   auto SplitPos = std::next(EndLoop->getIterator());
-  EndLoopBB->splice(EndLoopBB->end(), UnwindDest, UnwindDest->begin(),
+  EndLoopBB->splice(EndLoopBB->end(), EndTryTableBB, EndTryTableBB->begin(),
                     SplitPos);
-  EndLoopBB->addSuccessor(UnwindDest);
+  EndLoopBB->addSuccessor(EndTryTableBB);
 }
 
 bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
@@ -1943,8 +1951,16 @@ bool WebAssemblyCFGStackify::fixCallUnwindMismatches(MachineFunction &MF) {
   // When end_loop is before end_try_table within the same BB in unwind
   // destinations, we should split the end_loop into another BB.
   if (!WebAssembly::WasmUseLegacyEH)
-    for (auto &[UnwindDest, _] : UnwindDestToTryRanges)
-      splitEndLoopBB(UnwindDest);
+    for (auto &[UnwindDest, _] : UnwindDestToTryRanges) {
+      auto It = EHPadToTry.find(UnwindDest);
+      // If UnwindDest is the faker caller block, it will not be in EHPadToTry
+      // map
+      if (It != EHPadToTry.end()) {
+        auto *TryTable = It->second;
+        auto *EndTryTable = BeginToEnd[TryTable];
+        splitEndLoopBB(EndTryTable->getParent());
+      }
+    }
 
   // Now we fix the mismatches by wrapping calls with inner try-delegates.
   for (auto &P : UnwindDestToTryRanges) {
@@ -2179,8 +2195,15 @@ bool WebAssemblyCFGStackify::fixCatchUnwindMismatches(MachineFunction &MF) {
 
   // When end_loop is before end_try_table within the same BB in unwind
   // destinations, we should split the end_loop into another BB.
-  for (auto &[_, UnwindDest] : EHPadToUnwindDest)
-    splitEndLoopBB(UnwindDest);
+  for (auto &[_, UnwindDest] : EHPadToUnwindDest) {
+    auto It = EHPadToTry.find(UnwindDest);
+    // If UnwindDest is the faker caller block, it will not be in EHPadToTry map
+    if (It != EHPadToTry.end()) {
+      auto *TryTable = It->second;
+      auto *EndTryTable = BeginToEnd[TryTable];
+      splitEndLoopBB(EndTryTable->getParent());
+    }
+  }
 
   NumCatchUnwindMismatches += EHPadToUnwindDest.size();
   SmallPtrSet<MachineBasicBlock *, 4> NewEndTryBBs;
@@ -2372,6 +2395,48 @@ static void appendEndToFunction(MachineFunction &MF,
           TII.get(WebAssembly::END_FUNCTION));
 }
 
+// We added block~end_block and try_table~end_try_table markers in
+// placeTryTableMarker. But When catch clause's destination has a return type,
+// as in the case of catch with a concrete tag, catch_ref, and catch_all_ref.
+// For example:
+// block exnref
+//   try_table (catch_all_ref 0)
+//     ...
+//   end_try_table
+// end_block
+// ... use exnref ...
+//
+// This code is not valid because the block's body type is not exnref. So we add
+// an unreachable after the 'end_try_table' to make the code valid here:
+// block exnref
+//   try_table (catch_all_ref 0)
+//     ...
+//   end_try_table
+//   unreachable      (new)
+// end_block
+//
+// Because 'unreachable' is a terminator we also need to split the BB.
+static void addUnreachableAfterTryTables(MachineFunction &MF,
+                                         const WebAssemblyInstrInfo &TII) {
+  std::vector<MachineInstr *> EndTryTables;
+  for (auto &MBB : MF)
+    for (auto &MI : MBB)
+      if (MI.getOpcode() == WebAssembly::END_TRY_TABLE)
+        EndTryTables.push_back(&MI);
+
+  for (auto *EndTryTable : EndTryTables) {
+    auto *MBB = EndTryTable->getParent();
+    auto *NewEndTryTableBB = MF.CreateMachineBasicBlock();
+    MF.insert(MBB->getIterator(), NewEndTryTableBB);
+    auto SplitPos = std::next(EndTryTable->getIterator());
+    NewEndTryTableBB->splice(NewEndTryTableBB->end(), MBB, MBB->begin(),
+                             SplitPos);
+    NewEndTryTableBB->addSuccessor(MBB);
+    BuildMI(NewEndTryTableBB, EndTryTable->getDebugLoc(),
+            TII.get(WebAssembly::UNREACHABLE));
+  }
+}
+
 /// Insert BLOCK/LOOP/TRY/TRY_TABLE markers at appropriate places.
 void WebAssemblyCFGStackify::placeMarkers(MachineFunction &MF) {
   // We allocate one more than the number of blocks in the function to
@@ -2398,13 +2463,17 @@ void WebAssemblyCFGStackify::placeMarkers(MachineFunction &MF) {
     }
   }
 
-  // Fix mismatches in unwind destinations induced by linearizing the code.
   if (MCAI->getExceptionHandlingType() == ExceptionHandling::Wasm &&
       MF.getFunction().hasPersonalityFn()) {
-    bool MismatchFixed = fixCallUnwindMismatches(MF);
-    MismatchFixed |= fixCatchUnwindMismatches(MF);
-    if (MismatchFixed)
-      recalculateScopeTops(MF);
+    const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
+    // Add an 'unreachable' after 'end_try_table's.
+    addUnreachableAfterTryTables(MF, TII);
+    // Fix mismatches in unwind destinations induced by linearizing the code.
+    fixCallUnwindMismatches(MF);
+    fixCatchUnwindMismatches(MF);
+    // addUnreachableAfterTryTables and fixUnwindMismatches create new BBs, so
+    // we need to recalculate ScopeTops.
+    recalculateScopeTops(MF);
   }
 }
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td b/llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td
index b68dd8809bb920..ed6aec1ab33e3f 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td
@@ -144,7 +144,7 @@ defm THROW_REF : I<(outs), (ins EXNREF:$exn), (outs), (ins), [],
                    "throw_ref \t$exn", "throw_ref", 0x0a>;
 } // isTerminator = 1, hasCtrlDep = 1, isBarrier = 1
 
-// Region within which an exception is caught: try / end_try
+// Region within which an exception is caught: try_table / end_try_table
 let Uses = [VALUE_STACK], Defs = [VALUE_STACK] in {
 defm TRY_TABLE : I<(outs), (ins Signature:$sig, variable_ops),
                    (outs), (ins Signature:$sig, catch_list:$cal), [],
diff --git a/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
index 683b03d16d57bd..98de9a267b95a5 100644
--- a/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
+++ b/llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll
@@ -857,6 +857,7 @@ invoke.cont:                                      ; preds = %entry
 ; NOSORT:         loop
 ; NOSORT:           call  foo
 ; NOSORT:         end_loop
+; NOSORT:         unreachable
 ; NOSORT:       end_block                                      # label[[L3]]:
 ; NOSORT:       throw_ref
 ; NOSORT:     end_try_table
diff --git a/llvm/test/CodeGen/WebAssembly/exception-legacy.mir b/llvm/test/CodeGen/WebAssembly/exception-legacy.mir
index d6f734c64acd69..9273ceeadd0e73 100644
--- a/llvm/test/CodeGen/WebAssembly/exception-legacy.mir
+++ b/llvm/test/CodeGen/WebAssembly/exception-legacy.mir
@@ -78,7 +78,8 @@ body: |
     EH_LABEL <mcsymbol .Ltmp2>
     CATCHRET %bb.2, %bb.1, implicit-def dead $arguments
 
-  ; CHECK: bb.2
+  ; This BB should remain (it will be renumbered to bb.1)
+  ; CHECK: bb.1
   bb.2:
   ; predecessors: %bb.0, %bb.1
     RETURN implicit-def dead $arguments
diff --git a/llvm/test/CodeGen/WebAssembly/exception.ll b/llvm/test/CodeGen/WebAssembly/exception.ll
index d6f3ffc8c33cb1..304664b622e800 100644
--- a/llvm/test/CodeGen/WebAssembly/exception.ll
+++ b/llvm/test/CodeGen/WebAssembly/exception.ll
@@ -38,6 +38,7 @@ define void @throw(ptr %p) {
 ; CHECK:       call  foo
 ; CHECK:       br        2
 ; CHECK:     end_try_table
+; CHECK:     unreachable
 ; CHECK:   end_block
 ; CHECK:   local.set  2
 ; CHECK:   local.get  0

@aheejin
Copy link
Member Author

aheejin commented Jan 22, 2025

This problem was not discovered when I was testing the new LLVM implementation with Emscripten test suite last year because binaryen masked this problem. This is invalid but Binaryen accepts it:

block i32
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block

And when emiting out binaries after processing, it adds an unreachable before end_block in this case, making the code valid.

And in Emscripten, wasm-emscripten-finalize used to run in all modes, even in core0, because of BIGINT processing, accidentally masking this problem. But after emscripten-core/emscripten#22993 we don't need to run wasm-emscripten-finalize anymore in core0, which revealed this problem.

It's kind of funny that I spent nontrivial amount of time overhauling the assembly type checker and didn't even run it on the CodeGen test output, because if I had run it on the test output it would've been caught right away...

Comment on lines 1954 to 1963
for (auto &[UnwindDest, _] : UnwindDestToTryRanges) {
auto It = EHPadToTry.find(UnwindDest);
// If UnwindDest is the faker caller block, it will not be in EHPadToTry
// map
if (It != EHPadToTry.end()) {
auto *TryTable = It->second;
auto *EndTryTable = BeginToEnd[TryTable];
splitEndLoopBB(EndTryTable->getParent());
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changed because after splitting the BB in addUnreachableAfterTryTables to add an unreachable, end_try_table is not in the EHPad anymore.

  • Before:
bb:
  ... try body ...

ehpad:
  end_try_table
  end_block
  ... catch handler ...
  • After:
bb:
  ... try body ...

end_try_table_bb:
  end_try_table
  unreachable

ehpad:
  end_block
  ... catch handler ...

Line 2198~2206 change in fixCatchUnwindMismatches is the same thing. splitEndLoopBB's argument name changed from UnwindDest to EndTryTableBB for the same reason.

Comment on lines +81 to +82
; This BB should remain (it will be renumbered to bb.1)
; CHECK: bb.1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renumbering happened because now we run recalculateScopeTops every time when EH is used. We used to run it only when unwind mismatches existed. Now we run it every time because adding unreachable after end_try_table requires splitting BBs and CFG changes.

aheejin added a commit to aheejin/emscripten that referenced this pull request Jan 22, 2025
This adds a new `WASM` mode in `ExceptionLibrary`, which uses the new
standard Wasm (exnref) EH, adds a library variant for it, and make the
tests use the new LLVM implementation instead of the Binaryen
translator.

The CI won't pass until llvm/llvm-project#123915
lands.
@dschuff
Copy link
Member

dschuff commented Jan 22, 2025

This problem was not discovered when I was testing the new LLVM implementation with Emscripten test suite last year because binaryen masked this problem. This is invalid but Binaryen accepts it:

block i32
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block

I think we should consider this a bug in Binaryen. When it parses Wasm input, it should reject invalid wasm unless there is some option enabled that relaxes the validation or accepts some input that isn't really wasm (e.g. if using the relooper).

@aheejin aheejin merged commit c3dfd34 into llvm:main Jan 23, 2025
8 checks passed
@aheejin aheejin deleted the end_try_table_fix2 branch January 23, 2025 06:39
aheejin added a commit to emscripten-core/emscripten that referenced this pull request Jan 29, 2025
This adds a new `WASM` mode in `ExceptionLibrary`, which uses the new
standard Wasm (exnref) EH, adds a library variant for it, and make the
tests use the new LLVM implementation instead of the Binaryen
translator.

The CI won't pass until llvm/llvm-project#123915
lands.

This requires adding `-wasmexcept` and `-wasmsjlj` variants to
`MINIMAL_TASKS` in `embuilder.py`
because they are necessary to run coreX tests in CircleCI, because
dylink coreX tests in CircleCI rellies on `./embuilder build MINIMAL_PIC
--pic`, which is the sum of `MINIMAL_TASKS` and `MINIMAL_PIC_TASKS`.

Because a new variant is added to `ExceptionLibrary`, this increases
`SYSTEM` library size from 366M to 400M, roughly by 11%. We discussed
about the possibility of not shipping the exnref libraries and let them
be built on demand, but given that `SYSTEM` include all variants, I'm
not sure how we are going to do that:
https://github.com/emscripten-core/emscripten/blob/73ebb91029948e62b3a4cea9ccc8db2dd87162b5/embuilder.py#L174-L177
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants