WebAssembly and Exception Handling (throw) #23442

Open
anutosh491 opened this issue Jan 17, 2025 · 51 comments

Comments

@anutosh491

Hey all,

After getting clang-repl running in the browser, I worked on integrating it with JupyterLite. Xeus-cpp, a C++ Jupyter kernel, provides a way to integrate it. Here is a static link that can be used to try C++ completely in the browser: https://compiler-research.org/xeus-cpp/lab/index.html. An example notebook, xeus-cpp-lite-demo.ipynb, has been provided to show what can be achieved.

Coming back to the issue: I see we can't run throw (or a try/catch block involving throw) while running clang-repl in the browser.

Image

The debug logs tell me that this comes from dlopen

Image

All this can be tried through the static link above.

@anutosh491
Author

anutosh491 commented Jan 17, 2025

Now, for running clang-repl in the browser, this is the workflow taken:

code -> PTU -> LLVM IR -> wasm object -> wasm binary -> loaded on top of main module using dlopen

  1. So as it fails in the dlopen step, we know for sure that the LLVM IR is being produced and also a wasm binary is being produced (hopefully correctly)

Pasting them down below just for reference

i) LLVM IR (only relevant part)

@_ZTIi = external constant ptr
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_2, ptr null }]
define internal void @__stmts__0() #0 {
  %1 = call ptr @__cxa_allocate_exception(i32 4) #2
  store i32 1, ptr %1, align 16
  call void @__cxa_throw(ptr %1, ptr @_ZTIi, ptr null) #3
  unreachable
}
declare ptr @__cxa_allocate_exception(i32) #0
declare void @__cxa_throw(ptr, ptr, ptr) #0
; Function Attrs: noinline
define internal void @_GLOBAL__sub_I_incr_module_2() #1 {
  call void @__stmts__0()
  ret void
}

ii) wasm module

(module $incr_module_2.wasm
  (memory $env.memory (;0;) (import "env" "memory") 0)
  (table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
  (global $__memory_base (;0;) (import "env" "__memory_base") i32)
  (global $__table_base (;1;) (import "env" "__table_base") i32)
  (func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
  (func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
  (global $typeinfo for int (;2;) (import "GOT.mem" "_ZTIi") (mut i32))
  (func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
    call $_GLOBAL__sub_I_incr_module_2
  )
  (func $__wasm_apply_data_relocs (;3;) (export "__wasm_apply_data_relocs")
  )
  (func $__stmts__0 (;4;)
    (local $var0 i32)
    i32.const 4
    call $__cxa_allocate_exception
    local.tee $var0
    i32.const 1
    i32.store
    local.get $var0
    global.get $typeinfo for int
    i32.const 0
    call $__cxa_throw
    unreachable
  )
  (func $_GLOBAL__sub_I_incr_module_2 (;5;)
    call $__stmts__0
  )
)

I think this looks correct to me!

@anutosh491
Author

Now coming back to the dlopen step. The debugger in Chrome DevTools tells me that this is the last place it ends up:

var init = moduleExports['__wasm_call_ctors'];
if (init) {
  if (runtimeInitialized) {
    init();

Which means it is trying to execute this block, I'd guess:

  (func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
    call $_GLOBAL__sub_I_incr_module_2
  )
  (func $__stmts__0 (;4;)
    (local $var0 i32)
    i32.const 4
    call $__cxa_allocate_exception
    local.tee $var0
    i32.const 1
    i32.store
    local.get $var0
    global.get $typeinfo for int
    i32.const 0
    call $__cxa_throw
    unreachable
  )
  (func $_GLOBAL__sub_I_incr_module_2 (;5;)
    call $__stmts__0
  )

But it isn't able to. Now __wasm_call_ctors calls _GLOBAL__sub_I_incr_module_2, which simply calls __stmts__0. So I am guessing it's just not able to run __stmts__0, but I think even that is being framed correctly?

@anutosh491
Author

anutosh491 commented Jan 17, 2025

cc @sbc100 @kripken

Here's what I thought might be going wrong.

  1. Just as a sanity check, I thought I should confirm the presence of symbols in the final xcpp.wasm being built (the wasm binary out of xeus-cpp that acts as the main module)
(xeus-lite-host) anutosh491@Anutoshs-MacBook-Air build % wasm-objdump -x xcpp.wasm | grep __cxa
 - func[1] sig=10 <__cxa_find_matching_catch_2> <- env.__cxa_find_matching_catch_2
.....
 - func[9678] <__cxa_allocate_exception>
.....
 - global[3] <__cxa_throw>

(xeus-lite-host) anutosh491@Anutoshs-MacBook-Air build % wasm-objdump -x xcpp.wasm | grep _ZTIi
 - global[1701] i32 mutable=0 <_ZTIi> - init i32=409276
 - global[1701] -> "_ZTIi"

I think we have everything

  2. I thought this might be a -fwasm-exceptions or -fexceptions issue. I realized we build xeus-cpp with -fexceptions but LLVM isn't built with it (we obviously need to build LLVM for wasm to get libclangInterpreter.a, which facilitates using clang-repl in the web). So I tried that too, but it didn't help; I still get the same result.

If y'all are interested in the configuration, this is what I used.

emcmake cmake -DCMAKE_BUILD_TYPE=MinSizeRel         \
    -DBUILD_SHARED_LIBS=OFF                         \
    -DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten    \
    -DLLVM_TARGETS_TO_BUILD="WebAssembly"           \
    -DLLVM_INCLUDE_BENCHMARKS=OFF                   \
    -DLLVM_INCLUDE_EXAMPLES=OFF                     \
    -DLLVM_INCLUDE_TESTS=OFF                        \
    -DLLVM_ENABLE_LIBEDIT=OFF                       \
    -DLLVM_ENABLE_PROJECTS="clang;lld"              \
    -DLLVM_ENABLE_THREADS=OFF                       \
    -DCLANG_ENABLE_STATIC_ANALYZER=OFF              \
    -DCLANG_ENABLE_ARCMT=OFF                        \
    -DCLANG_ENABLE_BOOTSTRAP=OFF                    \
    -DLLVM_ENABLE_ZSTD=OFF                          \
    -DLLVM_ENABLE_LIBXML2=OFF                       \
    -DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -fexceptions" \
    ../llvm

Apart from adding the -fexceptions flag here, everything is what we already use for getting the static link to work!

@sbc100
Collaborator

sbc100 commented Jan 17, 2025

Is there still an issue here?

You are correct that you need to make sure that -fwasm-exceptions is either used everywhere, or nowhere. You cannot mix code compiled with and without that flag. Did making that consistent fix your issue?

From your original backtrace it looks like the code is trying to load a DLL called "const char*", which is very odd. Can you stop in the debugger and see why that might be? Is the name of the file being loaded really "const char*"?

(BTW, when you file bugs like this it would be very helpful if you could copy and paste the text rather than attaching screenshots. Using text makes it much easier for us to search / copy / etc. within the issue.)

@anutosh491
Author

anutosh491 commented Jan 20, 2025

Is there still an issue here?

Yes it is.

Did making that consistent fix your issue?

So I think we tried building the whole stack with -fexceptions (JupyterLite, xeus-cpp, LLVM, etc.) and we haven't moved to -fwasm-exceptions yet, as we thought using one of these for the whole toolchain would be enough!

Can you stop in the debugger and see why that might be? Is the name of file being loaded really "const char*"?

Is it? So when we use clang-repl in the browser, every code block produces a file named incr_module_xx.wasm, where xx is the code block number. So no, I don't think that's the file name here!

I think it might just be the exception ptr type or something (not sure). The issue below looks relevant here.
#6330

EDIT: Also, just questioning my breakdown here: the wasm module generated looks correct to me, and I think it is the init() call I referred to above that doesn't work! Maybe someone could confirm that for me?

@sbc100
Collaborator

sbc100 commented Jan 20, 2025

I think the fact that _dlopen_js is being called with the string "const char*" rather than the name of a DLL is really the clue. That looks really wrong.

Can you break at that call site and see the string ptr value being passed to _dlopen_js? Presumably the user code passed a completely different string. Can you print the ptr value on the C++ side too? It looks like dlopen is being called from a side module with function names like $func917. I imagine somehow the DLL is confused about where its static data lives? Perhaps __memory_base was not correctly set when the DLL was loaded?

Can you try building your side modules with --profiling-funcs so you get useful function names instead of $func917?

@anutosh491
Author

Hey @sbc100, sorry it took some time to get back.

But this is the whole log when we try executing throw 1;:

Image

It points to the addModule function, as expected, where dlopen is being called:

https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp#L65

@anutosh491
Author

So this is what I see when I build xeus-cpp with Assertions=0 vs Assertions=1:

  1. Assertions=0 (looks like some exception ptr)
    Image

  2. Assertions=1 (type being returned)

Image

@anutosh491
Author

anutosh491 commented Feb 5, 2025

Also, I see absolutely no difference in how dlopen works for any cell that works vs the cell executing throw 1;. It's the same. The file name is also tmp/incr_module_2.wasm, which should be the case. Case 1 is some default cell; case 2 is with throw.

Image

Image

At this point there is so much we can already do (check the example notebook https://github.com/compiler-research/xeus-cpp/blob/main/notebooks/xeus-cpp-lite-demo.ipynb) that not being able to use throw 1; seems very weird.

@anutosh491
Author

I stepped through everything up to the final failure, which comes up here:

var init = moduleExports['__wasm_call_ctors'];
if (init) {
  if (runtimeInitialized) {
    init();

As soon as the debugger hits init() I get the error message.

Nothing really seems fishy till the end. I see memoryBase is 0 here (not sure if that's a problem; it looks OK to me).

Image

@sbc100
Collaborator

sbc100 commented Feb 5, 2025

What is memorySize? memoryBase should only be zero if memorySize is also zero.

The memoryBase and tableBase are where the data segment and table segment for your DLL are stored. They will only be zero when your module has no memory segment or no table segment of its own.

You can see how much data and table space your module needs by looking at the dylink section of the module/DLL. It's always the first section in any wasm DLL:

$ ./emcc -sSIDE_MODULE test/hello_world.c
$ wasm-objdump -x a.out.wasm 

a.out.wasm:	file format wasm 0x1

Section Details:

Custom:
 - name: "dylink.0"
 - mem_size     : 15
 - mem_p2align  : 0
 - table_size   : 0
 - table_p2align: 0
...

Here you can see that the hello world program, when compiled to a DLL, requires 15 bytes of memory and zero table slots.

@anutosh491
Author

anutosh491 commented Feb 5, 2025

This is what metadata contains after getDylinkMetadata(binary):

var loadWebAssemblyModule = (binary, flags, libName, localScope, handle) => {
var metadata = getDylinkMetadata(binary);
memoryAlign: 0
memorySize: 0
neededDynlibs: []
tableAlign: 0
tableSize: 0
tlsExports: Set(0) {size: 0}
weakImports: Set(0) {size: 0}
[[Prototype]]: Object

The binary shows Int8Array(809), if that's relevant. So yeah, I guess it is the getDylinkMetadata call that doesn't go as expected?

Also, I see you've mentioned using wasm-objdump, but I am not sure how to put it to use at runtime, as every cell block gives me a side module (code -> PTU -> llvm IR -> incr_module_xx.so file -> incr_module_xx.wasm file -> loaded on top of the main module using dlopen).

So the most I can do is go through the incr_module_xx.wasm file, which turns out to be something like this:

(module $incr_module_2.wasm
  (memory $env.memory (;0;) (import "env" "memory") 0)
  (table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
  (global $__stack_pointer (;0;) (import "env" "__stack_pointer") (mut i32))
  (global $__memory_base (;1;) (import "env" "__memory_base") i32)
  (global $__table_base (;2;) (import "env" "__table_base") i32)
  (func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
  (func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
  (global $typeinfo for int (;3;) (import "GOT.mem" "_ZTIi") (mut i32))
  (global $__dso_handle (;4;) (export "__dso_handle") i32 (i32.const 0))
  (func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
    call $_GLOBAL__sub_I_incr_module_2
  )
  (func $__stmts__0 (;3;)
    (local $var0 i32)
    i32.const 4
    call $__cxa_allocate_exception
    local.tee $var0
    i32.const 1
    i32.store
    local.get $var0
    global.get $typeinfo for int
    i32.const 0
    call $__cxa_throw
    unreachable
  )
  (func $_GLOBAL__sub_I_incr_module_2 (;4;)
    call $__stmts__0
  )
)

P.S.: Also, just so you can confirm for yourself that the error is coming out of dlopen itself (and to maybe play around and debug any questions you might have), you can try running throw 1; through our static link and add breakpoints in the source files to see what's happening: https://compiler-research.org/xeus-cpp/lab/index.html

@sbc100
Collaborator

sbc100 commented Feb 5, 2025

So the incr_module_xx.wasm module looks like it actually doesn't have any data or table slots (I don't see any data segments or elem segments).

BTW, you can run wasm-objdump on your incr_module_xx.wasm file to see the dylink section if you like. It's not showing up in the wat disassembly that you attached above.

Can you set a breakpoint on the "Error loading dynamic library" line and inspect the exception (e) that is being thrown? What does the stack trace for that exception look like?

@anutosh491
Author

anutosh491 commented Feb 6, 2025

So incr_module_xx.wasm module looks like it actually doesn't have any data or table slots (I don't see any data segments or elem segments).

This is what I see for int x = 10;

Image

This seems to have a data segment at the end, so I don't think this is the case for every cell. It's just throw which might be at fault.

@anutosh491
Author

anutosh491 commented Feb 6, 2025

Okay, so I set the debugger below and printed e:

try {
    return loadDynamicLibrary(filename, combinedFlags, localScope, handle)
} catch (e) {
    err(`Error in loading dynamic library $(unknown): ${e}`);  // HERE
    dlSetError(`Could not load dynamic lib: $(unknown)\n${e}`);
    return 0
}

I see this:

Image

e
int
    at ___cxa_throw (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1250810)
    at incr_module_3.wasm.__stmts__0 (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[3]:0x139)
    at incr_module_3.wasm._GLOBAL__sub_I_incr_module_3 (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[4]:0x143)
    at incr_module_3.wasm.__wasm_call_ctors (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[2]:0x119)
    at postInstantiation (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1241584)
    at loadModule (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1242062)
    at loadWebAssemblyModule (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1242363)
    at getExports (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1245234)
    at loadDynamicLibrary (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1245560)
    at dlopenInternal (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1337847)

I think I have pointed out before that eventually we end up at init() / __wasm_call_ctors(), and eventually it gets to call $__cxa_throw in the wasm file (pasted above in #23442 (comment)).

Well, what does that mean? I guess the logic is correct, as we would have liked it to be? But then throw is called, and hence the dlopen step errors out, I am guessing! Confused as to what needs to be done here. Does this mean we can't load a DLL/module having a call to __cxa_throw using dlopen?

For some context, xeus-cpp, unlike xeus-cpp-lite, uses clang-repl locally and not in the browser. So if you check out the example notebook of what xeus-cpp can do locally, clang-repl can execute throw:

Image

@sbc100
Collaborator

sbc100 commented Feb 6, 2025

So one of your static constructors is throwing an exception.

Are you actually trying to execute a C++ throw in your notebook? If so, wouldn't you expect the DLL to fail to load? Or is the problem that you want to somehow catch that exception yourself instead of having dlopen fail?

@anutosh491
Author

Hey @sbc100

Sorry for missing this, but yeah, the point is I should be able to exactly replicate what clang-repl does locally (or what xeus-cpp is doing here based on clang-repl).

Image

So my point is that I wouldn't expect an error out of dlopen or the module failing to load. Rather, the module should be loaded on top of the main module, and that should then give back any error message or whatever we print through the console.

Does this mean the wasm being generated is wrong? Because the wasm binary calls __cxa_throw directly, I suppose. Don't you think in this case we should be able to go exactly how clang-repl handles this? I am a bit confused on how to proceed!

@sbc100
Collaborator

sbc100 commented Feb 13, 2025

Are you saying that in this case the catch is not actually catching the throw 1? i.e. you are seeing the thrown value escape and not seeing the Error print message?

@anutosh491
Author

Okay @sbc100

I think this might be a new/separate problem at hand, but let's look into this.

Case 1: We have throw 1; and this fails through dlopen failing to load, as init calls __cxa_throw. And maybe there is nothing wrong here. Obviously I wouldn't like the kernel crashing for xeus-cpp-lite, but fair enough: an exception was not caught, hence the jupyterlite instance is now corrupted.

Case 2: But for a try/catch block

try {
    throw 1;
} catch (...) {
    0;
}

I still see the same thing happening:

Image

i) Now, even in this case, we obviously first parse and come up with LLVM IR. Looking at the LLVM IR generated, I do see __cxa_begin_catch being referenced, but I am not sure if it is put to use:

; ModuleID = 'incr_module_3'
source_filename = "incr_module_3"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-i128:128-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"
@_ZTIi = external constant ptr
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_3, ptr null }]
define internal void @__stmts__0() #0 personality ptr @__gxx_personality_v0 {
entry:
  %exn.slot = alloca ptr, align 4
  %ehselector.slot = alloca i32, align 4
  %exception = call ptr @__cxa_allocate_exception(i32 4) #2
  store i32 1, ptr %exception, align 16
  call void @__cxa_throw(ptr %exception, ptr @_ZTIi, ptr null) #3
  br label %unreachable
unreachable:                                      ; preds = %entry
  unreachable
}
declare ptr @__cxa_allocate_exception(i32) #0
declare void @__cxa_throw(ptr, ptr, ptr) #0
declare i32 @__gxx_personality_v0(...) #0
declare ptr @__cxa_begin_catch(ptr) #0
declare void @__cxa_end_catch() #0
; Function Attrs: noinline
define internal void @_GLOBAL__sub_I_incr_module_3() #1 {
entry:
  call void @__stmts__0()
  ret void
}
attributes #0 = { "target-features"="-atomics,+bulk-memory,+bulk-memory-opt,+call-indirect-overlong,-exception-handling,-extended-const,-fp16,-multimemory,+multivalue,+mutable-globals,+nontrapping-fptoint,+reference-types,-relaxed-simd,+sign-ext,-simd128,-tail-call,-wide-arithmetic," }
attributes #1 = { noinline "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-atomics,+bulk-memory,+bulk-memory-opt,+call-indirect-overlong,-exception-handling,-extended-const,-fp16,-multimemory,+multivalue,+mutable-globals,+nontrapping-fptoint,+reference-types,-relaxed-simd,+sign-ext,-simd128,-tail-call,-wide-arithmetic," }
attributes #2 = { nounwind }
attributes #3 = { noreturn }
!llvm.linker.options = !{}
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7, !8}
!llvm.ident = !{!9}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"wasm-feature-bulk-memory", i32 43}
!2 = !{i32 1, !"wasm-feature-bulk-memory-opt", i32 43}
!3 = !{i32 1, !"wasm-feature-call-indirect-overlong", i32 43}
!4 = !{i32 1, !"wasm-feature-multivalue", i32 43}
!5 = !{i32 1, !"wasm-feature-mutable-globals", i32 43}
!6 = !{i32 1, !"wasm-feature-nontrapping-fptoint", i32 43}
!7 = !{i32 1, !"wasm-feature-reference-types", i32 43}
!8 = !{i32 1, !"wasm-feature-sign-ext", i32 43}

ii) After this step we end up generating the wasm module, which is obviously wrong:

(module $incr_module_3.wasm
  (memory $env.memory (;0;) (import "env" "memory") 0)
  (table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
  (global $__stack_pointer (;0;) (import "env" "__stack_pointer") (mut i32))
  (global $__memory_base (;1;) (import "env" "__memory_base") i32)
  (global $__table_base (;2;) (import "env" "__table_base") i32)
  (func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
  (func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
  (global $typeinfo for int (;3;) (import "GOT.mem" "_ZTIi") (mut i32))
  (func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
    call $_GLOBAL__sub_I_incr_module_3
  )
  (func $__wasm_apply_data_relocs (;3;) (export "__wasm_apply_data_relocs")
  )
  (func $__stmts__0 (;4;)
    (local $var0 i32)
    global.get $__stack_pointer
    i32.const 16
    i32.sub
    global.set $__stack_pointer
    i32.const 4
    call $__cxa_allocate_exception
    local.tee $var0
    i32.const 1
    i32.store
    local.get $var0
    global.get $typeinfo for int
    i32.const 0
    call $__cxa_throw
    unreachable
  )
  (func $_GLOBAL__sub_I_incr_module_3 (;5;)
    call $__stmts__0
  )
)

@anutosh491
Author

My understanding of a catch block is that we should definitely end up seeing a landingpad.

Now my point is: how is the LLVM IR generated through clang and through clang-repl turning out to be different?

For example, if I put this in test.cpp and run the following:

int main() {
    try {
        throw 1;
    } catch (...) {
        0;
    }
}
  1. With emcc test.cpp -std=c++20 -fexceptions -emit-llvm -S -o test.ll I see this:
; ModuleID = 'test.cpp'
source_filename = "test.cpp"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"

@_ZTIi = external constant ptr

@__main_void = hidden alias i32 (), ptr @main

; Function Attrs: mustprogress noinline norecurse optnone
define hidden noundef i32 @main() #0 personality ptr @__gxx_personality_v0 {
  %1 = alloca ptr, align 4
  %2 = alloca i32, align 4
  %3 = call ptr @__cxa_allocate_exception(i32 4) #1
  store i32 1, ptr %3, align 16
  invoke void @__cxa_throw(ptr %3, ptr @_ZTIi, ptr null) #2
          to label %12 unwind label %4

4:                                                ; preds = %0
  %5 = landingpad { ptr, i32 }
          catch ptr null
  %6 = extractvalue { ptr, i32 } %5, 0
  store ptr %6, ptr %1, align 4
  %7 = extractvalue { ptr, i32 } %5, 1
  store i32 %7, ptr %2, align 4
  br label %8

8:                                                ; preds = %4
  %9 = load ptr, ptr %1, align 4
  %10 = call ptr @__cxa_begin_catch(ptr %9) #1
  call void @__cxa_end_catch()
  br label %11

11:                                               ; preds = %8
  ret i32 0

12:                                               ; preds = %0
  unreachable
}

declare ptr @__cxa_allocate_exception(i32)

declare void @__cxa_throw(ptr, ptr, ptr)

declare i32 @__gxx_personality_v0(...)

declare ptr @__cxa_begin_catch(ptr)

declare void @__cxa_end_catch()

attributes #0 = { mustprogress noinline norecurse optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+multivalue,+mutable-globals,+reference-types,+sign-ext,-bulk-memory,-nontrapping-fptoint" }
attributes #1 = { nounwind }
attributes #2 = { noreturn }

!llvm.linker.options = !{}
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 20.0.0git (https:/github.com/llvm/llvm-project 1d810ece2b2c8fab77720493864257f0ea3336a9)"}
  2. Following this, we can generate a SIDE_MODULE as required.

emcc test.cpp -std=c++20 -sSIDE_MODULE=1 -fexceptions -sDISABLE_EXCEPTION_CATCHING=0 -o test.wasm

I see most of the important stuff being put to use:

  (import "env" "__cxa_allocate_exception" (func (;0;) (type 0)))
  (import "env" "invoke_viii" (func (;1;) (type 3)))
  (import "env" "__cxa_find_matching_catch_3" (func (;2;) (type 0)))
  (import "env" "getTempRet0" (func (;3;) (type 1)))
  (import "env" "__cxa_begin_catch" (func (;4;) (type 0)))
  (import "env" "__cxa_end_catch" (func (;5;) (type 2)))
  (import "env" "__stack_pointer" (global (;0;) (mut i32)))
  (import "env" "__memory_base" (global (;1;) i32))
  (import "env" "__table_base" (global (;2;) i32))
  (import "GOT.mem" "__THREW__" (global (;3;) (mut i32)))
  (import "GOT.mem" "_ZTIi" (global (;4;) (mut i32)))
  (import "GOT.func" "__cxa_throw" (global (;5;) (mut i32)))
  (import "env" "memory" (memory (;0;) 0))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))

@anutosh491
Author

anutosh491 commented Feb 15, 2025

This is weird, as whether running clang-repl in the browser or locally, the PTU generation step is the same; it's only the execution step that differs:

https://github.com/llvm/llvm-project/blob/66465c3b0ab1b32403ad5a1c3114174d87830f54/clang/lib/Interpreter/Interpreter.cpp#L646-L650

So technically we shouldn't be seeing wrong LLVM IR leading to a wrong wasm module!

EDIT: I have a question.

This is weird as running clang-repl in the browser or locally, the PTU generation step is the same, it's only the execution step that differs

Is this possibly dependent on how we build LLVM (maybe with some sort of exceptions enabled or disabled)? This is what I use to build LLVM for wasm currently:

mkdir build
cd build

export CMAKE_PREFIX_PATH=$PREFIX
export CMAKE_SYSTEM_PREFIX_PATH=$PREFIX

# clear LDFLAGS flags because they contain sWASM_BIGINT
export LDFLAGS=""

# Configure step
emcmake cmake ${CMAKE_ARGS} -S ../llvm -B .         \
    -DCMAKE_BUILD_TYPE=MinSizeRel                   \
    -DCMAKE_PREFIX_PATH=$PREFIX                     \
    -DCMAKE_INSTALL_PREFIX=$PREFIX                  \
    -DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten    \
    -DLLVM_TARGETS_TO_BUILD="WebAssembly"           \
    -DLLVM_ENABLE_ASSERTIONS=ON                     \
    -DLLVM_INCLUDE_BENCHMARKS=OFF                   \
    -DLLVM_INCLUDE_EXAMPLES=OFF                     \
    -DLLVM_INCLUDE_TESTS=OFF                        \
    -DLLVM_ENABLE_LIBEDIT=OFF                       \
    -DLLVM_ENABLE_PROJECTS="clang;lld"              \
    -DLLVM_ENABLE_THREADS=OFF                       \
    -DLLVM_ENABLE_ZSTD=OFF                          \
    -DLLVM_ENABLE_LIBXML2=OFF                       \
    -DCLANG_ENABLE_STATIC_ANALYZER=OFF              \
    -DCLANG_ENABLE_ARCMT=OFF                        \
    -DCLANG_ENABLE_BOOTSTRAP=OFF                    \
    -DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4"

# Build step
emmake make -j4

# Install step
emmake make install

# Copy all files with ".wasm" extension to $PREFIX/bin
cp $SRC_DIR/build/bin/*.wasm $PREFIX/bin

All of this is present as a part of the recipe for llvm on emscripten-forge (https://github.com/emscripten-forge/recipes/blob/main/recipes/recipes_emscripten/llvm/build.sh)

@anutosh491
Author

anutosh491 commented Feb 19, 2025

Hey @sbc100

Not sure you saw my ping above, hence tagging you to maybe help me out with the last 2-3 messages continuing our discussion after #23442 (comment)

@sbc100
Collaborator

sbc100 commented Feb 19, 2025

What are the build flags you are using when building the side module in clang-repl? They must be somehow different from those used in Emscripten. I'm guessing you are missing -fexceptions or -fwasm-exceptions perhaps?

You can add -v to the emcc command to see all the flags that get passed to clang and wasm-ld, in case that helps.

@anutosh491
Author

anutosh491 commented Feb 19, 2025

What are the build flags you are using when building the side module in clang-repl?

So these are the flags used for the side module (each cell gives us one that is loaded on top of the main module)

https://github.com/llvm/llvm-project/blob/85601fd78f4cbf0ce5df74c5926183035f859572/clang/lib/Interpreter/Wasm.cpp#L74-L84

They must be somehow different from those used in emscripten. I'm guessing you are missing -fexceptions or -fwasm-exceptions perhaps?

Wait, so for the latest build I took care of this (basically just updated the CXX_FLAGS to take care of -fexceptions too, so I added it to the emcmake cmake ... command here).

And I still see this

Image

So the current CXX_FLAGS being put to use are these:

CXX_FLAGS = -Dwait4=__syscall_wait4 -fexceptions -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -Os -DNDEBUG -std=c++17 -UNDEBUG

So yeah, we know the roadmap here (code -> PTU -> llvm IR -> incr_module_xx.so file -> incr_module_xx.wasm file -> loaded on top of the main module using dlopen).

My first concern is that clang-repl and clang technically promise to make use of the same LLVM IR. I don't know why we don't see the correct LLVM IR (even before getting to the shared object in clang-repl) for this case.

@sbc100
Collaborator

sbc100 commented Feb 19, 2025

What are the build flags you are using when building the side module in clang-repl?

So these are the flags used for the side module (each cell gives us one that is loaded on top of the main module)

https://github.com/llvm/llvm-project/blob/85601fd78f4cbf0ce5df74c5926183035f859572/clang/lib/Interpreter/Wasm.cpp#L74-L84

Those are the link flags. What are the compile-time flags used to build the object file being linked?

@anutosh491
Author

anutosh491 commented Feb 20, 2025

Hey @sbc100, yes, I think stuff boils down to that.

What are the compile-time flags used to build the object file being linked?

But I'm not sure how to get hold of them :\

But everything happens in the addModule code

What I think happens is:

  1. Once we have the LLVM IR, we use this framework to create the shared object (which is later turned into a wasm binary using wasm-ld):
  const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
      PTU.TheModule->getTargetTriple(), ErrorString);
  if (!Target) {
    return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
                                               llvm::inconvertibleErrorCode());
  }

  llvm::TargetOptions TO = llvm::TargetOptions();
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
  PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());
  std::string ObjectFileName = PTU.TheModule->getName().str() + ".o";
  std::string BinaryFileName = PTU.TheModule->getName().str() + ".wasm";

  std::error_code Error;
  llvm::raw_fd_ostream ObjectFileOutput(llvm::StringRef(ObjectFileName), Error);

  llvm::legacy::PassManager PM;
  if (TargetMachine->addPassesToEmitFile(PM, ObjectFileOutput, nullptr,
                                         llvm::CodeGenFileType::ObjectFile)) {
    return llvm::make_error<llvm::StringError>(
        "Wasm backend cannot produce object.", llvm::inconvertibleErrorCode());
  }

  if (!PM.run(*PTU.TheModule)) {

    return llvm::make_error<llvm::StringError>("Failed to emit Wasm object.",
                                               llvm::inconvertibleErrorCode());
  }
  2. My understanding here is that

i) We create a Target (extracted from the target triple from our module)
ii) We create a TargetMachine which I guess uses llc on the llvm IR we have.
iii) Now by default we don't really set any TargetOptions, but we could configure this to use wasm exceptions, inspired by https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/MCTargetOptions.h#L25

  llvm::TargetOptions TO = llvm::TargetOptions();
  TO.ExceptionModel = llvm::ExceptionHandling::Wasm;

iv) But that being said, I know of an error like this

LLVM ERROR: -exception-model=wasm only allowed with at least one of -wasm-enable-eh or -wasm-enable-sjlj

v) That being said, although TO.ExceptionModel = llvm::ExceptionHandling::Wasm; would take care of -exception-model=wasm, I am not sure how we also pass -wasm-enable-eh and/or -mattr=+exception-handling, so that we end up with something like this:

; RUN: llc < %s ....... -wasm-enable-eh  -exception-model=wasm -mattr=+exception-handling,bulk-memory

I am not sure how but I see the 3rd parameter of createTargetMachine allows us to pass some features
https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/TargetRegistry.h#L456

So we currently pass nothing here. So maybe we can update the code in AddModule to have this

  llvm::TargetOptions TO = llvm::TargetOptions();
  TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "+wasm-enable-eh,+mattr=+exception-handling", TO, llvm::Reloc::Model::PIC_,
      std::nullopt, llvm::CodeGenOptLevel::None, false);

I'm not sure this is the right way to make use of these flags. I just know we can pass -mllvm -wasm-enable-eh and -mexception-handling to emcc. Let me know if the above is the correct way to put them to use.

Does it make sense to see it this way ?

I am guessing only the TargetMachine and the TargetOptions can play a role here, because after that it's just the call to addPassesToEmitFile, which I guess takes care of all the wasm-related passes based on the optimization level we use.

@anutosh491
Copy link
Author

Apart from this, I am not sure whether how we build llvm (or the CXX_FLAGS we pass there) makes a difference.

I have been using the following to build libclangInterpreter.a, which is the only thing I need to get clang-repl running in the browser.

# Configure step
emcmake cmake -S ../llvm -B .                       \
    -DCMAKE_BUILD_TYPE=MinSizeRel                   \
    -DCMAKE_PREFIX_PATH=$PREFIX                     \
    -DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten    \
    -DLLVM_TARGETS_TO_BUILD="WebAssembly"           \
    -DLLVM_ENABLE_ASSERTIONS=ON                     \
    -DLLVM_INCLUDE_BENCHMARKS=OFF                   \
    -DLLVM_INCLUDE_EXAMPLES=OFF                     \
    -DLLVM_INCLUDE_TESTS=OFF                        \
    -DLLVM_ENABLE_LIBEDIT=OFF                       \
    -DLLVM_ENABLE_PROJECTS="clang;lld"              \
    -DLLVM_ENABLE_THREADS=OFF                       \
    -DLLVM_ENABLE_ZSTD=OFF                          \
    -DLLVM_ENABLE_LIBXML2=OFF                       \
    -DCLANG_ENABLE_STATIC_ANALYZER=OFF              \
    -DCLANG_ENABLE_ARCMT=OFF                        \
    -DCLANG_ENABLE_BOOTSTRAP=OFF                    \
    -DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -fexceptions -mexception-handling"

P.S: not sure passing -mexception-handling here makes sense.

Let me know if you think some changes need to be introduced here. Apart from this yeah, I need some help to look into the llvm IR going to the shared object step.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 20, 2025

The issue here (IIUC) is not how you build llvm, but how llvm is building the side module.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 20, 2025

#23442 (comment)

I am not sure how but I see the 3rd parameter of createTargetMachine allows us to pass some features https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/TargetRegistry.h#L456

So we currently pass nothing here. So maybe we can update the code in AddModule to have this

  llvm::TargetOptions TO = llvm::TargetOptions();
  TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "+wasm-enable-eh,+mattr=+exception-handling", TO, llvm::Reloc::Model::PIC_,
      std::nullopt, llvm::CodeGenOptLevel::None, false);

Not sure it is this way you make use of these flags. I just know we can use -mllvm -wasm-enable-eh and -mexception-handling with emcc. Let me know if the above way is the correct way to put these to use.

Does it make sense to see it this way ?

I'm not sure how you are supposed to inject flags into the AddModule code, but it sounds like you are on the right track, yes. You need the object file (module) to be built with exception handling support if you want to be able to catch exceptions.

@anutosh491
Copy link
Author

anutosh491 commented Feb 21, 2025

I think @aheejin can help us out here because I see some of their commits relevant to the WasmEnableEH work in llvm.

Hey @aheejin we would appreciate some help here. The following is what we are trying to do.

  1. We are first trying to come up with a Target and a TargetMachine based on the target triple which is wasm32-unknown-emscripten . So basically we want a WebAssembly-specific subclass of TargetMachine.
  const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
      PTU.TheModule->getTargetTriple(), ErrorString);
  if (!Target) {
    return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
                                               llvm::inconvertibleErrorCode());
  }
  2. We then want to enable exception handling support to catch exceptions. Hence we set up the TargetOptions and set the ExceptionModel to llvm::ExceptionHandling::Wasm:
   llvm::TargetOptions TO = llvm::TargetOptions();
  TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
  PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());
  3. Now our concern is this: https://github.com/llvm/llvm-project/blob/cc675c635bf0016111050531e75f8082d0ea120b/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L445-L449

If you look, we have this check which we want to get past:

  if ((!WasmEnableEH && !WasmEnableSjLj) &&
      TM->Options.ExceptionModel == ExceptionHandling::Wasm)
    report_fatal_error(
        "-exception-model=wasm only allowed with at least one of "
        "-wasm-enable-eh or -wasm-enable-sjlj");

Hence in this case I want to use WasmEnableEH, which I think is directly tied to -wasm-enable-eh.

But the point is, I don't see how we pass this flag or make use of it through our TargetMachine.
This is the code we're interested in

  const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
      PTU.TheModule->getTargetTriple(), ErrorString);
  if (!Target) {
    return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
                                               llvm::inconvertibleErrorCode());
  }

  llvm::TargetOptions TO = llvm::TargetOptions();
  TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
  PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());

And technically we aren't sure how/where we inject the wasm-enable-eh flag through our TargetMachine. Could you let us know how the above can be updated to take care of this?

Possibly we would also like to pass -mattr=+exception-handling I suppose so that we end up with -wasm-enable-eh -exception-model=wasm -mattr=+exception-handling

@aheejin
Copy link
Member

aheejin commented Feb 22, 2025

I'm not sure which EH method you are trying to use. Your first code was apparently using Emscripten EH (which uses invoke_*** functions), so why are you trying to switch to Wasm EH? -exception-model=wasm and -wasm-enable-eh are only needed for Wasm EH.

And regardless of the EH method you are trying to use, I think you are looking at the wrong part of the code. I'm not familiar with how clang-repl generates code, but the code you pasted looks largely from this function and callees from this function: https://github.com/llvm/llvm-project/blob/5f8da7e7738f043dbde447e48622e9b2afb5ba92/clang/lib/CodeGen/BackendUtil.cpp#L1250-L1274
This is where Clang hands the control over to the LLVM middle-end and the backend.

But this backend pipeline invocation happens after the initial code, namely invokes and landingpads, is generated. And in your case it looks like invokes and landingpads are not generated in Clang in the first place. I'm not sure how to pass arguments to the clang-repl frontend, but I think you should figure that out first.

@anutosh491
Copy link
Author

anutosh491 commented Feb 24, 2025

Hey @aheejin

Sorry for not getting back earlier. Maybe let me give you some context and tell you where I stand right now so that you can help me out

  • So I maintain xeus-cpp that is a Jupyter Kernel for running C++ in the form of a REPL. Now when integrated with Jupyterlite you can do C++ completely in the browser in a REPL format ( after compiling llvm to webassembly)

Check this link out and maybe play around to see how it works https://compiler-research.org/xeus-cpp/lab/index.html

  • Now how this works is the following. Xeus-cpp ends up giving us the main module, which is xcpp.wasm, and every code block/cell gives us a side module loaded on top of this main module at runtime. So we have
    code -> PTU -> LLVM IR -> wasm object -> wasm binary -> loaded on top of main module using dlopen. Hence we can achieve quite some things already https://github.com/compiler-research/xeus-cpp/blob/main/notebooks/xeus-cpp-lite-demo.ipynb

  • That being said exception handling is something we're struggling with.

  • So you can see the cc1 command (if you inspect and look at the console). It looks something like this:

-cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -main-file-name "<<< inputs >>>" -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir /lib/clang/19 -internal-isystem /include/wasm32-emscripten/c++/v1 -internal-isystem /include/c++/v1 -internal-isystem /lib/clang/19/include -internal-isystem /include/wasm32-emscripten -internal-isystem /include -std=c++20 -fdeprecated-macro -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fincremental-extensions -o "<<< inputs >>>.o" -x c++ "<<< inputs >>>"
clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target wasm32-unknown-emscripten

As can be seen, we intend to use plain C++ exceptions (-fcxx-exceptions -fexceptions). These flags can be controlled through something like this (https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Interpreter.cpp#L201-L203).

  • Now the point is that if we, say, try executing
int main() {
    try {
        throw 1;
    } catch (...) {
        0;
    }
}

The following is the LLVM IR generated

Generated LLVM IR:
; ModuleID = 'incr_module_2'
source_filename = "incr_module_2"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"
@_ZTIi = external constant ptr
@__main_void = hidden alias i32 (), ptr @main
; Function Attrs: mustprogress noinline norecurse optnone
define noundef i32 @main() #0 personality ptr @__gxx_personality_v0 {
entry:
  %exn.slot = alloca ptr, align 4
  %ehselector.slot = alloca i32, align 4
  %exception = call ptr @__cxa_allocate_exception(i32 4) #1
  store i32 1, ptr %exception, align 16
  invoke void @__cxa_throw(ptr %exception, ptr @_ZTIi, ptr null) #2
          to label %unreachable unwind label %lpad
lpad:                                             ; preds = %entry
  %0 = landingpad { ptr, i32 }
          catch ptr null
  %1 = extractvalue { ptr, i32 } %0, 0
  store ptr %1, ptr %exn.slot, align 4
  %2 = extractvalue { ptr, i32 } %0, 1
  store i32 %2, ptr %ehselector.slot, align 4
  br label %catch
catch:                                            ; preds = %lpad
  %exn = load ptr, ptr %exn.slot, align 4
  %3 = call ptr @__cxa_begin_catch(ptr %exn) #1
  call void @__cxa_end_catch()
  br label %try.cont
try.cont:                                         ; preds = %catch
  ret i32 0
unreachable:                                      ; preds = %entry
  unreachable
}
declare ptr @__cxa_allocate_exception(i32)
declare void @__cxa_throw(ptr, ptr, ptr)
declare i32 @__gxx_personality_v0(...)
declare ptr @__cxa_begin_catch(ptr)
declare void @__cxa_end_catch()

So I hope what is expected to happen when we use plain C++ exceptions is happening here, and there is nothing wrong with the LLVM IR.

  • Now if you see the workflow above, we need to produce a **.so from the llvm IR that would further give us a **.wasm that is loaded on top of the main module. But you can technically see that the wasm being generated is not correct.

Image

Basically the "catch" part of the code is missing completely. So what @sbc100 and I think is that although we are able to generate correct LLVM IR, we are going wrong somewhere in the step that turns the LLVM IR into the shared object.

This is where it happens.

https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp#L68-L90

And I am not sure how we can update the TargetMachine setup here so that it doesn't drop the parts relevant to the "catch" portion of the code!

@anutosh491
Copy link
Author

anutosh491 commented Feb 24, 2025

So technically answering your questions

  1. I'm not sure which EH method you are trying to use. Your first code was apparently using Emscripten EH (which uses invoke_*** functions), so why are you trying to switch to Wasm EH?

Yes, if we have a choice, I was looking to take the simplest route possible. So I guess we can stick to plain C++ exceptions (which I believe is the default; we move to Wasm EH if we use -fwasm-exceptions).

  2. Why did I think of maybe going for Wasm EH?

Because I notice that if I pass -fwasm-exceptions in the clang-repl cc1 command and also add the following to the code above in Wasm.cpp (and rebuild llvm),

   llvm::TargetOptions TO = llvm::TargetOptions();
+ TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
  llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
      PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
  PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());

I notice that the correct wasm is generated and it references __cpp_exception.

So yeah, I was playing around with this. And in this case, although the correct wasm is generated, the dlopen step fails when we try to load it on top of the main module, because for some reason the main module doesn't have __cpp_exception.

Basically what I am trying to convey is that I would probably like to make use of Emscripten EH, but if I am struggling there, maybe Wasm EH might help me out?

That being said, if we stick to Emscripten EH, I guess we still need to make use of an ExceptionModel.
I see these options here
https://github.com/llvm/llvm-project/blob/e7ad07ffb846a9812d9567b8d4b680045dce5b28/llvm/include/llvm/MC/MCTargetOptions.h#L19-L29

I guess DwarfCFI or SjLj would make more sense than what I did above?

@anutosh491
Copy link
Author

The developments while trying Wasm EH are mentioned here; I had to open an issue for them.

Check #23731

@anutosh491
Copy link
Author

That being said, if we stick to Emscripten EH, I guess we still need to make use of an ExceptionModel.
I see these options here
https://github.com/llvm/llvm-project/blob/e7ad07ffb846a9812d9567b8d4b680045dce5b28/llvm/include/llvm/MC/MCTargetOptions.h#L19-L29

I guess DwarfCFI or SjLj would make more sense than what I did above?

Nah, I tried this. I thought we might just be able to set the exception model and get past this, but that's not the case.

Just adding something like TO.ExceptionModel = llvm::ExceptionHandling::SjLj; or TO.ExceptionModel = llvm::ExceptionHandling::Dwarfcfi doesn't do much.

So yeah still at the starting point. Need to figure out how to correctly convert the LLVM IR (which is absolutely perfect) to an appropriate shared object !

@anutosh491
Copy link
Author

anutosh491 commented Feb 25, 2025

Hey @sbc100 @aheejin , I was able to get a working fix here :)

But might need some help to make my solution more concrete.

So what I realized is that

  1. The llvm IR is correct but the wasm module being generated was wrong.
  2. I see that by default we use the Legacy EH https://github.com/llvm/llvm-project/blob/2c7780a96d24e1e23657057fb735e13e2ba5d2ce/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.cpp#L63C28-L65
  3. And the Emscripten EH is false by default.
    https://github.com/llvm/llvm-project/blob/2c7780a96d24e1e23657057fb735e13e2ba5d2ce/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.cpp#L41-L44
  4. So I just reversed those defaults: I used false for Legacy EH and true for Emscripten EH, and built llvm with those changes.

And now I see the correct wasm module being generated. So I have stuff related to EH working perfectly !!

Image

So technically I see that I need to enable the WasmEnableEmEH option (i.e., I suppose, the enable-emscripten-cxx-exceptions flag).

How do I pass the same flag/enable this option here

https://github.com/llvm/llvm-project/blob/2c7780a96d24e1e23657057fb735e13e2ba5d2ce/clang/lib/Interpreter/Wasm.cpp#L68-L96

Where we go from the LLVM IR to the wasm module ? Once I know that, I can raise a PR to LLVM !

@aheejin
Copy link
Member

aheejin commented Feb 25, 2025

Ah Ok, so the invokes are generated in Clang but lowered down to calls in the backend. That's because -enable-emscripten-cxx-exceptions is false by default, as you observed. In case no exception handling mechanism is specified, we lower invokes to calls and remove landingpads:
https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L485

And you don't need to set -wasm-use-legacy-eh to false. That option only has an effect when -wasm-enable-eh is true, which it is not by default. We have two modes of Wasm EH, legacy and standard, for historical reasons. The "standard" mode is not yet supported by most browsers, so we use the "legacy" mode by default when using Wasm EH. But since you are fine with Emscripten EH you don't need to do anything here. Also, never mind about the exception model; you don't need to touch anything there when using Emscripten EH.

So to sum up the option you need to pass to the backend is -mllvm -enable-emscripten-cxx-exceptions. I guess you may want -mllvm -enable-emscripten-sjlj too for setjmp-longjmp handling. -mllvm options are LLVM middle/backend options and are not meant to be used directly in the command line by Emscripten users. But this is apparently relevant to you because you are working on the interface to the backend.
This option is automatically added by emscripten when you use -fexceptions or -sDISABLE_EXCEPTION_CATCHING=0:

if not settings.DISABLE_EXCEPTION_CATCHING:
    args += ['-enable-emscripten-cxx-exceptions']

I'm not 100% sure where you can add these options, but I don't think that should be when creating a TargetMachine. Maybe you can try adding them here? https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/clang/lib/Interpreter/Interpreter.cpp#L200-L204
But I'm not sure if that would be something you can hardcode here and submit in a PR. I actually wasn't even aware of the whole interpreter implementation for the wasm thing. Where does it get the initial flags? We have a file that converts clang options into clang -cc1 options: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/WebAssembly.cpp. Does clang-repl take the same set of arguments as clang?

@anutosh491
Copy link
Author

anutosh491 commented Feb 26, 2025

Hey @aheejin

Thanks a lot for the details explanation. Let's discuss your approach below

Does clang-repl take the same set of arguments as clang?

I think most of it should be inspired by clang itself. I see the following cc1 command. You can also see it through the console on the link running clang-repl in the browser through jupyterlite

"" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -main-file-name "<<< inputs >>>" -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir /lib/clang/19 -internal-isystem /include/wasm32-emscripten/c++/v1 -internal-isystem /include/c++/v1 -internal-isystem /lib/clang/19/include -internal-isystem /include/wasm32-emscripten -internal-isystem /include -std=c++20 -fdeprecated-macro -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fincremental-extensions -o "<<< inputs >>>.o" -x c++ "<<< inputs >>>" clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target wasm32-unknown-emscripten

I'm not 100% sure where you can add these options, but I don't think that should be when creating a TargetMachine. Maybe you can try adding them here? https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/clang/lib/Interpreter/Interpreter.cpp#L200-L204

Hmmm, I don't think this worked.
This just adds -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj to the cc1 command above but not sure that's needed there as the llvm IR generated is already what we want. Basically I think these flags still are on the frontend side.

  1. So currently how this works is

code -> PTU -> llvm ir -> incr_module_xx.so (where xx stands for code block) -> incr_module_xx.wasm -> loaded on top of the main module using dlopen

So here the llvm ir step is correct but the shared object generated might not be going as expected.

  2. As you say, I think it does boil down to this line:

https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L485

So the current workflow would be like this

AddModule -> addPassesToEmitFile -> addPassesToGenerateCode -> addISelPasses & addIRPasses

I am guessing it is somewhere in this pipeline that we need to switch on WasmEnableEmEH and maybe WasmEnableEmSjLj (not sure how this can be done though). It just boils down to enabling these two options somehow once we are inside the AddModule responsible for generating the wasm module here.

@anutosh491
Copy link
Author

Playing around with cl options is one way to get this (not sure how robust it is).

Introducing this diff

diff --git a/clang/lib/Interpreter/Wasm.cpp b/clang/lib/Interpreter/Wasm.cpp
index aa10b160ccf8..b8c83ff13725 100644
--- a/clang/lib/Interpreter/Wasm.cpp
+++ b/clang/lib/Interpreter/Wasm.cpp
@@ -16,6 +16,9 @@
 #include <llvm/IR/LegacyPassManager.h>
 #include <llvm/IR/Module.h>
 #include <llvm/MC/TargetRegistry.h>
+#include <llvm/Support/CommandLine.h>
 #include <llvm/Target/TargetMachine.h>
 
 #include <clang/Interpreter/Interpreter.h>
@@ -56,8 +59,28 @@ bool link(llvm::ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
 
 #include <dlfcn.h>
 
+void enableWebAssemblyEHOptions() {
+    using namespace llvm;
+
+    if (cl::getRegisteredOptions().count("enable-emscripten-cxx-exceptions")) {
+        auto *option = static_cast<cl::opt<bool>*>(
+            cl::getRegisteredOptions()["enable-emscripten-cxx-exceptions"]);
+        option->setInitialValue(true);
+    }
+
+    if (cl::getRegisteredOptions().count("enable-emscripten-sjlj")) {
+        auto *option = static_cast<cl::opt<bool>*>(
+            cl::getRegisteredOptions()["enable-emscripten-sjlj"]);
+        option->setInitialValue(true);
+    }
+}
+
 namespace clang {
 
 WasmIncrementalExecutor::WasmIncrementalExecutor(
     llvm::orc::ThreadSafeContext &TSC)
     : IncrementalExecutor(TSC) {}
@@ -65,6 +88,23 @@ WasmIncrementalExecutor::WasmIncrementalExecutor(
 llvm::Error WasmIncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
   std::string ErrorString;
+
+  enableWebAssemblyEHOptions();

   const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
       PTU.TheModule->getTargetTriple(), ErrorString);
   if (!Target) {
@@ -73,6 +113,7 @@ llvm::Error WasmIncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
   }
 
   llvm::TargetOptions TO = llvm::TargetOptions();
   llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
       PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
   PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());

I see the correct wasm being generated for an example I tried

Image

So yeah, the LLVM IR is correct; it's while we generate the wasm that we need these flags, I suppose.

@aheejin
Copy link
Member

aheejin commented Feb 27, 2025

Does clang-repl take the same set of arguments as clang?

I think most of it should be inspired by clang itself. I see the following cc1 command. You can also see it through the console on the link running clang-repl in the browser through jupyterlite

"" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -main-file-name "<<< inputs >>>" -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir /lib/clang/19 -internal-isystem /include/wasm32-emscripten/c++/v1 -internal-isystem /include/c++/v1 -internal-isystem /lib/clang/19/include -internal-isystem /include/wasm32-emscripten -internal-isystem /include -std=c++20 -fdeprecated-macro -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fincremental-extensions -o "<<< inputs >>>.o" -x c++ "<<< inputs >>>" clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target wasm32-unknown-emscripten

These are the flags given to clang -cc1, right? Then what are the flags that are given to clang-repl? (Is it correct that clang-repl internally invokes clang -cc1?)

So clang and clang -cc1 flags are two different things. What's externally used by users is clang (in the case of emscripten, calling it is handled by emcc), and the clang driver translates those flags to create the set of flags for clang -cc1. Some of the flags, including -mllvm flags, are meant to be consumed by the LLVM middle/backend, so they are just passed directly to clang -cc1. So what I asked was: I'm not sure how clang-repl is called (Is it called by clang? Or is it called directly?), but it has to take the right set of parameters. For example, clang needs -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj to enable Emscripten EH.

I'm not 100% sure where you can add these options, but I don't think that should be when creating a TargetMachine. Maybe you can try adding them here? https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/clang/lib/Interpreter/Interpreter.cpp#L200-L204

Hmmm, I don't think this worked. This just adds -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj to the cc1 command above but not sure that's needed there as the llvm IR generated is already what we want. Basically I think these flags still are on the frontend side.

No, those -mllvm options, as I said, are LLVM backend options. Without them the backend will lower invokes to calls here:
https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L478-L489
These WasmEnableEH and WasmEnableEmEH are defined here: https://github.com/llvm/llvm-project/blob/f409340cc217c55c3960a375054a17b2bc927e53/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.cpp#L40-L56. As you can see, WasmEnableEmEH is enabled by -enable-emscripten-cxx-exceptions.

  2. As you say, I think it does boil down to this line:

https://github.com/llvm/llvm-project/blob/674dbcfe8f400db65f0d066ea638e977e8b82781/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L485
I am guessing it is somewhere in this pipeline that we need to switch on WasmEnableEmEH and maybe WasmEnableEmSjLj (not sure how this can be done though). It just boils down to enabling these two options somehow once we are inside the AddModule responsible for generating the wasm module here.

Yes that looks like the culprit, and I'm saying you need -mllvm -enable-emscripten-cxx-exceptions to not go into that if.

@aheejin
Copy link
Member

aheejin commented Feb 27, 2025

Playing around with cl options is one way to get this (not sure how robust it is).

Introducing this diff

diff --git a/clang/lib/Interpreter/Wasm.cpp b/clang/lib/Interpreter/Wasm.cpp
index aa10b160ccf8..b8c83ff13725 100644
--- a/clang/lib/Interpreter/Wasm.cpp
+++ b/clang/lib/Interpreter/Wasm.cpp
@@ -16,6 +16,9 @@
 #include <llvm/IR/LegacyPassManager.h>
 #include <llvm/IR/Module.h>
 #include <llvm/MC/TargetRegistry.h>
+#include <llvm/Support/CommandLine.h>
 #include <llvm/Target/TargetMachine.h>
 
 #include <clang/Interpreter/Interpreter.h>
@@ -56,8 +59,28 @@ bool link(llvm::ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
 
 #include <dlfcn.h>
 
+void enableWebAssemblyEHOptions() {
+    using namespace llvm;
+
+    if (cl::getRegisteredOptions().count("enable-emscripten-cxx-exceptions")) {
+        auto *option = static_cast<cl::opt<bool>*>(
+            cl::getRegisteredOptions()["enable-emscripten-cxx-exceptions"]);
+        option->setInitialValue(true);
+    }
+
+    if (cl::getRegisteredOptions().count("enable-emscripten-sjlj")) {
+        auto *option = static_cast<cl::opt<bool>*>(
+            cl::getRegisteredOptions()["enable-emscripten-sjlj"]);
+        option->setInitialValue(true);
+    }
+}
+
 namespace clang {
 
 WasmIncrementalExecutor::WasmIncrementalExecutor(
     llvm::orc::ThreadSafeContext &TSC)
     : IncrementalExecutor(TSC) {}
@@ -65,6 +88,23 @@ WasmIncrementalExecutor::WasmIncrementalExecutor(
 llvm::Error WasmIncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
   std::string ErrorString;
+
+  enableWebAssemblyEHOptions();

   const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
       PTU.TheModule->getTargetTriple(), ErrorString);
   if (!Target) {
@@ -73,6 +113,7 @@ llvm::Error WasmIncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
   }
 
   llvm::TargetOptions TO = llvm::TargetOptions();
   llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
       PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
   PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());

I see the correct wasm being generated for an example I tried

Image

So yeah, the LLVM IR is correct; it's while we generate the wasm that we need these flags, I suppose.

Yeah, this basically does what I said, which is enabling -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj. But I'm still wondering which set of options clang-repl receives, because what clang does is just forward those options to clang -cc1, and I think clang-repl should do a similar thing rather than trying to set something by itself.

Also, the code looks like hardcoding, since it always sets the EH options to true, and I don't think that diff is something we should commit to the main LLVM repo.

@anutosh491
Author

But I'm still wondering which set of options clang-repl receives, because what clang does is just to forward those options to clang -cc1, and I think clang-repl should do a similar thing, rather than trying to set something by itself.

Yes, this is exactly what happens. Through clang-repl too we need to get to the LLVM IR, and there is no difference in how we get there between clang-repl and clang.

This is what the docs say

Clang-Repl is an interactive C++ interpreter that allows for incremental compilation. It supports interactive programming for C++ in a read-evaluate-print-loop (REPL) style. It uses Clang as a library to compile the high level programming language into LLVM IR. Then the LLVM IR is executed by the LLVM just-in-time (JIT) infrastructure.

Image

@anutosh491
Author

anutosh491 commented Feb 27, 2025

IMP: That being said, for running clang-repl in the browser we don't use the LLVM JIT approach. So if you look at this code, everything remains the same until the execute step (called through ParseAndExecute)

  1. where for the native case, this ends up calling https://github.com/llvm/llvm-project/blob/a3165398db0736588daedb07650195502592e567/clang/lib/Interpreter/IncrementalExecutor.cpp#L73

  2. And for the wasm case, we use https://github.com/llvm/llvm-project/blob/a3165398db0736588daedb07650195502592e567/clang/lib/Interpreter/Wasm.cpp#L65

What we do here is this

code -> PTU -> llvm ir -> incr_module_xx.so (where xx stands for cell/code-block number) -> incr_module_xx.wasm -> loaded on top of the main module using dlopen

@anutosh491
Author

Then what are the flags that are given to clang-repl? (Is it correct that clang-repl internally invokes clang -cc1?)

Those would be exactly the same flags, plus these ones when using Emscripten: https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Interpreter.cpp#L200-L204

Some of the flags, including -mllvm flags, are meant to be consumed by the LLVM middle/backend so they are just passed directly to clang -cc1

Yeah, I thought the above would do the job (actually my first thought was that having -fexceptions and -fcxx-exceptions in our clang -cc1 command would be enough, but that didn't do the job). When I tried adding them, my cc1 command became

-cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -main-file-name "<<< inputs >>>" -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir /lib/clang/19 -internal-isystem /include/wasm32-emscripten/c++/v1 -internal-isystem /include/c++/v1 -internal-isystem /lib/clang/19/include -internal-isystem /include/wasm32-emscripten -internal-isystem /include -std=c++20 -fdeprecated-macro -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj -fincremental-extensions -o "<<< inputs >>>.o" -x c++ "<<< inputs >>>"

But that didn't work. The wasm generated wasn't correct

@anutosh491
Author

anutosh491 commented Feb 27, 2025

Hey @aheejin

Find a video below where I try running clang-repl in the browser (passing those flags to clang cc1) through a toy project of my own

Screen.Recording.2025-02-27.at.7.32.49.AM.mp4

This fails because the wasm isn't correct here

Image

  1. Some flags come out of here https://github.com/anutosh491/clang-repl-wasm/blob/abad230fa0b1deb8946d9568bad11de3d0dc9ba9/CompilerModule.cpp#L39-L40 (passing those flags from here too makes no difference)

  2. others come out of here https://github.com/llvm/llvm-project/blob/a3165398db0736588daedb07650195502592e567/clang/lib/Interpreter/Interpreter.cpp#L200-L204

  3. Everything else is pulled in through clang itself; clang-repl plays no role here!

@aheejin
Member

aheejin commented Feb 28, 2025

I'm not familiar with clang-repl or https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp, which looks like something you and a few other people added in recent months: https://github.com/llvm/llvm-project/commits/main/clang/lib/Interpreter/Wasm.cpp
So given that this is what you and others implemented, I'm maybe not the best person to answer how this works?

code -> PTU -> llvm ir -> incr_module_xx.so (where xx stands for code block) -> incr_module_xx.wasm -> loaded on top of the main module using dlopen

I really don't understand what this means or the clang-repl workflow diagram, and I'm not sure whether this is relevant to the current problem.

I also visited the Jupyter Notebook page you linked but I'm not sure how to proceed after seeing this image:
Image


I can only repeat what I said already: from clang's general point of view, clang receives a list of arguments. Those arguments are converted (or forwarded) to clang -cc1 in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/Targets/WebAssembly.cpp. In case it helps, this is the backtrace of clang from main to that WebAssembly.cpp file:

(gdb) bt
#0  clang::driver::toolchains::WebAssembly::addClangTargetOptions (this=0x55555562f730, 
    DriverArgs=..., CC1Args=llvm::SmallVector of Size 14, Capacity 16 = {...})
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/ToolChains/WebAssembly.cpp:290
#1  0x00007ffff12d77b2 in clang::driver::tools::Clang::ConstructJob (this=0x555555630f30, 
    C=..., JA=..., Output=..., Inputs=llvm::SmallVector of Size 1, Capacity 4 = {...}, 
    Args=..., LinkingOutput=0x0)
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/ToolChains/Clang.cpp:6163
#2  0x00007ffff11cdb28 in clang::driver::Driver::BuildJobsForActionNoCache (
    this=0x7fffffff76e0, C=..., A=0x555555631170, TC=0x55555562f730, BoundArch="", 
    AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0, 
    CachedResults=std::map with 1 element = {...}, 
    TargetDeviceOffloadKind=clang::driver::Action::OFK_None)
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/Driver.cpp:6049
#3  0x00007ffff11cbfe9 in clang::driver::Driver::BuildJobsForAction (this=0x7fffffff76e0, 
    C=..., A=0x555555631170, TC=0x55555562f730, BoundArch="", AtTopLevel=true, 
    MultipleArchs=false, LinkingOutput=0x0, CachedResults=std::map with 1 element = {...}, 
    TargetDeviceOffloadKind=clang::driver::Action::OFK_None)
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/Driver.cpp:5736
#4  0x00007ffff11c25a0 in clang::driver::Driver::BuildJobs (this=0x7fffffff76e0, C=...)
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/Driver.cpp:5262
#5  0x00007ffff11bcc2a in clang::driver::Driver::BuildCompilation (this=0x7fffffff76e0, 
    ArgList=llvm::ArrayRef of length 22 = {...})
    at /usr/local/google/home/aheejin/llvm-git/clang/lib/Driver/Driver.cpp:1838
#6  0x000055555558ef95 in clang_main (Argc=22, Argv=0x7fffffffc548, ToolContext=...)
    at /usr/local/google/home/aheejin/llvm-git/clang/tools/driver/driver.cpp:372
#7  0x00005555555c1445 in main (argc=22, argv=0x7fffffffc548)
    at /usr/local/google/home/aheejin/llvm-git/build.debug/tools/clang/tools/driver/clang-driver.cpp:17

What I am asking is:

  1. How does clang-repl receive command line arguments?
  2. How does clang-repl convert / forward those arguments to clang -cc1?

The reason I mentioned https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/Targets/WebAssembly.cpp and pasted the backtrace is that you may need to mimic the process of converting/forwarding the arguments that is done in clang -> clang -cc1 in your clang-repl. Probably the next step you can take is to read how clang forwards its arguments to clang -cc1 and try to do a similar thing in your implementation.

@anutosh491
Author

I also visited the Jupiter Notebook page you linked but I'm not sure how to proceed after seeing this image:

Click on the kernel you want to use; that is C++20 in this case!

@anutosh491
Author

anutosh491 commented Mar 19, 2025

Hey @aheejin, sorry for taking so long to get back on this. But I think this is what is happening.

So clang-repl would follow the exact stack trace that you shared above. This is what I see

  1. For usual clang

i) In the cc1_main command we end up calling ExecuteCompilerInvocation
ii) which deals with the LLVMArgs from FrontendOpts first and then moves on to calling ExecuteAction

  2. For clang-repl

i) As you said above, everything starts with BuildCompilation, which does the addClangTargetOptions step, followed by creating the CompilerInstance
ii) Once the CompilerInstance is created, it is passed to the Interpreter that we create on our side
iii) But if you look at the constructor for the Interpreter, you will see this

Interpreter::Interpreter(std::unique_ptr<CompilerInstance> Instance,
...........

  Act = std::make_unique<IncrementalAction>(*CI, *TSCtx->getContext(), ErrOut,
                                            *this, std::move(Consumer));
  if (ErrOut)
    return;
  CI->ExecuteAction(*Act);
........
}

iv) Now, just like clang normally does, we need a call to ExecuteAction, which is done here too! But clang does it through ExecuteCompilerInvocation, where the LLVMArgs are processed first and then ExecuteAction is called.

That looks like the missing key here. So adding this diff, borrowed from what clang does, is enough to do the job for me:

@@ -375,6 +430,17 @@ Interpreter::Interpreter(std::unique_ptr<CompilerInstance> Instance,
                                             *this, std::move(Consumer));
   if (ErrOut)
     return;
+
+  if (!CI->getFrontendOpts().LLVMArgs.empty()) {
+    unsigned NumArgs = CI->getFrontendOpts().LLVMArgs.size();
+    auto Args = std::make_unique<const char*[]>(NumArgs + 2);
+    Args[0] = "clang-repl (LLVM option parsing)";
+    for (unsigned i = 0; i != NumArgs; ++i)
+      Args[i + 1] = CI->getFrontendOpts().LLVMArgs[i].c_str();
+    Args[NumArgs + 1] = nullptr;
+    llvm::cl::ParseCommandLineOptions(NumArgs + 1, Args.get());
+  }
   CI->ExecuteAction(*Act);

And now I can execute any try-catch block through clang-repl in the browser.

@anutosh491
Author

anutosh491 commented Mar 19, 2025

Actually now that I say this. I would like to know your views on this.

  1. So clang uses ExecuteCompilerInvocation, which does more with getFrontendOpts than just look into LLVMArgs.

  2. Not sure all of this would be needed in clang-repl, but I think the reason ExecuteCompilerInvocation was not used in clang-repl is how we come up with the FrontendAction.

Clang-repl uses IncrementalAction (derived from FrontendAction)

Act = std::make_unique<IncrementalAction>(*CI, *TSCtx->getContext(), ErrOut,
                                            *this, std::move(Consumer));

whereas ExecuteCompilerInvocation uses FrontendAction

std::unique_ptr<FrontendAction> Act(CreateFrontendAction(*Clang));

So I guess we could have an ExecuteCompilerInvocation implemented in Interpreter.cpp itself, taking care of all the options from FrontendOpts, while the way we finally create the FrontendAction can be different?

@anutosh491
Author

Hey @aheejin,

I made a PR addressing this comment of mine:

#23442 (comment)

You might be interested in having a look!

@aheejin
Member

aheejin commented Mar 27, 2025

I'm not very familiar with the implementation of the interpreter part, so I'm not sure I'm qualified to review the PR, but it looks like it is going in the right direction.
