Skip to content

1.38.24 does not generate asm.js with correct variable names with -g3/g4 #7883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
milizhang opened this issue Jan 18, 2019 · 4 comments
Closed

Comments

@milizhang
Copy link
Contributor

Symptom

The situation seems to be close to what was described in #5094.

For the following source code:

#include <stdio.h>

int f(int a, int b)
{
	return a + b;
}

int main()
{
	int a = 1;
	int b = 2;
	int c = f(a, b);

	printf("%d\n", b);
	b = c + 3;

	printf("%d\n", b);

	return 0;
}

We will get results like the following in 1.38.24 with emcc -s WASM=0 -g4 1.c -o 1.html:

function _f($0,$1) {
 $0 = $0|0;
 $1 = $1|0;
 var $2 = 0, $3 = 0, $4 = 0, $5 = 0, $6 = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 16|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abortStackOverflow(16|0);
 $2 = $0;
 $3 = $1;
 $4 = $2; //@line 5 "1.c"
 $5 = $3; //@line 5 "1.c"
 $6 = (($4) + ($5))|0; //@line 5 "1.c"
 STACKTOP = sp;return ($6|0); //@line 5 "1.c"
}

function _main() {
 var $0 = 0, $1 = 0, $10 = 0, $2 = 0, $3 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $vararg_buffer = 0, $vararg_buffer1 = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 32|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abortStackOverflow(32|0);
 $vararg_buffer1 = sp + 8|0;
 $vararg_buffer = sp;
 $0 = 0;
 $1 = 1; //@line 10 "1.c"
 $2 = 2; //@line 11 "1.c"
 $4 = $1; //@line 12 "1.c"
 $5 = $2; //@line 12 "1.c"
 $6 = (_f($4,$5)|0); //@line 12 "1.c"
 $3 = $6; //@line 12 "1.c"
 $7 = $2; //@line 14 "1.c"
 HEAP32[$vararg_buffer>>2] = $7; //@line 14 "1.c"
 (_printf(2776,$vararg_buffer)|0); //@line 14 "1.c"
 $8 = $3; //@line 15 "1.c"
 $9 = (($8) + 3)|0; //@line 15 "1.c"
 $2 = $9; //@line 15 "1.c"
 $10 = $2; //@line 17 "1.c"
 HEAP32[$vararg_buffer1>>2] = $10; //@line 17 "1.c"
 (_printf(2776,$vararg_buffer1)|0); //@line 17 "1.c"
 STACKTOP = sp;return 0; //@line 19 "1.c"
}

As shown above, all the variable names are stripped away here.

I can also confirm that 1.35.0 is emitting the variable names properly in asm.js. What we will get is like the following:

function _f($a,$b) {
 $a = $a|0;
 $b = $b|0;
 var $0 = 0, $1 = 0, $2 = 0, $3 = 0, $4 = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 16|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abort();
 $0 = $a;
 $1 = $b;
 $2 = $0; //@line 5 "1.c"
 $3 = $1; //@line 5 "1.c"
 $4 = (($2) + ($3))|0; //@line 5 "1.c"
 STACKTOP = sp;return ($4|0); //@line 5 "1.c"
}

function _main() {
 var $0 = 0, $1 = 0, $2 = 0, $3 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $a = 0, $b = 0, $c = 0, $vararg_buffer = 0, $vararg_buffer1 = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 32|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abort();
 $vararg_buffer1 = sp + 8|0;
 $vararg_buffer = sp;
 $0 = 0;
 $a = 1; //@line 10 "1.c"
 $b = 2; //@line 11 "1.c"
 $1 = $a; //@line 12 "1.c"
 $2 = $b; //@line 12 "1.c"
 $3 = (_f($1,$2)|0); //@line 12 "1.c"
 $c = $3; //@line 12 "1.c"
 $4 = $b; //@line 14 "1.c"
 HEAP32[$vararg_buffer>>2] = $4; //@line 14 "1.c"
 (_printf(672,$vararg_buffer)|0); //@line 14 "1.c"
 $5 = $c; //@line 15 "1.c"
 $6 = (($5) + 3)|0; //@line 15 "1.c"
 $b = $6; //@line 15 "1.c"
 $7 = $b; //@line 17 "1.c"
 HEAP32[$vararg_buffer1>>2] = $7; //@line 17 "1.c"
 (_printf(672,$vararg_buffer1)|0); //@line 17 "1.c"
 STACKTOP = sp;return 0; //@line 19 "1.c"
}

What is happening?

I believe Emscripten relies on the value names in LLVM IR to properly generate variable names in asm.js outputs. At some point, Clang introduced an option "-discard-value-names" in CC1, and seems like change rC263394 (Integrated into fastcomp-clang around 1.36.7) enabled it by default for NDEBUG builds:

// Disable the verification pass in -asserts builds.
#ifdef NDEBUG
  CmdArgs.push_back("-disable-llvm-verifier");
  // Discard LLVM value names in -asserts builds.
  CmdArgs.push_back("-discard-value-names");
#endif

This option discarded all the variable names in generated LLVM IRs, and effectively broke variable name generation for all prebuilt Emscripten SDK binaries since 1.36.7.

In the beginning I was surprised that there is only one single report of the issue (#5094) and it is even marked as closed. After digging a little bit into root cause, I think the potential reason why this issue has been so covert is probably because some of the Emscripten contributors might not be able to actually repro this issue, as they are likely running debug builds and does not have discard-value-names enabled by default. This explains why @kripken originally dismissed #5094 as not repro.

Potential Fixes

For most of clang's use cases, it makes sense to discard value names from LLVM IRs for non-contributor builds, as typical users only needs the DWARF information to debug their compiled binary. However, I feel it is probably not trivial for Emscripten to generate variable names from the DWARF information instead, and at the same time I believe we can all agree that leaving the variable name generation generation broken does significantly impairs user's ability to debug generated JS code.

Fortunately, later version of Clang adds a pair of options to the clang driver (-fdiscard-value-names and -fno-discard-value-names) to allow override of this behavior. If we do not have any near term plan to integrate a much newer version of clang, then it seems like the easiest solution would be to port this change to emscripten-fastcomp-clang, and turn on -fno-discard-value-names for emcc -g3 / -g4.

Thank you very much.

@kripken
Copy link
Member

kripken commented Jan 22, 2019

Thanks @milizhang! Those details are very helpful.

For the asm.js/fastcomp backend, it sounds like we need to port the -fno-discard-value-names option? (there was no other way to disable this behavior before those were added?)

For the LLVM wasm backend, we can assume people have a quite new LLVM build, so we can just use that flag.

In both cases, yes, sounds like we should pass that flag when -g3 or above.

@milizhang
Copy link
Contributor Author

For asm.js backend, I think these new options were added as previously the behavior was controlled by cc1 argument -discard-value-names, but not directly accessible from the driver.
Interestingly, without additional changes to fastcomp clang, I think the behavior can be forced ON using -Xclang -discard-value-names, but once it is appended to cc1 arguments on there is no way to turn it back off from the driver.

Therefore I think porting -fdiscard-value-names and -fno-discard-value-names is the best option for now.

@kripken
Copy link
Member

kripken commented Jan 25, 2019

Sounds good to me.

I'm not sure offhand how easy it would be to port those? A first step might be to make emcc emit those flags for the LLVM wasm backend path, where we assume we are using a very recent LLVM, which has them.

kripken pushed a commit that referenced this issue Mar 19, 2019
This change is another part that aims to solve #7883, by adding "-fno-discard-value-names" to clang invocation calls from emcc.py, when debug levels >= 3.

It depends on emscripten-core/emscripten-fastcomp-clang#27.
@kripken
Copy link
Member

kripken commented Mar 19, 2019

I think this was fixed by those PRs. Please comment/reopen if there is remaining work here.

@kripken kripken closed this as completed Mar 19, 2019
VirtualTim pushed a commit to VirtualTim/emscripten that referenced this issue May 21, 2019
This change is another part that aims to solve emscripten-core#7883, by adding "-fno-discard-value-names" to clang invocation calls from emcc.py, when debug levels >= 3.

It depends on emscripten-core/emscripten-fastcomp-clang#27.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants