Skip to content

Convert extended string handling functions into JS library code #17403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 13, 2022

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Jul 8, 2022

This was already the case with MINIMAL_RUNTIME.

Add a new setting, which is currently enabled by default called LEGACY_RUNTIME
which enables all of these library functions by default.

As a followup we can consider disabling LEGACY_RUNTIME by default which will
give code size benefits by default.

@sbc100 sbc100 force-pushed the runtime_strings branch 3 times, most recently from 446cfa5 to f2db29f Compare July 8, 2022 21:30
@sbc100 sbc100 changed the base branch from move_runtime_allocate to main July 8, 2022 21:35
@sbc100 sbc100 force-pushed the runtime_strings branch from f2db29f to 6c017b1 Compare July 8, 2022 21:36
@sbc100 sbc100 requested a review from kripken July 8, 2022 21:36
@sbc100 sbc100 force-pushed the runtime_strings branch 7 times, most recently from ff588bc to ad0f669 Compare July 11, 2022 18:20
@sbc100
Copy link
Collaborator Author

sbc100 commented Jul 12, 2022

OK I added a new LEGACY_RUNTIME and decided to enable it by default for now. This means the code size benefits for most users won't come until we disable this by default.. which we can do as a followup.

@sbc100 sbc100 force-pushed the runtime_strings branch 2 times, most recently from d41861d to 75ed148 Compare July 12, 2022 18:41
@sbc100
Copy link
Collaborator Author

sbc100 commented Jul 12, 2022

This is now a more conservative changes since LEGACY_RUNTIME is on by default.. but I think its a good incremental change. PTAL.

@sbc100 sbc100 force-pushed the runtime_strings branch 2 times, most recently from c0ca093 to 614506c Compare July 12, 2022 18:47
// Ignore null values.
if (item) {
functionsInTableMap.set(item, i);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a separate fix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually needed work around a JSDCE optmizer + closure issue.

Here is how I remember it going (roughly):

  1. addFunction (which depends on updateTableMap + functionsInTableMap) get added due to LEGACY_RUNTIME being set by default.
  2. JSDCE run which is able to remove addFunction.. but it only run a single round so leaved the now used updateTableMap function in place.
  3. Closure compiler then run as sees that there is no place where functionsInTableMap is ever set to anything by undefined which means it can then prove that functionsInTableMap.set is invalid here.

This seems like a bug that can occur when JSDCE (the single pass version) is used with closure compiler.. but its probably fairly rare.

In any case this seems like a good fix. If functionsInTableMap has not yet been initilzed when updateTableMap is called it would crash (without this fix).

Copy link
Member

@kripken kripken Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would closure actually do in step 3? It seems like any change it makes is ok since it's to a function that is never called.

Regardless I think you are right, reading the code it looks like this has been a possible bug. To hit it we'd need to load a side module that somehow avoids calling addFunction at all, and then the dylink call to updateTableMap will throw. I guess that's rare enough to never happen in practice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closure reports an error saying functionsInTableMap.set is not a function since it knows that functionsInTableMap is undefined. By adding the check here make closure happy since it can tell that the undefined is being checked for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. Yes, I guess partially optimized code can end up hitting closure errors in that way.

@@ -1 +1 @@
26698
26717
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just unoptimized size, but still, why does it increase? Shouldn't we be emitting the same code? (maybe we add more assertions instrumentation if we only add it to library functions and not runtime functions?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a 19 byte difference. Its likely just minor difference in the way library function get emitted.. I'm not sure its work investigating, especially since the diff will be tricky to read.

Copy link
Member

@kripken kripken Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. The diff to tests/other/test_unoptimized_code_size.js.size is almost 4K though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok .. I'll take a look at that one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I figured it out, its because this change also means that we get all of allocate/addFunction/removeFunction etc as part of LEGACY_RUNTIME which is now the default.

Would you rather I split the LEGACY_RUNTIME change from the moving the string functions to make this more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for checking.

I think it's fine in this PR to do both.

@sbc100 sbc100 force-pushed the runtime_strings branch from 614506c to d2f0141 Compare July 12, 2022 21:19
@sbc100 sbc100 requested a review from kripken July 12, 2022 21:19
@sbc100 sbc100 force-pushed the runtime_strings branch from d2f0141 to 21c3483 Compare July 12, 2022 22:26
@sbc100 sbc100 force-pushed the runtime_strings branch from 21c3483 to 088142c Compare July 13, 2022 00:09
@sbc100 sbc100 enabled auto-merge (squash) July 13, 2022 00:09
@sbc100 sbc100 disabled auto-merge July 13, 2022 00:10
@sbc100 sbc100 enabled auto-merge (squash) July 13, 2022 00:10
@sbc100 sbc100 merged commit f08ae1c into main Jul 13, 2022
@sbc100 sbc100 deleted the runtime_strings branch July 13, 2022 01:52
xbcnn pushed a commit to xbcnn/emscripten that referenced this pull request Jul 22, 2022
…ripten-core#17403)

This was already the case with MINIMAL_RUNTIME.  

Add a new setting, which is currently enabled by default called `LEGACY_RUNTIME`
which enables all of these library functions by default.

As a followup we can consider disabling `LEGACY_RUNTIME` by default which will
give code size benefits by default.
@dhdaines
Copy link

Hi, is this documented anywhere? I can't find it:

https://emscripten.org/search.html?q=LEGACY_RUNTIME&check_keywords=yes&area=default

Do we need to change our linker flags if we use these functions? I was using allocateUTF8OnStack, which, admittedly, was never documented either, and now I need -sLEGACY_RUNTIME=1 for my code to work.

Ideally I would pick and choose which of the functions to include, but I am not really sure how to do that (is this documented? -sEXPORTED_FUNCTIONS=allocateUTF8OnStack -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=allocateUTF8OnStack does not seem to work)

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 18, 2022

Hi, is this documented anywhere? I can't find it:

https://emscripten.org/search.html?q=LEGACY_RUNTIME&check_keywords=yes&area=default

Do we need to change our linker flags if we use these functions? I was using allocateUTF8OnStack, which, admittedly, was never documented either, and now I need -sLEGACY_RUNTIME=1 for my code to work.

Ideally I would pick and choose which of the functions to include, but I am not really sure how to do that (is this documented? -sEXPORTED_FUNCTIONS=allocateUTF8OnStack -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=allocateUTF8OnStack does not seem to work)

Its mostly documented in the changelog:

emscripten/ChangeLog.md

Lines 70 to 97 in 6416c35

- The `LEGACY_RUNTIME` setting is no longer enabled by default. If you use any
of these legacy runtime functions (except in library code with explicit
dependencies) then you would need to set `LEGACY_RUNTIME` on the command line
or add the ones you need to `DEFAULT_LIBRARY_FUNCS_TO_INCLUDE`:
- addFunction
- removeFunction
- allocate
- AsciiToString
- stringToAscii
- UTF16ToString
- stringToUTF16
- lengthBytesUTF16
- UTF32ToString
- stringToUTF32
- lengthBytesUTF32
- allocateUTF8
- allocateUTF8OnStack
- writeStringToMemory
- writeArrayToMemory
- writeAsciiToMemory
- intArrayFromString
- intArrayToString
- warnOnce
- ccall
- cwrap
Although this is technically a breaking change for those who use these
functions, there are assertion in debug builds that catch such usages and
direct towards how to fix the issue.

If you use the functions externally you should can use -sEXPORTED_RUNTIME_METHODS. If you only use the functions within the module (e.g.. in EM_JS or EM_ASM or --pre-js) then you should use -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE which does not export to the outside world, but does include the functions.

@dhdaines
Copy link

Hmm, -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE does not seem to work for me, but perhaps it is because CMake is mangling it somehow? When compiling with:

target_link_options(ssjs PRIVATE
  -Oz
  -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=["allocateUTF8OnStack"]
  -sFILESYSTEM=0 -sMODULARIZE=1 -sALLOW_MEMORY_GROWTH=1 -sINITIAL_MEMORY=33554432)

I get:

error: undefined symbol: allocateUTF8OnStack (referenced by top-level compiled C/C++ code)

Looking in the EMCC_DEBUG=1 output, the command-line looks correct:

emcc:WARNING: invocation: /home/dhd/work/emsdk/upstream/emscripten/emcc.py -g -Oz -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=["allocateUTF8OnStack"] -sFILESYSTEM=0 -sMODULARIZE=1 -sALLOW_MEMORY_GROWTH=1 -sINITIAL_MEMORY=33554432 @CMakeFiles/ssjs.dir/objects1.rsp -o ../soundswallower.js @CMakeFiles/ssjs.dir/linklibs.rsp (in /home/dhd/work/SoundSwallower/js/js)

Do I need the leading dollar-sign on allocateUTF8OnStack? Either CMake or emcc is barfing on that, as it turns it into a double dollar-sign, and I can't for the life of me figure out how to make it not do that.

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 18, 2022

Yes, -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE does need the leading dollersign and that is famously hard to escape in cmake. I think -sEXPORTED_RUNTIME_METHODS does not need the leading dollar sign.

Also, where are you using this function? If its from EM_ASM or EM_JS then you can now express that dependency in the source code itself. See #17854.

@dhdaines
Copy link

dhdaines commented Oct 18, 2022

Thanks for the quick reply!

Yes, it appears that somewhere in the guts of CMake, $allocateUTF8OnStack is being turned into \$$allocateUTF8OnStack in the link.txt file. Some random grepping does not allow me to determine exactly where this is happening :(

@dhdaines
Copy link

Ah. It is recognized as a bug in CMake, and there is a workaround:

https://stackoverflow.com/questions/68775616/how-not-to-have-dollar-sign-in-target-link-options-mangled

@dhdaines
Copy link

In my case the (very much Not Correct Modern Cmake (tm)) solution is this:

set(CMAKE_EXE_LINKER_FLAGS
  "-Oz -sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$allocateUTF8OnStack -sFILESYSTEM=0 -sMODULARIZE=1 -sALLOW_MEMORY_GROWTH=1 -sINITIAL_MEMORY=33554432")

@dhdaines
Copy link

Is there by any chance a way to specify -s options from a file instead of just on the command-line? That would solve the problem (and make things generally cleaner when there is a long list of functions to include/export/etc)...

@sbc100
Copy link
Collaborator Author

sbc100 commented Oct 19, 2022

Yes, you can specify any option that takes a list as a filelname using e.g.g [email protected].

Then in that file you just put one symbol on each line.

You can also put the entire command line into a file and just run emcc @flags.txt

@dhdaines
Copy link

Thank you!!! That will solve all my problems :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants