Skip to content

Enable REVERSE_DEPS=all on O0/O1 #13510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pfaffe opened this issue Feb 16, 2021 · 21 comments
Open

Enable REVERSE_DEPS=all on O0/O1 #13510

pfaffe opened this issue Feb 16, 2021 · 21 comments
Assignees
Labels

Comments

@pfaffe
Copy link
Collaborator

pfaffe commented Feb 16, 2021

Now that we have a REVERSE_DEPS setting, can we make all the default on O0 and O1 builds? The all setting matches the expectation of "fast builds, suboptimal binaries" of the O0/O1 optimizer levels.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 16, 2021

I took a quick look at doing this when I originally added REVERSE_DEPS.

I was a little concerned by one thing which was that REVERSE_DEPS=all ends up adding a lot of extra imports (I think it was a lot GL functions due to some kind getProcAddress thing), which made me a little sad. I was hoping we could fix that import explosion before doing this. On the other hand maybe it doesn't really matter and O0/O1?

@kripken
Copy link
Member

kripken commented Feb 16, 2021

It looks like all increases hello world wasm sizes from 2K to 32K, and JS sizes from 53K to 300K (all in -O1). There is no speedup in link times that I can detect there.

How much of a speedup do large projects get at link time with this? Is it significant after the recent nm optimizations in #13465?

@sbc100 sbc100 self-assigned this Feb 16, 2021
@pfaffe
Copy link
Collaborator Author

pfaffe commented Feb 16, 2021

I'm saving 25% or 5.5s on a 21s link with that option.

@kripken
Copy link
Member

kripken commented Feb 16, 2021

@pfaffe , is that even after that last optimization I linked to? (it landed 3 days ago)

@sbc100
Copy link
Collaborator

sbc100 commented Feb 16, 2021

From looking at #13465 it looks like it might not be great for apps that have a lot of .a archives and few .o files. That change combines all the .o files into a single command but seems to serialize the nm commands for archive files.. which seems like it could make things worse for projects like llvm that use almost exclusively .a archives, no?

@pfaffe
Copy link
Collaborator Author

pfaffe commented Feb 16, 2021

@kripken it's with 2.0.14, did that change make it in?

@kripken
Copy link
Member

kripken commented Feb 16, 2021

@sbc100 Good question, I'm not sure. cc @juj

@pfaffe It looks like it made it in.

@kripken
Copy link
Member

kripken commented Feb 16, 2021

@pfaffe do you see a single llvm-nm call in the EMCC_DEBUG=1 output, which takes 5 seconds?

@pfaffe
Copy link
Collaborator Author

pfaffe commented Feb 16, 2021

I see a lots of them in the profile, each not terribly expensive.

@kripken
Copy link
Member

kripken commented Feb 16, 2021

I'm confused as to why you'd see more than one. @sbc100 the cause appears to be this code in building.py:

# Any .a files that have multiple .o files will have hard time parsing. Scan those
# sequentially to confirm. TODO: Move this to use run_multiple_processes()
# when available.
for f in files:
  if f not in nm_cache:
    nm_cache[f] = llvm_nm(f)

I'm unsure why we need those extra calls?

@sbc100
Copy link
Collaborator

sbc100 commented Feb 17, 2021

I looks like that recent change combines calls to llvm-nm for object files.. but not for library files. (See the TODO).

We could probably refactor such that all libraryies and object can be processed in one go? @juj worked on this most recently so is probably best places to know.

@pfaffe
Copy link
Collaborator Author

pfaffe commented Feb 17, 2021

@kripken Did you measure the impact on non-trivial projects as well?

@sbc100
Copy link
Collaborator

sbc100 commented Feb 17, 2021

@pfaffe now that you have this option presumably you can add it to your project and this makes this change less urgent.

I'm hoping there may be ways we can reduce the impact on small projects before we decide to move forward.

@pfaffe
Copy link
Collaborator Author

pfaffe commented Feb 17, 2021

The impact on small projects seems negligible. A constant worst-case overhead of 30k/250k isn't a lot, in particular when the optimizer happily removes it. I expect people who care about the size to run at higher opt levels anyways, so they're not affected.

It would be great to get more data on projects that aren't tiny. What's the worst case and average overhead on the emscripten test-suite?

@kripken
Copy link
Member

kripken commented Feb 17, 2021

Testing on Binaryen, which is larger than almost everything in the emscripten test suite, all takes 2.9 seconds while auto takes 4.1. A 30% speedup is pretty nice there, and I can easily believe that scales up with larger projects.

Testing compile times, all doesn't appear to regress compile times for tiny programs despite emitting much larger code for them.

Overall I can't see a strong reason against enabling all in -O0/-O1. The only risk seems to be users that look at / ship by mistake the code size of unoptimized builds. Hopefully we can reduce that risk with docs and a changelog entry.

However, there may be some downside I am missing. It would be even safer to do this if we had a solution to the bulk of the increase, which is from emscripten_GetProcAddress and such things that have indirect calls. I wonder if we can avoid linking that if -lGL is not passed in, for example?

@sbc100
Copy link
Collaborator

sbc100 commented Feb 17, 2021

OK, given your research there it does seem reasonable to move forward with this change. I'd still love to see the emscripten_GetProcAddress problem addressed.

@kripken
Copy link
Member

kripken commented Feb 17, 2021

#13524 helps with GetProcAddress. In general I think that moving more code from JS to C can remove the need for deps_info entries.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 17, 2021

What is the remaining overhead of REVERSE_DEPS=all ? I bet its pretty minimal.

@kripken
Copy link
Member

kripken commented Feb 17, 2021

With that PR? Still quite large, because there are a few other big methods in deps_info.

@sbc100
Copy link
Collaborator

sbc100 commented Feb 24, 2021

We should at least land #13524 before we do this.

kripken added a commit that referenced this issue Feb 24, 2021
This avoids having them on the right-hand-side in deps_info, which
means that in REVERSE_DEPS=all mode we do not include that
function and all the GL code it calls. Instead, the callers are just more
C functions that the linker does the right thing.

Helps #13510
@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants