Optimize llvm-nm #13465


Merged · 9 commits · Feb 13, 2021

Conversation

@juj (Collaborator) commented Feb 10, 2021

Optimize llvm-nm invocations from Emscripten. In Python, creating a multiprocessing pool is extremely slow and can take 500 ms to 1 second. If there are only a few files to llvm-nm, process them sequentially, since a single llvm-nm invocation takes only about 10-20 ms.
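In outline, the approach is something like this (a minimal sketch, not the PR's actual code; `llvm_nm_single`, the threshold value, and the default `llvm-nm` path are illustrative assumptions):

```python
import multiprocessing
import subprocess

def llvm_nm_single(path, llvm_nm='llvm-nm'):
  # Run llvm-nm on one object file and return its symbol listing.
  return subprocess.check_output([llvm_nm, path], universal_newlines=True)

PARALLEL_THRESHOLD = 4  # hypothetical cutoff

def run_llvm_nm(files):
  if len(files) < PARALLEL_THRESHOLD:
    # For a handful of files, running them one after another beats the
    # 0.5-1 s cost of spinning up a multiprocessing pool.
    return [llvm_nm_single(f) for f in files]
  with multiprocessing.Pool() as pool:
    return pool.map(llvm_nm_single, files)
```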

@juj juj force-pushed the optimize_llvm_nm branch 2 times, most recently from b3c1989 to de65035 on February 10, 2021 17:17
@kripken (Member) commented Feb 10, 2021

We have a singleton multiprocessing pool, so if it's created anyway I'm not sure this matters? Other uses are the JS optimizer (in wasm2js, but not wasm) and system library building, but there might be more.

If this is helpful to most builds, then this makes sense. But can we do it as a wrapper around the pool, so it helps the other cases as well, and not just llvm-nm specifically? I assume the same reasoning applies there too.

@juj (Collaborator, Author) commented Feb 10, 2021

> We have a singleton multiprocessing pool, so if it's created anyway I'm not sure this matters?

That's right. Originally we had several multiprocessing pools; at some point I consolidated them into a single shared pool, and later made that singleton create-on-demand. So we now have a singleton pool that is spun up only when first needed.
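The create-on-demand pattern is roughly this (a minimal sketch under my own naming, not Emscripten's actual code):

```python
import multiprocessing

_pool = None  # hypothetical module-level cache

def get_pool():
  # Create the pool lazily on first use; every later call reuses it.
  global _pool
  if _pool is None:
    _pool = multiprocessing.Pool()
  return _pool
```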

Currently the pool is used for exactly three purposes: performing these parallel llvm-nm runs when linking code together, the JS optimizer, and building system libraries. There are no other cases, and the JS optimizer is used only for wasm2js.

Note that this PR doesn't change the pool behavior for those other use cases.

This PR is a big improvement to debug build iteration times: simple builds no longer need to spin up the pool at all, since they run the few llvm-nm invocations on a single thread, which is about 10x faster.

@juj (Collaborator, Author) commented Feb 11, 2021

Actually, I now realize that we don't need the multiprocessing pool at all for this task. It is bad, slow, buggy, clunky, bloated, and all the other FUD adjectives one can associate with a piece of software one doesn't like. A major source of headache. :(

Looking at the problem this morning, I realized that of course the good old subprocess.Popen() is already an asynchronous, parallel API - let's just use that. I rewrote the PR to issue parallel Popen() calls instead. Much simpler and shorter.
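The gist is something like this (a minimal sketch, not the PR's exact code; the function name and default llvm-nm path are illustrative):

```python
import subprocess

def parallel_llvm_nm(files, llvm_nm='llvm-nm'):
  # Popen returns immediately, so all llvm-nm processes start up and run
  # concurrently; we then wait on each one in turn and collect its stdout.
  procs = [subprocess.Popen([llvm_nm, f], stdout=subprocess.PIPE,
                            universal_newlines=True)
           for f in files]
  return [p.communicate()[0] for p in procs]
```

No pool, no worker setup, no pool startup cost - the OS runs the child processes in parallel for free.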

Here are benchmarks in tiny and large contexts:

A tiny-sized project

https://github.com/juj/wasm_webgpu consists of a single .cpp file that builds into a .a library, plus three individual main apps, each with a single .cpp file and each linking in the static library. This generates two llvm-nm calls per app, for a total of 6 llvm-nm invocations.

Before this PR, a debug iteration build took 3.3 seconds, of which the llvm-nm code ate up 2.3 seconds:

http://clb.confined.space/dump/toolchain_profiler.results_20210211_1209.html

[profiler screenshot: tiny_baseline]

After this PR, a debug iteration build takes 650 msecs, of which the llvm-nm code takes 14 msecs: http://clb.confined.space/dump/toolchain_profiler.results_20210211_1214.html

[profiler screenshot: tiny_optimized]

A large-sized project

https://github.com/gabrielcuvillier/d3wasm/ , Gabriel Cuvillier's port of Doom 3 to Wasm. The final link consists of 225 calls to llvm-nm. CC @gabrielcuvillier

Before this PR, a release link step took 31.7 seconds, of which llvm-nm calls took up 3.8 seconds:

http://clb.confined.space/dump/d3wasm_baseline.html

[profiler screenshot: d3wasm_baseline]

After this PR, a release link step takes 28.9 seconds, of which llvm-nm calls take 900 msecs:

http://clb.confined.space/dump/d3wasm_llvm_nm_optimized.html

[profiler screenshot: d3wasm_optimized]

This should be a win across the board.

I dream of a world where Python multiprocessing is not used at all.

@juj (Collaborator, Author) commented Feb 11, 2021

CI failures do not look related to this PR.

@juj (Collaborator, Author) commented Feb 11, 2021

Actually, it looks like llvm-nm supports taking a list of files. Doing that, the d3wasm build looks like this:

http://clb.confined.space/dump/d3wasm_optimized_2.html

[profiler screenshot: d3wasm_optimized_2]

with the llvm-nm processing taking 570 ms - a further 37% time reduction from the above, and an 85% reduction relative to baseline.
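Batching collapses the whole thing to roughly this (a minimal sketch under assumed names, not the PR's exact code):

```python
import subprocess

def batched_llvm_nm(files, llvm_nm='llvm-nm'):
  # One llvm-nm process handles every object file, so the process
  # startup cost is paid exactly once instead of once per file.
  return subprocess.check_output([llvm_nm] + files, universal_newlines=True)
```

One caveat worth noting: a very long file list can exceed OS command-line length limits, which is what later bit Windows users (see the comment from sbc100 below).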

@kripken (Member) left a comment

Nice!

I am slightly surprised that running multiple invocations in parallel is not faster, since llvm-nm presumably processes multiple files sequentially when given them in one invocation. Is the issue that the process overhead is just bigger than the work llvm-nm does?

Assuming that is correct, then lgtm, with one request: also verify that the last iteration still speeds up a tiny program (I'm sure it will, but best to check).

@juj (Collaborator, Author) commented Feb 13, 2021

> I am slightly surprised that running multiple invocations in parallel is not faster, since llvm-nm presumably processes multiple files sequentially when given them in one invocation. Is the issue that the process overhead is just bigger than the work llvm-nm does?

I believe so - Python spawning the process, plus llvm-nm getting to the point where it can start processing files, is probably 90% of the total execution time. Parallelizing amplifies those slow parts, and Python is ridiculously slow at spinning up parallelism in the first place.

Maybe if this were a native C/C++ app that spawned subprocesses, it could be faster in parallel. But even then I doubt it: spawning a load of subprocesses that all hit the disk could just bottleneck disk access, so a single program running through each file in neat sequence can be just as fast.

@juj (Collaborator, Author) commented Feb 13, 2021

> Assuming that is correct, then lgtm, with one request: also verify that the last iteration still speeds up a tiny program (I'm sure it will, but best to check).

Given that the latest iteration shaved off a whopping 37% of the time, and the tiny program's llvm-nm step took on the order of 10 ms to begin with, it is trivially true that it won't regress (and even if it did, it would be on the scale of ~10 ms).

@juj (Collaborator, Author) commented Feb 13, 2021

Ran the profile nevertheless. (Btw, would you have a chance to review #13464 at some point? I split it out from this PR to avoid review churn, but it is awkward to keep octopus-merging working branches back together after having posted a PR.)

The result is the same 11 ms as before. Compiler sanity checks and compiler.js dominate.

[profiler screenshot: tiny_optimized_2]

@juj juj force-pushed the optimize_llvm_nm branch from a948575 to 71aab45 on February 13, 2021 13:28
juj added 7 commits February 13, 2021 18:50
@juj juj force-pushed the optimize_llvm_nm branch from 4d0656a to 34ae9a9 on February 13, 2021 16:50
@juj juj enabled auto-merge (squash) February 13, 2021 16:50
@juj juj force-pushed the optimize_llvm_nm branch from 99b39a6 to 9b2e687 on February 13, 2021 18:50
@sbc100 (Collaborator) commented Mar 15, 2021

Looks like this broke Windows users with many input files: #13661

juj added a commit that referenced this pull request Mar 17, 2021
* Add a test for #13661. See #13664 and #13465.

* flake

* Address review