Commit 972e659

Reintroduce wasm-merge (#5709)
We used to have a wasm-merge tool but removed it for lack of use cases. Recently, use cases have been showing up in the wasm GC space and elsewhere, as people are using more diverse toolchains together; for example, a project might build some C++ code alongside some wasm GC code. Merging those wasm files together can allow for nice optimizations like inlining and better DCE, so it makes sense to have a tool for merging.

Background:

* Removal: #1969
* Requests:
  * wasm-merge - why it has been deleted #2174
  * Compiling and linking wat files #2276
  * wasm-link? #2767

This PR is a complete rewrite of wasm-merge, not a restoration of the original codebase. The original code was quite messy (my fault), and since then we've added multi-memory and multi-table, which makes things a lot simpler.

The linking semantics are as described in the "wasm-link" issue #2767: all we do is merge normal wasm files together and connect imports and exports. That is, we have a graph of modules and their names, and each import to a module name can be resolved to that module. Basically, this is what a JS bundler does for JS. In other words, we do the same operations that JS code would do to glue wasm modules together at runtime, but at compile time. See the README update in this PR for a concrete example. There are no plans to do more than that simple bundling, so this should not really overlap with wasm-ld's use cases.

This should be fairly fast, as it works in linear time on the total input code. However, it won't be as fast as wasm-ld, as it does build Binaryen IR for each module. An advantage of working on Binaryen IR is that we can easily do some global DCE after merging, and further optimizations are possible later.
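To make the linking semantics above concrete, here is a minimal Python sketch of the "graph of modules and their names" idea. It is a toy model, not the code in this PR: the `Module` class and `resolve_imports` function are hypothetical names, modules are reduced to their import and export names, and an import that names a known module but a missing export is simply left unresolved (how the real tool treats that case is not modeled here).

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a wasm module: only import and export names."""
    name: str
    imports: list = field(default_factory=list)  # list of (module, base) pairs
    exports: set = field(default_factory=set)    # exported names


def resolve_imports(modules):
    """Classify each import as resolved (another input module exports it)
    or unresolved (it would remain an import of the merged output)."""
    by_name = {m.name: m for m in modules}
    resolved, unresolved = [], []
    for m in modules:
        for target, base in m.imports:
            provider = by_name.get(target)
            if provider is not None and base in provider.exports:
                resolved.append((m.name, target, base))
            else:
                unresolved.append((m.name, target, base))
    return resolved, unresolved


# The two modules from the README example in this PR.
first = Module("first", imports=[("second", "bar")], exports={"main"})
second = Module("second", imports=[("outside", "log")], exports={"bar"})

resolved, unresolved = resolve_imports([first, second])
print("resolved:", resolved)
print("unresolved:", unresolved)
```

In this model the import of `second.bar` is resolved against the module named `second`, while `outside.log` stays an import that the embedder (for example JS) must still provide.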
1 parent 44cd751 commit 972e659

30 files changed: +2171 -11 lines changed
CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@ full changeset diff at the end of each section.
 Current Trunk
 -------------

+- Add a `wasm-merge` tool. This is a full rewrite of the previous `wasm-merge`
+  tool that was removed from the tree in the past. The new version is much
+  simpler after recent improvements to multi-memory and multi-table. The
+  rewrite was motivated by new use cases for merging modules in the context of
+  WasmGC.
 - Some C and JS API functions now refer to data and element segments by name
   instead of index.
 - The --nominal and --hybrid command line options and related API functions have

README.md

Lines changed: 147 additions & 0 deletions
@@ -223,6 +223,9 @@ This repository contains code that builds the following tools in `bin/`:
   performs emscripten-specific passes over it.
 * **wasm-ctor-eval**: A tool that can execute functions (or parts of functions)
   at compile time.
+* **wasm-merge**: Merges multiple wasm files into a single file, connecting
+  corresponding imports to exports as it does so. Like a bundler for JS, but
+  for wasm.
 * **binaryen.js**: A standalone JavaScript library that exposes Binaryen methods for [creating and optimizing Wasm modules](https://github.com/WebAssembly/binaryen/blob/main/test/binaryen.js/hello-world.js). For builds, see [binaryen.js on npm](https://www.npmjs.com/package/binaryen) (or download it directly from [github](https://github.com/raw/AssemblyScript/binaryen.js/master/index.js), [rawgit](https://cdn.rawgit.com/AssemblyScript/binaryen.js/master/index.js), or [unpkg](https://unpkg.com/binaryen@latest/index.js)). Minimal requirements: Node.js v15.8 or Chrome v75 or Firefox v78.

 Usage instructions for each are below.
@@ -562,6 +565,150 @@ as mentioned earlier, but there is no limitation on what you can execute here.
 Any export from the wasm can be executed, if its contents are suitable. For
 example, in Emscripten `wasm-ctor-eval` is even run on `main()` when possible.

+### wasm-merge
+
+`wasm-merge` combines wasm files together. For example, imagine you have a
+project that uses wasm files from multiple toolchains. Then it can be helpful to
+merge them all into a single wasm file before shipping, since in a single wasm
+file the calls between the modules become just normal calls inside a module,
+which allows them to be inlined, dead code eliminated, and so forth, potentially
+improving speed and size.
+
+For example, imagine we have these two wasm files:
+
+```wat
+;; a.wasm
+(module
+  (import "second" "bar" (func $second.bar))
+
+  (export "main" (func $func))
+
+  (func $func
+    (call $second.bar)
+  )
+)
+```
+
+```wat
+;; b.wasm
+(module
+  (import "outside" "log" (func $log (param i32)))
+
+  (export "bar" (func $func))
+
+  (func $func
+    (call $log
+      (i32.const 42)
+    )
+  )
+)
+```
+
+The filenames on your local drive are `a.wasm` and `b.wasm`, but for merging /
+bundling purposes let's say that the first is known as `"first"` and the second
+as `"second"`. That is, we want the first module's import of `"second.bar"` to
+call the function `$func` in the second module. Here is a wasm-merge command for
+that:
+
+```
+wasm-merge a.wasm first b.wasm second -o output.wasm
+```
+
+We give it the first wasm file, then its name, and then the second wasm file
+and then its name. The merged output is this:
+
+```wat
+(module
+  (import "second" "bar" (func $second.bar))
+  (import "outside" "log" (func $log (param i32)))
+
+  (export "main" (func $func))
+  (export "bar" (func $func_2))
+
+  (func $func
+    (call $func_2)
+  )
+
+  (func $func_2
+    (call $log
+      (i32.const 42)
+    )
+  )
+)
+```
+
+`wasm-merge` combined the two files into one, merging their functions, imports,
+etc., all while fixing up name conflicts and connecting corresponding imports to
+exports. In particular, note how `$func` calls `$func_2`, which is exactly what
+we wanted: `$func_2` is the function from the second module (renamed to avoid a
+name collision).
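The renaming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `merge_function_names` is hypothetical, and the `_2`, `_3`, ... suffix scheme is chosen to reproduce the `$func_2` name in the README example, not taken from the tool's actual naming logic.

```python
def merge_function_names(first_names, second_names):
    """Return a rename map for the second module's names such that none of
    them collides with a name already taken in the merged module."""
    taken = set(first_names)
    renames = {}
    for name in second_names:
        new_name = name
        suffix = 2  # assumption: suffixes start at 2, as in the README example
        while new_name in taken:
            new_name = f"{name}_{suffix}"
            suffix += 1
        taken.add(new_name)
        renames[name] = new_name
    return renames


# Names from the README example: both modules define `$func`.
renames = merge_function_names(["$second.bar", "$func"], ["$log", "$func"])
print(renames)
```

Every reference inside the second module would then be rewritten through such a map, which is why the merged `$func` ends up calling `$func_2`.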
+
+Note that the wasm output in this example could benefit from additional
+optimization. First, the call to `$func_2` can now be easily inlined, so we can
+run `wasm-opt -O3` to do that for us. Also, we may not need all the imports and
+exports, for which we can run
+[wasm-metadce](https://github.com/WebAssembly/binaryen/wiki/Pruning-unneeded-code-in-wasm-files-with-wasm-metadce#example-pruning-exports).
+A good workflow could be to run `wasm-merge`, then `wasm-metadce`, then finish
+with `wasm-opt`.
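As a rough sketch of that workflow, the Python helper below only builds the three command lines; it does not run anything. The function name `build_pipeline`, the intermediate filenames, and `graph.json` are hypothetical. The `wasm-merge` argument order follows the README example above and `wasm-opt -O3` is the flag the README names; the `--graph-file` option for `wasm-metadce` is an assumption here and should be checked against `wasm-metadce --help` and the linked wiki page.

```python
import shlex


def build_pipeline(inputs, merged="merged.wasm", pruned="pruned.wasm",
                   final="final.wasm", graph_file="graph.json"):
    """Build (but do not run) the merge -> metadce -> opt command lines.

    `inputs` is a list of (wasm_path, module_name) pairs, in the order
    that wasm-merge expects them on the command line.
    """
    merge = ["wasm-merge"]
    for path, name in inputs:
        merge += [path, name]
    merge += ["-o", merged]
    # Assumption: wasm-metadce reads its reachability graph via --graph-file.
    metadce = ["wasm-metadce", "--graph-file", graph_file, merged, "-o", pruned]
    opt = ["wasm-opt", "-O3", pruned, "-o", final]
    return [merge, metadce, opt]


commands = build_pipeline([("a.wasm", "first"), ("b.wasm", "second")])
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Note that the fuzzer handler added in this PR passes `-all` to `wasm-merge` and to later tools, because the merged module may use multiple memories and tables even when the inputs are MVP; a real pipeline may need the same feature flags.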
+
+`wasm-merge` is kind of like a bundler for wasm files, in the sense of a "JS
+bundler" but for wasm. That is, with the wasm files above, imagine that we had
+this JS code to instantiate and connect them at runtime:
+
+```js
+// Compile the first module.
+var first = await fetch("a.wasm").then((response) => response.arrayBuffer());
+first = new WebAssembly.Module(first);
+
+// Compile the second module.
+var second = await fetch("b.wasm").then((response) => response.arrayBuffer());
+second = new WebAssembly.Module(second);
+
+// Instantiate the second, with a JS import.
+second = new WebAssembly.Instance(second, {
+  outside: {
+    log: (value) => {
+      console.log('value:', value);
+    }
+  }
+});
+
+// Instantiate the first, importing from the second.
+first = new WebAssembly.Instance(first, {
+  second: second.exports
+});
+
+// Call the main function.
+first.exports.main();
+```
+
+What `wasm-merge` does is basically what that JS does: it hooks up imports to
+exports, resolving names using the module names you provided. That is, by
+running `wasm-merge` we are moving the work of connecting the modules from
+runtime to compile time. As a result, after running `wasm-merge` we need a lot
+less JS to get the same result:
+
+```js
+// Compile the single module.
+var merged = await fetch("output.wasm").then((response) => response.arrayBuffer());
+merged = new WebAssembly.Module(merged);
+
+// Instantiate it with a JS import.
+merged = new WebAssembly.Instance(merged, {
+  outside: {
+    log: (value) => {
+      console.log('value:', value);
+    }
+  }
+});
+
+// Call the main function.
+merged.exports.main();
+```
+
+We still need to fetch and compile the merged wasm, and to provide it the JS
+import, but the work to connect two wasm modules is not needed any more.
+
 ## Testing

 ```

scripts/fuzz_opt.py

Lines changed: 65 additions & 4 deletions
@@ -82,6 +82,11 @@ def random_size():
     return random.randint(INPUT_SIZE_MIN, 2 * INPUT_SIZE_MEAN - INPUT_SIZE_MIN)


+def make_random_input(input_size, raw_input_data):
+    with open(raw_input_data, 'wb') as f:
+        f.write(bytes([random.randint(0, 255) for x in range(input_size)]))
+
+
 def run(cmd, stderr=None, silent=False):
     if not silent:
         print(' '.join(cmd))
@@ -1284,6 +1289,62 @@ def handle(self, wasm):
         compare_between_vms(fix_output(wasm_exec), fix_output(evalled_wasm_exec), 'CtorEval')


+# Tests wasm-merge
+class Merge(TestCaseHandler):
+    frequency = 0.15
+
+    def handle(self, wasm):
+        # generate a second wasm file to merge. note that we intentionally pick
+        # a smaller size than the main wasm file, so that reduction is
+        # effective (i.e., as we reduce the main wasm to small sizes, we also
+        # end up with small secondary wasms)
+        # TODO: add imports and exports that connect between the two
+        wasm_size = os.stat(wasm).st_size
+        second_size = min(wasm_size, random_size())
+        second_input = abspath('second_input.dat')
+        make_random_input(second_size, second_input)
+        second_wasm = abspath('second.wasm')
+        run([in_bin('wasm-opt'), second_input, '-ttf', '-o', second_wasm] + FUZZ_OPTS + FEATURE_OPTS)
+
+        # sometimes also optimize the second module
+        if random.random() < 0.5:
+            opts = get_random_opts()
+            run([in_bin('wasm-opt'), second_wasm, '-o', second_wasm, '-all'] + FEATURE_OPTS + opts)
+
+        # merge the wasm files. note that we must pass -all, as even if the two
+        # inputs are MVP, the output may have multiple tables and multiple
+        # memories (and we must also do that in the commands later down).
+        #
+        # Use --skip-export-conflicts as we only look at the first module's
+        # exports for now - we don't care about the second module's.
+        # TODO: compare the second module's exports as well, but we'd need
+        #       to handle renaming of conflicting exports.
+        merged = abspath('merged.wasm')
+        run([in_bin('wasm-merge'), wasm, 'first',
+             abspath('second.wasm'), 'second', '-o', merged,
+             '--skip-export-conflicts'] + FEATURE_OPTS + ['-all'])
+
+        # sometimes also optimize the merged module
+        if random.random() < 0.5:
+            opts = get_random_opts()
+            run([in_bin('wasm-opt'), merged, '-o', merged, '-all'] + FEATURE_OPTS + opts)
+
+        # verify that merging in the second module did not alter the output.
+        output = run_bynterp(wasm, ['--fuzz-exec-before', '-all'])
+        output = fix_output(output)
+        merged_output = run_bynterp(merged, ['--fuzz-exec-before', '-all'])
+        merged_output = fix_output(merged_output)
+
+        # a complication is that the second module's exports are appended, so we
+        # have extra output. to handle that, just prune the tail, so that we
+        # only compare the original exports from the first module.
+        # TODO: compare the second module's exports to themselves as well, but
+        #       they may have been renamed due to overlaps...
+        merged_output = merged_output[:len(output)]
+
+        compare_between_vms(output, merged_output, 'Merge')
+
+
 # Check that the text format round-trips without error.
 class RoundtripText(TestCaseHandler):
     frequency = 0.05
@@ -1306,6 +1367,7 @@ def handle(self, wasm):
     Asyncify(),
     TrapsNeverHappen(),
     CtorEval(),
+    Merge(),
     # FIXME: Re-enable after https://github.com/WebAssembly/binaryen/issues/3989
     # RoundtripText()
 ]
@@ -1329,7 +1391,7 @@ def test_one(random_input, given_wasm):
     randomize_fuzz_settings()
     pick_initial_contents()

-    opts = randomize_opt_flags()
+    opts = get_random_opts()
     print('randomized opts:', '\n ' + '\n '.join(opts))
     print()

@@ -1503,7 +1565,7 @@ def write_commands(commands, filename):
     ("--type-merging",)}


-def randomize_opt_flags():
+def get_random_opts():
     flag_groups = []
     has_flatten = False

@@ -1643,8 +1705,7 @@ def randomize_opt_flags():
                   'iters/sec, ', total_wasm_size / elapsed,
                   'wasm_bytes/sec, ', ignored_vm_runs,
                   'ignored\n')
-        with open(raw_input_data, 'wb') as f:
-            f.write(bytes([random.randint(0, 255) for x in range(input_size)]))
+        make_random_input(input_size, raw_input_data)
         assert os.path.getsize(raw_input_data) == input_size
         # remove the generated wasm file, so that we can tell if the fuzzer
         # fails to create one

scripts/update_help_checks.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@

 TOOLS = ['wasm-opt', 'wasm-as', 'wasm-dis', 'wasm2js', 'wasm-ctor-eval',
          'wasm-shell', 'wasm-reduce', 'wasm-metadce', 'wasm-split',
-         'wasm-fuzz-types', 'wasm-emscripten-finalize']
+         'wasm-fuzz-types', 'wasm-emscripten-finalize', 'wasm-merge']


 def main():

src/ir/module-utils.h

Lines changed: 42 additions & 6 deletions
@@ -132,12 +132,9 @@ inline DataSegment* copyDataSegment(const DataSegment* segment, Module& out) {
   return out.addDataSegment(std::move(ret));
 }

-inline void copyModule(const Module& in, Module& out) {
-  // we use names throughout, not raw pointers, so simple copying is fine
-  // for everything *but* expressions
-  for (auto& curr : in.exports) {
-    out.addExport(new Export(*curr));
-  }
+// Copies named toplevel module items (things of kind ModuleItemKind). See
+// copyModule() for something that also copies exports, the start function, etc.
+inline void copyModuleItems(const Module& in, Module& out) {
   for (auto& curr : in.functions) {
     copyFunction(curr.get(), out);
   }
@@ -159,6 +156,15 @@ inline void copyModule(const Module& in, Module& out) {
   for (auto& curr : in.dataSegments) {
     copyDataSegment(curr.get(), out);
   }
+}
+
+inline void copyModule(const Module& in, Module& out) {
+  // we use names throughout, not raw pointers, so simple copying is fine
+  // for everything *but* expressions
+  for (auto& curr : in.exports) {
+    out.addExport(std::make_unique<Export>(*curr));
+  }
+  copyModuleItems(in, out);
   out.start = in.start;
   out.customSections = in.customSections;
   out.debugInfoFileNames = in.debugInfoFileNames;
@@ -354,6 +360,36 @@ template<typename T> inline void iterImports(Module& wasm, T visitor) {
   iterImportedTags(wasm, visitor);
 }

+// Iterates over all importable module items. The visitor provided should have
+// signature void(ExternalKind, Importable*).
+template<typename T> inline void iterImportable(Module& wasm, T visitor) {
+  for (auto& curr : wasm.functions) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Function, curr.get());
+    }
+  }
+  for (auto& curr : wasm.tables) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Table, curr.get());
+    }
+  }
+  for (auto& curr : wasm.memories) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Memory, curr.get());
+    }
+  }
+  for (auto& curr : wasm.globals) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Global, curr.get());
+    }
+  }
+  for (auto& curr : wasm.tags) {
+    if (curr->imported()) {
+      visitor(ExternalKind::Tag, curr.get());
+    }
+  }
+}
+
 // Helper class for performing an operation on all the functions in the module,
 // in parallel, with an Info object for each one that can contain results of
 // some computation that the operation performs.

src/tools/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ binaryen_add_executable(wasm-ctor-eval wasm-ctor-eval.cpp)
 if(NOT BUILD_EMSCRIPTEN_TOOLS_ONLY)
   binaryen_add_executable(wasm-shell wasm-shell.cpp)
   binaryen_add_executable(wasm-reduce wasm-reduce.cpp)
+  binaryen_add_executable(wasm-merge wasm-merge.cpp)
   binaryen_add_executable(wasm-fuzz-types "${fuzzing_SOURCES};wasm-fuzz-types.cpp")
 endif()
