[TypedFunctionReferences] Add Typed Function References feature and use the types #3388

kripken · 2020-11-18T22:10:34Z

This adds the new feature and starts to use the new types where relevant. We
use them even without the feature being enabled, as we don't know the features
during wasm loading - but the hope is that given the type is a subtype, it should
all work out. In practice, if you print out the internal type you may see a typed
function reference-specific type for a ref.func for example, instead of a generic
funcref, but it should not affect anything else.

This PR does not support non-nullable types, that is, everything is nullable
for now. As suggested by @tlively this is simpler for now and leaves nullability
for later work (which will apparently require let or something else, and many
passes may need to be changed).

To allow this PR to work, we need to provide a type on creating a RefFunc. The
wasm-builder.h internal API is updated for this, as are the C and JS APIs,
which are breaking changes. cc @dcodeIO

We must also write and read function types properly. This PR improves
collectSignatures to find all the types, and also to sort them by the
dependencies between them (as we can't emit X in the binary if it depends
on Y, and Y has not been emitted - we need to give Y's index). This sorting
ends up changing a few test outputs.

InstrumentLocals support for printing function types that are not funcref
is disabled for now, until we figure out how to make that work and/or
decide if it's important enough to work on.

The fuzzer has various fixes to emit valid types for things (mostly
whitespace there). Also two drive-by fixes to call makeTrivial where it
should be (when we fail to create a specific node, we can't just try to make
another node, in theory it could infinitely recurse).

Binary writing changes here to replace calls to a standalone function to
write out a type with one that is called on the binary writer object itself,
which maintains a mapping of type indexes (getFunctionSignatureByIndex).

This looks ok after fuzzing.

src/ir/module-utils.h

src/passes/InstrumentLocals.cpp

src/wasm/wasm-type.cpp

tlively · 2020-11-19T20:46:55Z

src/wasm/wasm-validator.cpp

@@ -2313,6 +2313,7 @@ void FunctionValidator::visitFunction(Function* curr) {
  for (const auto& var : curr->vars) {
    features |= var.getFeatures();
    shouldBeTrue(var.isConcrete(), curr, "vars must be concretely typed");
+    // TODO: check for nullability


We should enforce that locals are non-nullable now, as well as enforcing that types in block signatures and function signatures are non-nullable.

Why should locals be non-nullable? Shouldn't they be rather nullable, given that they need a null initialization value?

test/binaryen.js/kitchen-sink.js

test/passes/translate-to-fuzz_all-features.txt

tlively · 2020-11-19T20:59:07Z

src/wasm/wasm-binary.cpp

+      // FIXME: support non-nullable types. search for all "nullable = "
+      //        comments.
+      return Type(getHeapType(), /* nullable = */ true);


I think it would be better to error out here rather than silently change the users' program.

I thought the plan we agreed on was to just do everything with nullable types for now, which is definitely broken, but it's a stepping stone? Erroring out makes sense in production-ready code, I agree, but this is all very much wip.

I agree that we should only support nullable types for now, but IMO that should include parsing as well. In general, this makes our level of support explicit to our users and will help us find the places we need to change later when we do want to support nullable types.

Do you mean parsing of custom user types (not parsing of ref.func etc.)? I guess there's no harm to allow non-nullable types there for now, as long as we don't test them, sgtm.

I thought the plan we agreed on was to just do everything with nullable types for now.

I think we interpreted that differently. My interpretation was that we should only support nullable types for now, which to me means erroring out when encountering a non-nullable type in the user input. It looks like your interpretation allows for "supporting" non-nullable types from the user's point of view by treating them as nullable types. I feel less strongly about this particular instance now because some of the implementation strategies we were brainstorming earlier did involve treating non-nullable types as nullable in Binaryen IR, but on a related note, do you plan to add validation that local, block, and function types do not contain non-nullable references in this PR?

Co-authored-by: Thomas Lively <[email protected]>

kripken · 2020-11-20T17:11:05Z

Updated to properly parse non-nullable types in the binary format (we will almost certainly error if they are used in a binary, and they are not tested / cannot be tested).

aheejin

Haven't read the whole PR yet, but some basic questions to improve my understanding:

According to the spec, regardless of the current Binaryen implementation, is ref.func's return type a nullable type or a non-nullable type?
Why is the feature set not available at loading type?
What do we do if we assume more refined types for function references, and later it turns out we only have the basic reference types enabled but not typed function references?

aheejin · 2020-11-20T17:26:41Z

src/ir/module-utils.h

+            counts[heapType.getSignature()]++;
+          }
+        }
+        // Specific places also have a special signature.
        if (auto* call = curr->dynCast<CallIndirect>()) {
          counts[call->sig]++;


Is it possible to have each param of a sig can be also a typed function reference type? If it is possible I guess we have to consider this in everywhere we collect signatures, including call_indirect or block like constructs...?

Yes, each param can be a typed function reference type. But I'm not sure what you mean in the second sentence - what do you mean by "collect signatures"? (I think this is the one place that collects them, so there is no other place to worry about?)

There are several places we count signatures, each of which consists of params and results, and each param or result can be a typed reference type that contains a signature inside. What I meant was, when we count signatures call call_indirect, block types, or loop types, ..., should we check if each param/result is in turn also a typed reference type, in case it contains a signature? Maybe I'm misunderstanding something.

Oh, do you mean to check recursively? So given a type like (ref func (param i32 (ref func (f64))) we should also note the child (ref func f64)?

If so then I think you're right. That was not needed before but seems like it's needed now. I'll add a TODO - for now this is not a problem as the fuzzer doesn't emit such things, but we should handle it.

src/wasm/wasm-binary.cpp

kripken · 2020-11-20T18:57:08Z

According to the spec, regardless of the current Binaryen implementation, is ref.func's return type a nullable type or a non-nullable type?

I think the spec says it is non-nullable. That is, it takes advantage of the fact that it definitely is not null when you take a reference to an actual function by name.

Why is the feature set not available at loading type?

It's a little complicated because of the feature section, which applies features, and that the user can also apply features through flags. My initial thought was to apply the user features before loading the wasm - so if the user said "use X", we'd know X is present - but @tlively pointed out that that would affect how the features section works. Specifically, we want to error if the user flags don't match the feature section flags, and if we apply the user flags first to the wasm, we'd need to change how we do that comparison (because we look at what's applied on the wasm).

It seems like there is complexity both ways. I'm not 100% sure what's best to be honest.

What do we do if we assume more refined types for function references, and later it turns out we only have the basic reference types enabled but not typed function references?

It should "just work". Without the newer feature for typed function references enabled, there is no instruction that can notice the difference. Or so at least we hope 😄 It definitely has observable effects for binaryen, though. For example, if a pass wants to save a value to a local, and creates a local of that type, then that is always valid with reference types (it's ok to make a local of type funcref) but not ok if it's a non-nullable typed function reference type (since a local has an initial value which would be null). But, for now we are ignoring non-nullable types anyhow for other reasons - that they require let and other work, that maybe we don't need immediately - so this is hopefully not an issue.

But this does add some oddness, to use newer types than the features. But avoiding this complexity would require a change to the features section handling, as in the last answer...

src/ir/module-utils.h

This reverts commit 78e8a97.

aheejin · 2020-11-21T07:35:39Z

Why is the feature set not available at loading type?

It's a little complicated because of the feature section, which applies features, and that the user can also apply features through flags. My initial thought was to apply the user features before loading the wasm - so if the user said "use X", we'd know X is present - but @tlively pointed out that that would affect how the features section works. Specifically, we want to error if the user flags don't match the feature section flags, and if we apply the user flags first to the wasm, we'd need to change how we do that comparison (because we look at what's applied on the wasm).

It seems like there is complexity both ways. I'm not 100% sure what's best to be honest.

Sorry I think I still lack some understanding 😅 So we can know user flags before loading, but we can't know what feature section contains before loading, is this what you're saying? If so, is that because feature sections are read at the end?

And I'm not sure what you mean by when user flags and the feature section do not match. Do we error out? If so, can't we just use user-given features anyway from the beginning, given that we are gonna error out if they turn out to be incorrect later?

Also, I'm not sure what you mean by "both ways". Sorry for so many questions...

What do we do if we assume more refined types for function references, and later it turns out we only have the basic reference types enabled but not typed function references?

It should "just work". Without the newer feature for typed function references enabled, there is no instruction that can notice the difference. Or so at least we hope 😄 It definitely has observable effects for binaryen, though. For example, if a pass wants to save a value to a local, and creates a local of that type, then that is always valid with reference types (it's ok to make a local of type funcref) but not ok if it's a non-nullable typed function reference type (since a local has an initial value which would be null). But, for now we are ignoring non-nullable types anyhow for other reasons - that they require let and other work, that maybe we don't need immediately - so this is hopefully not an issue.

But this does add some oddness, to use newer types than the features. But avoiding this complexity would require a change to the features section handling, as in the last answer...

Isn't the output (both binary and text) different for funcref and typed function references? If we don't have TFR enabled but we have TFRs in-memory, aren't they gonna make differences when they are finally written out to output files? For example, writeType in wasm-binary.cpp doesn't seem to check if TFR feature is enabled. It seems I am not understanding some core assumptions..

kripken · 2020-11-21T18:11:37Z

@tlively

do you plan to add validation that local, block, and function types do not contain non-nullable references in this PR?

I didn't plan on adding anything else to this PR 😄 My goal here is not to be 100% comprehensive. We have some unknowns about our plans about about the spec, so doing it all here would just add unnecessary work now or require unnecessary work later to remove.

What I see this PR as doing is adding the minimal amount of support for the typed function references feature, and use the types, and get the fuzzer passing (and maybe doing a little more, if it's already been done, and is code that is easier to leave in than do the work to remove it and remember to add it later). That leaves quite a lot of future work, but unblocks work on call_ref and on GC, which is my main goal.

I don't see thiis PR as creating technical debt or otherwise slowing us down later. I'd agree that would be something to avoid, if I missed such a thing here.

@aheejin

Sorry I think I still lack some understanding 😅 So we can know user flags before loading, but we can't know what feature section contains before loading, is this what you're saying? If so, is that because feature sections are read at the end?

Probably @tlively can explain that better as I don't fully understand how our feature section handling works.

However, I'd like to leave that as a separate question from this PR, if that's ok? If we decide to change the approach (and check the feature before using the types, which means changing how we use the feature section) that would not be hard to do later, but would be a larger discussion on changing how the feature section works (which I'd rather not block GC work on).

Isn't the output (both binary and text) different for funcref and typed function references? If we don't have TFR enabled but we have TFRs in-memory, aren't they gonna make differences when they are finally written out to output files? For example, writeType in wasm-binary.cpp doesn't seem to check if TFR feature is enabled. It seems I am not understanding some core assumptions..

It does seem like it shouldn't work 😉 but it does, I think. If TFR is disabled then a ref.func would still have a more refined type that is part of TFR. But that is not actually written to the binary - luckily, the binary format is concise and does not write a type for ref.func. The type is not needed since we know the type of the function already. And it seems to work out for ref.null too, which does provide a type, but if TFR is disabled then that type would not be a TFR type - we don't create a new type in-memory there.

kripken · 2020-11-21T18:31:06Z

@tlively

Maybe a shorter way to state my position: I don't consider this PR to be "correct" in terms of nullability. I'm suggesting that we don't care about nullability for now, and leave it for later. The benefit is that that can save us work given the uncertainties here. But I sense that you want there to be a proper mental model for nullability even in this PR? If so, the options are to either remove it entirely, or something else that would be more work either now or later, I'm pretty sure. So I hope we can agree to having "wrong" nullability for now,

tlively · 2020-11-22T06:07:37Z

It's definitely not the way I would have written it, but I see where you're coming from :) I know you're not about to stop work on this and leave nullability half-done, so I'm fine with landing it this way. I'll give it one last pass through to make sure I don't have any other concerns.

tlively · 2020-11-22T06:51:28Z

Looks like the fuzz test just has to be updated again.

@aheejin, I posted #3393 to more fully explain what we are currently doing with features and start a discussion about whether we should make any changes to how we handle them.

aheejin · 2020-11-22T09:13:41Z

@kripken

According to the spec, regardless of the current Binaryen implementation, is ref.func's return type a nullable type or a non-nullable type?

I think the spec says it is non-nullable. That is, it takes advantage of the fact that it definitely is not null when you take a reference to an actual function by name.

Then is it OK make it a non-nullable type in the implementation? I agree we should start with non-nullable types, but I'm just wondering if this is spec-compliant. For example, this does not seem to be valid in the spec, but we might treat this valid..?

(ref.as_non_null (ref.func $foo))

I guess it doesn't matter now because we don't have ref.as_non_null yet in the impl, but I'm wondering generally making ref.func's return type more wider (i.e., supertype) of its real return type would be OK.

Isn't the output (both binary and text) different for funcref and typed function references? If we don't have TFR enabled but we have TFRs in-memory, aren't they gonna make differences when they are finally written out to output files? For example, writeType in wasm-binary.cpp doesn't seem to check if TFR feature is enabled. It seems I am not understanding some core assumptions..

It does seem like it shouldn't work 😉 but it does, I think. If TFR is disabled then a ref.func would still have a more refined type that is part of TFR. But that is not actually written to the binary - luckily, the binary format is concise and does not write a type for ref.func. The type is not needed since we know the type of the function already. And it seems to work out for ref.null too, which does provide a type, but if TFR is disabled then that type would not be a TFR type - we don't create a new type in-memory there.

So you mean, even though we don't know the feature set while reading a module (I asked a question above that below), we know the feature set when we write back the module to either text of binary, so we can emit the write types that conform to the current feature set? That makes sense, thanks.

@kripken and @tlively
My question is, regardless of how we should handle features input, either from user or the features section or both, can't we determine the feature set for Binaryen before loading the module? @kripken said this PR was done under the assumption that we don't know the feature set while we read a module. I'm wondering why that's the case. (I also read #3393, but I'm still not sure why all those feature set decision can't precede module loading)

tlively · 2020-11-22T22:59:00Z

My question is, regardless of how we should handle features input, either from user or the features section or both, can't we determine the feature set for Binaryen before loading the module?

No, because the feature set is determined in part by the target features section. The only way to read the target features section right now is to load the module.

aheejin · 2020-11-22T23:07:12Z

My question is, regardless of how we should handle features input, either from user or the features section or both, can't we determine the feature set for Binaryen before loading the module?

No, because the feature set is determined in part by the target features section. The only way to read the target features section right now is to load the module.

So we parse and read code before reading the features section? Is that because the features section occurs after the code section?

tlively · 2020-11-22T23:28:28Z

According to the spec, regardless of the current Binaryen implementation, is ref.func's return type a nullable type or a non-nullable type?

I think the spec says it is non-nullable. That is, it takes advantage of the fact that it definitely is not null when you take a reference to an actual function by name.

Then is it OK make it a non-nullable type in the implementation? I agree we should start with non-nullable types, but I'm just wondering if this is spec-compliant.

(I think you wrote "non-nullable" in this message where you meant "nullable". Is that correct?)

Since non-nullable references are subtypes of their corresponding nullable references, they can be used anywhere the nullable references can be used. That means that treating instructions that produce non-nullable references as though they produced nullable references only lets them be used in locations where they could already be used. It therefore does not allow any invalid modules to validate.

For example, this does not seem to be valid in the spec, but we might treat this valid..?
(ref.as_non_null (ref.func $foo))

This is valid in the spec because ref.func produces a non-nullable reference, which is a subtype of a nullable reference and can therefore be passed to ref.as_non_null, which expects a nullable reference.

tlively · 2020-11-22T23:31:38Z

My question is, regardless of how we should handle features input, either from user or the features section or both, can't we determine the feature set for Binaryen before loading the module?

No, because the feature set is determined in part by the target features section. The only way to read the target features section right now is to load the module.

So we parse and read code before reading the features section? Is that because the features section occurs after the code section?

Yes, that's correct. In theory we could skip ahead and read the target features section first, but that wouldn't buy us much because using the more refined types internally works out fine.

kripken · 2020-11-23T17:28:32Z

(I do suspect this is not spec-compliant for some reason or other, detectable in a spec test, but this is wip - we'll fix those issues later.)

kripken added 8 commits November 18, 2020 13:48

work

30f3cc3

fix

f23a104

format

0103261

test fix

8e37792

cleanup

4f3841f

cleanup

adc4f4d

missing test

bd93f0a

restore test name

0a7e153

kripken requested review from tlively and aheejin November 18, 2020 22:10

kripken added 3 commits November 18, 2020 15:31

fix test

f92bc3a

update test

55d30ff

fix another JS test

19399ca

tlively reviewed Nov 19, 2020

View reviewed changes

kripken and others added 5 commits November 19, 2020 13:55

fix

0654e5c

comment

21e78e6

Update src/wasm/wasm-binary.cpp

1079186

Co-authored-by: Thomas Lively <[email protected]>

Merge remote-tracking branch 'origin/master' into tfr

b5b4e32

Parse non-nullable types in the binary format

423ed28

aheejin reviewed Nov 20, 2020

View reviewed changes

Prepare for GC a little by generalizing heap type writing code

78e8a97

kripken added 7 commits November 20, 2020 12:53

fix counting of signatures

7de3c0f

test update

033a887

fix

1b4aa7c

fix

fbe7f96

fix

60b7312

fix

05700e8

update tests

9813f9c

tlively reviewed Nov 21, 2020

View reviewed changes

src/ir/module-utils.h Outdated Show resolved Hide resolved

kripken added 2 commits November 20, 2020 17:05

cleanup

fdc30c1

Revert "Prepare for GC a little by generalizing heap type writing code"

325a85d

This reverts commit 78e8a97.

kripken added 3 commits November 21, 2020 09:43

Merge remote-tracking branch 'origin/master' into tfr

df1dc6e

review feedback

106abec

review feedback

357a3e4

tlively approved these changes Nov 22, 2020

View reviewed changes

update test

6c7432d

kripken force-pushed the tfr branch from b2f2256 to 6c7432d Compare November 23, 2020 17:42

kripken merged commit b2d797f into master Nov 23, 2020

kripken deleted the tfr branch November 23, 2020 19:14

[TypedFunctionReferences] Add Typed Function References feature and use the types #3388

[TypedFunctionReferences] Add Typed Function References feature and use the types #3388

Uh oh!

Conversation

kripken commented Nov 18, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken commented Nov 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aheejin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kripken commented Nov 20, 2020

Uh oh!

Uh oh!

aheejin commented Nov 21, 2020

Uh oh!

kripken commented Nov 21, 2020

Uh oh!

kripken commented Nov 21, 2020

Uh oh!

tlively commented Nov 22, 2020

Uh oh!

tlively commented Nov 22, 2020

Uh oh!

aheejin commented Nov 22, 2020

Uh oh!

tlively commented Nov 22, 2020

Uh oh!

aheejin commented Nov 22, 2020

Uh oh!

tlively commented Nov 22, 2020

Uh oh!

tlively commented Nov 22, 2020

Uh oh!

kripken commented Nov 23, 2020

Uh oh!

Uh oh!

kripken commented Nov 20, 2020 •

edited

Loading

aheejin left a comment •

edited

Loading