-
Notifications
You must be signed in to change notification settings - Fork 786
[TypedFunctionReferences] Add Typed Function References feature and use the types #3388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2313,6 +2313,7 @@ void FunctionValidator::visitFunction(Function* curr) { | |||
for (const auto& var : curr->vars) { | |||
features |= var.getFeatures(); | |||
shouldBeTrue(var.isConcrete(), curr, "vars must be concretely typed"); | |||
// TODO: check for nullability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should enforce that locals are non-nullable now, as well as enforcing that types in block signatures and function signatures are non-nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should locals be non-nullable? Shouldn't they be rather nullable, given that they need a null initialization value?
src/wasm/wasm-binary.cpp
Outdated
// FIXME: support non-nullable types. search for all "nullable = " | ||
// comments. | ||
return Type(getHeapType(), /* nullable = */ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to error out here rather than silently change the users' program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the plan we agreed on was to just do everything with nullable types for now, which is definitely broken, but it's a stepping stone? Erroring out makes sense in production-ready code, I agree, but this is all very much wip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should only support nullable types for now, but IMO that should include parsing as well. In general, this makes our level of support explicit to our users and will help us find the places we need to change later when we do want to support nullable types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean parsing of custom user types (not parsing of ref.func
etc.)? I guess there's no harm to allow non-nullable types there for now, as long as we don't test them, sgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the plan we agreed on was to just do everything with nullable types for now.
I think we interpreted that differently. My interpretation was that we should only support nullable types for now, which to me means erroring out when encountering a non-nullable type in the user input. It looks like your interpretation allows for "supporting" non-nullable types from the user's point of view by treating them as nullable types. I feel less strongly about this particular instance now because some of the implementation strategies we were brainstorming earlier did involve treating non-nullable types as nullable in Binaryen IR, but on a related note, do you plan to add validation that local, block, and function types do not contain non-nullable references in this PR?
Updated to properly parse non-nullable types in the binary format (we will almost certainly error if they are used in a binary, and they are not tested / cannot be tested). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read the whole PR yet, but some basic questions to improve my understanding:
- According to the spec, regardless of the current Binaryen implementation, is
ref.func
's return type a nullable type or a non-nullable type? - Why is the feature set not available at loading type?
- What do we do if we assume more refined types for function references, and later it turns out we only have the basic reference types enabled but not typed function references?
counts[heapType.getSignature()]++; | ||
} | ||
} | ||
// Specific places also have a special signature. | ||
if (auto* call = curr->dynCast<CallIndirect>()) { | ||
counts[call->sig]++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have each param of a sig can be also a typed function reference type? If it is possible I guess we have to consider this in everywhere we collect signatures, including call_indirect
or block
like constructs...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, each param can be a typed function reference type. But I'm not sure what you mean in the second sentence - what do you mean by "collect signatures"? (I think this is the one place that collects them, so there is no other place to worry about?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several places we count signatures, each of which consists of params and results, and each param or result can be a typed reference type that contains a signature inside. What I meant was, when we count signatures call call_indirect
, block types, or loop types, ..., should we check if each param/result is in turn also a typed reference type, in case it contains a signature? Maybe I'm misunderstanding something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, do you mean to check recursively? So given a type like (ref func (param i32 (ref func (f64)))
we should also note the child (ref func f64)
?
If so then I think you're right. That was not needed before but seems like it's needed now. I'll add a TODO - for now this is not a problem as the fuzzer doesn't emit such things, but we should handle it.
I think the spec says it is non-nullable. That is, it takes advantage of the fact that it definitely is not null when you take a reference to an actual function by name.
It's a little complicated because of the feature section, which applies features, and that the user can also apply features through flags. My initial thought was to apply the user features before loading the wasm - so if the user said "use X", we'd know X is present - but @tlively pointed out that that would affect how the features section works. Specifically, we want to error if the user flags don't match the feature section flags, and if we apply the user flags first to the wasm, we'd need to change how we do that comparison (because we look at what's applied on the wasm). It seems like there is complexity both ways. I'm not 100% sure what's best to be honest.
It should "just work". Without the newer feature for typed function references enabled, there is no instruction that can notice the difference. Or so at least we hope 😄 It definitely has observable effects for binaryen, though. For example, if a pass wants to save a value to a local, and creates a local of that type, then that is always valid with reference types (it's ok to make a local of type But this does add some oddness, to use newer types than the features. But avoiding this complexity would require a change to the features section handling, as in the last answer... |
This reverts commit 78e8a97.
Sorry I think I still lack some understanding 😅 So we can know user flags before loading, but we can't know what feature section contains before loading, is this what you're saying? If so, is that because feature sections are read at the end? And I'm not sure what you mean by when user flags and the feature section do not match. Do we error out? If so, can't we just use user-given features anyway from the beginning, given that we are gonna error out if they turn out to be incorrect later? Also, I'm not sure what you mean by "both ways". Sorry for so many questions...
Isn't the output (both binary and text) different for |
I didn't plan on adding anything else to this PR 😄 My goal here is not to be 100% comprehensive. We have some unknowns about our plans about about the spec, so doing it all here would just add unnecessary work now or require unnecessary work later to remove. What I see this PR as doing is adding the minimal amount of support for the typed function references feature, and use the types, and get the fuzzer passing (and maybe doing a little more, if it's already been done, and is code that is easier to leave in than do the work to remove it and remember to add it later). That leaves quite a lot of future work, but unblocks work on I don't see thiis PR as creating technical debt or otherwise slowing us down later. I'd agree that would be something to avoid, if I missed such a thing here.
Probably @tlively can explain that better as I don't fully understand how our feature section handling works. However, I'd like to leave that as a separate question from this PR, if that's ok? If we decide to change the approach (and check the feature before using the types, which means changing how we use the feature section) that would not be hard to do later, but would be a larger discussion on changing how the feature section works (which I'd rather not block GC work on).
It does seem like it shouldn't work 😉 but it does, I think. If TFR is disabled then a |
Maybe a shorter way to state my position: I don't consider this PR to be "correct" in terms of nullability. I'm suggesting that we don't care about nullability for now, and leave it for later. The benefit is that that can save us work given the uncertainties here. But I sense that you want there to be a proper mental model for nullability even in this PR? If so, the options are to either remove it entirely, or something else that would be more work either now or later, I'm pretty sure. So I hope we can agree to having "wrong" nullability for now, |
It's definitely not the way I would have written it, but I see where you're coming from :) I know you're not about to stop work on this and leave nullability half-done, so I'm fine with landing it this way. I'll give it one last pass through to make sure I don't have any other concerns. |
Then is it OK make it a non-nullable type in the implementation? I agree we should start with non-nullable types, but I'm just wondering if this is spec-compliant. For example, this does not seem to be valid in the spec, but we might treat this valid..? (ref.as_non_null (ref.func $foo)) I guess it doesn't matter now because we don't have
So you mean, even though we don't know the feature set while reading a module (I asked a question above that below), we know the feature set when we write back the module to either text of binary, so we can emit the write types that conform to the current feature set? That makes sense, thanks. @kripken and @tlively |
No, because the feature set is determined in part by the target features section. The only way to read the target features section right now is to load the module. |
So we parse and read code before reading the features section? Is that because the features section occurs after the code section? |
(I think you wrote "non-nullable" in this message where you meant "nullable". Is that correct?) Since non-nullable references are subtypes of their corresponding nullable references, they can be used anywhere the nullable references can be used. That means that treating instructions that produce non-nullable references as though they produced nullable references only lets them be used in locations where they could already be used. It therefore does not allow any invalid modules to validate.
This is valid in the spec because |
Yes, that's correct. In theory we could skip ahead and read the target features section first, but that wouldn't buy us much because using the more refined types internally works out fine. |
(I do suspect this is not spec-compliant for some reason or other, detectable in a spec test, but this is wip - we'll fix those issues later.) |
This adds the new feature and starts to use the new types where relevant. We
use them even without the feature being enabled, as we don't know the features
during wasm loading - but the hope is that given the type is a subtype, it should
all work out. In practice, if you print out the internal type you may see a typed
function reference-specific type for a
ref.func
for example, instead of a genericfuncref
, but it should not affect anything else.This PR does not support non-nullable types, that is, everything is nullable
for now. As suggested by @tlively this is simpler for now and leaves nullability
for later work (which will apparently require
let
or something else, and manypasses may need to be changed).
To allow this PR to work, we need to provide a type on creating a
RefFunc
. Thewasm-builder.h
internal API is updated for this, as are the C and JS APIs,which are breaking changes. cc @dcodeIO
We must also write and read function types properly. This PR improves
collectSignatures
to find all the types, and also to sort them by thedependencies between them (as we can't emit X in the binary if it depends
on Y, and Y has not been emitted - we need to give Y's index). This sorting
ends up changing a few test outputs.
InstrumentLocals support for printing function types that are not
funcref
is disabled for now, until we figure out how to make that work and/or
decide if it's important enough to work on.
The fuzzer has various fixes to emit valid types for things (mostly
whitespace there). Also two drive-by fixes to call
makeTrivial
where itshould be (when we fail to create a specific node, we can't just try to make
another node, in theory it could infinitely recurse).
Binary writing changes here to replace calls to a standalone function to
write out a type with one that is called on the binary writer object itself,
which maintains a mapping of type indexes (
getFunctionSignatureByIndex
).This looks ok after fuzzing.