Skip to content

Extend the C- and JS-APIs #2586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 22, 2020
Merged

Extend the C- and JS-APIs #2586

merged 26 commits into from
Jul 22, 2020

Conversation

dcodeIO
Copy link
Contributor

@dcodeIO dcodeIO commented Jan 12, 2020

Renames the following C-API functions

  • BinaryenBlockGetChild to BinaryenBlockGetChildAt
  • BinaryenSwitchGetName to BinaryenSwitchGetNameAt
  • BinaryenCallGetOperand to BinaryenCallGetOperandAt
  • BinaryenCallIndirectGetOperand to BinaryenCallIndirectGetOperandAt
  • BinaryenHostGetOperand to BinaryenHostGetOperandAt
  • BinaryenThrowGetOperand to BinaryenThrowGetOperandAt
  • BinaryenTupleMakeGetOperand to BinaryenTupleMakeGetOperandAt

Adds the following C-API functions

  • BinaryenExpressionSetType
  • BinaryenExpressionFinalize
  • BinaryenBlockSetName
  • BinaryenBlockSetChildAt
  • BinaryenBlockAppendChild
  • BinaryenBlockInsertChildAt
  • BinaryenBlockRemoveChildAt
  • BinaryenIfSetCondition
  • BinaryenIfSetIfTrue
  • BinaryenIfSetIfFalse
  • BinaryenLoopSetName
  • BinaryenLoopSetBody
  • BinaryenBreakSetName
  • BinaryenBreakSetCondition
  • BinaryenBreakSetValue
  • BinaryenSwitchSetNameAt
  • BinaryenSwitchAppendName
  • BinaryenSwitchInsertNameAt
  • BinaryenSwitchRemoveNameAt
  • BinaryenSwitchSetDefaultName
  • BinaryenSwitchSetCondition
  • BinaryenSwitchSetValue
  • BinaryenCallSetTarget
  • BinaryenCallSetOperandAt
  • BinaryenCallAppendOperand
  • BinaryenCallInsertOperandAt
  • BinaryenCallRemoveOperandAt
  • BinaryenCallSetReturn
  • BinaryenCallIndirectSetTarget
  • BinaryenCallIndirectSetOperandAt
  • BinaryenCallIndirectAppendOperand
  • BinaryenCallIndirectInsertOperandAt
  • BinaryenCallIndirectRemoveOperandAt
  • BinaryenCallIndirectSetReturn
  • BinaryenCallIndirectGetParams
  • BinaryenCallIndirectSetParams
  • BinaryenCallIndirectGetResults
  • BinaryenCallIndirectSetResults
  • BinaryenLocalGetSetIndex
  • BinaryenLocalSetSetIndex
  • BinaryenLocalSetSetValue
  • BinaryenGlobalGetSetName
  • BinaryenGlobalSetSetName
  • BinaryenGlobalSetSetValue
  • BinaryenHostSetOp
  • BinaryenHostSetNameOperand
  • BinaryenHostSetOperandAt
  • BinaryenHostAppendOperand
  • BinaryenHostInsertOperandAt
  • BinaryenHostRemoveOperandAt
  • BinaryenLoadSetAtomic
  • BinaryenLoadSetSigned
  • BinaryenLoadSetOffset
  • BinaryenLoadSetBytes
  • BinaryenLoadSetAlign
  • BinaryenLoadSetPtr
  • BinaryenStoreSetAtomic
  • BinaryenStoreSetBytes
  • BinaryenStoreSetOffset
  • BinaryenStoreSetAlign
  • BinaryenStoreSetPtr
  • BinaryenStoreSetValue
  • BinaryenStoreGetValueType
  • BinaryenStoreSetValueType
  • BinaryenConstSetValueI32
  • BinaryenConstSetValueI64
  • BinaryenConstSetValueI64Low
  • BinaryenConstSetValueI64High
  • BinaryenConstSetValueF32
  • BinaryenConstSetValueF64
  • BinaryenConstSetValueV128
  • BinaryenUnarySetOp
  • BinaryenUnarySetValue
  • BinaryenBinarySetOp
  • BinaryenBinarySetLeft
  • BinaryenBinarySetRight
  • BinaryenSelectSetIfTrue
  • BinaryenSelectSetIfFalse
  • BinaryenSelectSetCondition
  • BinaryenDropSetValue
  • BinaryenReturnSetValue
  • BinaryenAtomicRMWSetOp
  • BinaryenAtomicRMWSetBytes
  • BinaryenAtomicRMWSetOffset
  • BinaryenAtomicRMWSetPtr
  • BinaryenAtomicRMWSetValue
  • BinaryenAtomicCmpxchgSetBytes
  • BinaryenAtomicCmpxchgSetOffset
  • BinaryenAtomicCmpxchgSetPtr
  • BinaryenAtomicCmpxchgSetExpected
  • BinaryenAtomicCmpxchgSetReplacement
  • BinaryenAtomicWaitSetPtr
  • BinaryenAtomicWaitSetExpected
  • BinaryenAtomicWaitSetTimeout
  • BinaryenAtomicWaitSetExpectedType
  • BinaryenAtomicNotifySetPtr
  • BinaryenAtomicNotifySetNotifyCount
  • BinaryenAtomicFenceSetOrder
  • BinaryenSIMDExtractSetOp
  • BinaryenSIMDExtractSetVec
  • BinaryenSIMDExtractSetIndex
  • BinaryenSIMDReplaceSetOp
  • BinaryenSIMDReplaceSetVec
  • BinaryenSIMDReplaceSetIndex
  • BinaryenSIMDReplaceSetValue
  • BinaryenSIMDShuffleSetLeft
  • BinaryenSIMDShuffleSetRight
  • BinaryenSIMDShuffleSetMask
  • BinaryenSIMDTernarySetOp
  • BinaryenSIMDTernarySetA
  • BinaryenSIMDTernarySetB
  • BinaryenSIMDTernarySetC
  • BinaryenSIMDShiftSetOp
  • BinaryenSIMDShiftSetVec
  • BinaryenSIMDShiftSetShift
  • BinaryenSIMDLoadSetOp
  • BinaryenSIMDLoadSetOffset
  • BinaryenSIMDLoadSetAlign
  • BinaryenSIMDLoadSetPtr
  • BinaryenMemoryInitSetSegment
  • BinaryenMemoryInitSetDest
  • BinaryenMemoryInitSetOffset
  • BinaryenMemoryInitSetSize
  • BinaryenDataDropSetSegment
  • BinaryenMemoryCopySetDest
  • BinaryenMemoryCopySetSource
  • BinaryenMemoryCopySetSize
  • BinaryenMemoryFillSetDest
  • BinaryenMemoryFillSetValue
  • BinaryenMemoryFillSetSize
  • BinaryenRefIsNullSetValue
  • BinaryenRefFuncSetFunc
  • BinaryenTrySetBody
  • BinaryenTrySetCatchBody
  • BinaryenThrowSetEvent
  • BinaryenThrowSetOperandAt
  • BinaryenThrowAppendOperand
  • BinaryenThrowInsertOperandAt
  • BinaryenThrowRemoveOperandAt
  • BinaryenRethrowSetExnref
  • BinaryenBrOnExnSetEvent
  • BinaryenBrOnExnSetName
  • BinaryenBrOnExnSetExnref
  • BinaryenTupleMakeSetOperandAt
  • BinaryenTupleMakeAppendOperand
  • BinaryenTupleMakeInsertOperandAt
  • BinaryenTupleMakeRemoveOperandAt
  • BinaryenTupleExtractSetTuple
  • BinaryenTupleExtractSetIndex
  • BinaryenFunctionSetBody

Also introduces wrappers to the JS-API resembling the classes in C++ to perform the above operations on an expression. For example:

var unary = binaryen.Unary(module.i32.eqz(1));
unary.getOp(...) / .op
unary.setOp(...) / .op = ...
unary.getValue(...) / .value
unary.setValue(...) / .value = ...
unary.getType(...) / .type
unary.finalize()
...

Usage of wrappers is optional, and one can also use plain functions:

var unary = module.i32.eqz(1);
binaryen.Unary.getOp(unary, ...)
...

Also adds comments to all affected functions in case we'd like to generate API documentation at some point.


Original post:

This is a proposal to extend the C- and JS-APIs with all sorts of setters and utility, using blocks as an example how that might look like. In this particular instance, I've added setters for the various block properties incl. some convenience utility to make working with its children easy, and created a Block JS wrapper exposing the underlying C-API to JS in a nicer way, as if working with an actual object.

This is quite a chunk of work, so perhaps a couple questions to have a good start on this:

  • Is it dangerous in some way to replace expressions in the IR like this? Like, is there a reason that there are no setters yet?
  • Would it be ok if I go ahead and unify the C-API function names? For example, we currently have BinaryenBlockRemoveChild taking an index argument, but one day there might also be another helper taking an expression argument that would ideally be named BinaryenBlockRemoveChild, and avoiding that right away by appending an ...At postfix to everything taking an index appears to make sense to me.
  • What are your thoughts on returning wrapper objects like Block right away when creating expressions, so one gets a JS object with all the utility? Is the convenience worth it? One interesting aspect is that we can implement #valueOf methods on these so these are implicitly convertible to their primitive pointer value.
  • Would you prefer methods like Block#getName or getter/setter properties like Block#name. Currently doing the latter.

@kripken
Copy link
Member

kripken commented Jan 13, 2020

I think we could automate a lot of this, if we get to #2414 which suggests generating the core IR node code. That could generate JS bindings automatically too. I really hope we can get to that soon.

Is it dangerous in some way to replace expressions in the IR like this? Like, is there a reason that there are no setters yet?

Setting fields of IR nodes like this should be safe. The one possibly tricky thing is that they might change the type, so ->finalize() should be called.

Would it be ok if I go ahead and unify the C-API function names? For example, we currently have BinaryenBlockRemoveChild taking an index argument, but one day there might also be another helper taking an expression argument that would ideally be named BinaryenBlockRemoveChild, and avoiding that right away by appending an ...At postfix to everything taking an index appears to make sense to me.

Hmm, perhaps that example could be handled using overloading in JS? Or do you mean in the C API? Yeah, for C that sounds good to me.

What are your thoughts on returning wrapper objects like Block right away when creating expressions, so one gets a JS object with all the utility?

I think we should consider the overhead of doing that. It's more lightweight as we currently do things, but we didn't measure it. I suspect in very large modules though this can matter.

cc @tlively @aheejin who've done more work on these APIs than me recently.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 13, 2020

Hmm, perhaps that example could be handled using overloading in JS? Or do you mean in the C API?

Mostly in the C-API, yeah, so these APIs won't conflict in the future. JS API is unaffected so far.

I think we could automate a lot of this

Dang, too sad I'm so far in already. I guess like 90% of this can be automated, with some APIs that need special handling.

@tlively
Copy link
Member

tlively commented Jan 13, 2020

This is a proposal to extend the C- and JS-APIs with all sorts of setters and utility, using blocks as an example how that might look like. In this particular instance, I've added setters for the various block properties incl. some convenience utility to make working with its children easy, and created a Block JS wrapper exposing the underlying C-API to JS in a nicer way, as if working with an actual object.

Nice, this will be very helpful to have for future work 👍

  • Would it be ok if I go ahead and unify the C-API function names? For example, we currently have BinaryenBlockRemoveChild taking an index argument, but one day there might also be another helper taking an expression argument that would ideally be named BinaryenBlockRemoveChild, and avoiding that right away by appending an ...At postfix to everything taking an index appears to make sense to me.

This SGTM. We haven't been too careful about API compatibility and I believe AS is our biggest API consumer, so I'm ok with changes that you're on board with.

  • What are your thoughts on returning wrapper objects like Block right away when creating expressions, so one gets a JS object with all the utility? Is the convenience worth it? One interesting aspect is that we can implement #valueOf methods on these so these are implicitly convertible to their primitive pointer value.

I'm not exactly sure what returning wrapper objects immediately and using them would look like. Can you provide an example? I don't know anything about #valueOf, either, so I don't know how that comes into it.

  • Would you prefer methods like Block#getName or getter/setter properties like Block#name. Currently doing the latter.

getters/setters sound good to me, too.

@aheejin
Copy link
Member

aheejin commented Jan 14, 2020

Mostly agree with @tlively and @kripken. Not sure what you mean by returning Block directly and not sure what the usage of valueOf will be. Otherwise all SGTM.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 14, 2020

I'm not exactly sure what returning wrapper objects immediately and using them would look like. Can you provide an example?

Sure, for example one currently does

var block = module.block("theName", [ module.i32.const(1) ]);

returning a numeric expression reference, but if it would return a Block wrapper object that provides methods to work with the block, something like this would become possible:

var block = module.block(); // allow omitting args
block.name = "theName";
block.appendChild(module.i32.const(1));

Now, when the block object is provided to another API method that expects a numeric expression reference, like in

...
var otherBlock = module.block();
otherBlock.appendChild(block); // here

appendChild will ultimately call BinaryenBlockAppendChild, a Wasm export, thus will attempt to convert the wrapper object to a primitive value, which we can hijack by implementing Block#valueOf so the implicit conversion will yield the numeric expression reference. This would even make such a change relatively backwards compatible. Has some overhead, though, as every wrapper is a closure with one captured var.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 14, 2020

Decided to go on with this nonetheless since we could need the setters rather sooner than later. Will for example allow us to have a pass on our end that replaces standard library calls like Math.pow(x, 2) with more efficient code after other optimizations.

I have no strong feelings on the JS API actually since we are not consuming it, so we might as well postpone the JS part for further investigation and consider this an extension of the C-API only for now.

@kripken
Copy link
Member

kripken commented Jan 15, 2020

Thanks for the more detailed description @dcodeIO. My feeling is that returning wrapper objects is not worth the change - it adds overhead compared to the current approach (which in JS just returns an integer, instead of allocating an object etc.) on each IR node creation.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 15, 2020

Alright, updated the approach to provide static functions in the form

Block.getName(expr)
Block.setName(expr, name)
Block.getNumChildren(expr)
Block.getChildAt(expr, index)
...
Block.appendChild(expr, childExpr)

plus some JS magic to automatically populate wrapper classes (here: Block) from these functions, i.e.

Block#getName() + Block#name getter
Block#setName(name) + Block#name setter
Block#getNumChildren() + Block#numChildren getter
Block#getChildAt(index)
...
Block#appendChild(childExpr)

with the latter being optional. So, if one so fancies, this is possible:

var block = Block(module.block(null, []));
block.name = "theName";
block.appendChild(...);
...

Still not sure about the JS magic, though, since the usefulness is somewhat limited, but I guess it also does no harm to have it as an option? Hmm...

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 16, 2020

This should be mostly complete API-wise I think, but I still have to figure out when it is necessary to call finalize() and create tests for all the expressions. Regarding finalize I wonder what the impact of doing this implicitly would be, respectively whether it would be better to expose a BinaryenExpressionFinalize instead so this can be done once when a generator is done modifying an expression? Unless that's dangerous ofc if it would break the validator or something if missed.

@tlively
Copy link
Member

tlively commented Jan 17, 2020

Regarding finalize I wonder what the impact of doing this implicitly would be, respectively whether it would be better to expose a BinaryenExpressionFinalize instead so this can be done once when a generator is done modifying an expression

finalize does not walk all the subexpressions, so it should be fine to call whenever a child might change types.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jan 18, 2020

Agree that this isn't a problem for most expressions, yet I'm concerned about Block::finalize. If I'm not mistaken, this one might, in the worst case, traverse the block's children three times per operation (search for unreachable, TypeSeeker, handleUnreachable).

var expr = module.block(null, []);
Block.appendChild(expr, child1); // implicit finalize
...
Block.appendChild(expr, childN); // implicit finalize

May either make an exception for blocks, having to manually finalize these only, or make it a convention to call finalize manually when done modifying any expression. Currently implemented the latter.

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jul 15, 2020

This is mostly complete now, with tests so far covering what's in the MVP. Updated the opening post accordingly 🙂

Wdyt?

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jul 17, 2020

Just completed the remaining tests. For a bit of background: The incentive here is that we'd like to have two directional bindings so we can modify Binaryen IR on our end, which will be useful in our upcoming implementation of closures (for example replacing local accesses with loads and stores after the fact). Being able to modify ASTs should also allow everyone to write their own passes (in other languages) without merging them into Binaryen itself, at least to some degree where more advanced utility like access to CFG or similar is not needed.

Having done all of this by hand now, I'm still not exactly sure how much of this should have been done using a generator, mostly because some things are special and being able to add to comments seems to have its benefits. With all this stuff in place now, it at least appears trivial to copy and paste a few things when adding a new instruction, while also being able to deal with special cases directly in code instead of extending the generator each time one of these shows up.

Let me know what you think :)

@dcodeIO dcodeIO marked this pull request as ready for review July 17, 2020 01:38
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I think this is a good approach. The one concern I have is that I still think this should be automated, but I don't want to block this on that. And it's nice to have this landed so if/when we get automatic bindings we can verify them with these tests.

@tlively @aheejin did you want to take another look too?

@kripken
Copy link
Member

kripken commented Jul 22, 2020

Ok, I think we can land this (and iterate later if there are comments). @dcodeIO is the first comment a good summary for the commit message? Or something else?

@dcodeIO
Copy link
Contributor Author

dcodeIO commented Jul 22, 2020

Yeah, the first comment until the


@kripken kripken merged commit d406654 into WebAssembly:master Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants