Skip to content

Decouple UI from Code Generation #147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Sep 6, 2024
Merged

Decouple UI from Code Generation #147

merged 17 commits into from
Sep 6, 2024

Conversation

wnbaum
Copy link
Contributor

@wnbaum wnbaum commented Jul 20, 2024

Decouples UI from code generation. On top of #190.

So far:

  1. Load block resources from .tres files
  2. Replace category factory block creation with simple block resources with same properties
  3. Load blocks by name from block script data. Make them into an AST. Then create the UI from the AST.
  4. Create updated AST from UI when modified. Uses resources stored in UI.
  5. Save blocks by name from AST into block script data.
  6. Serialize user input (defaults and user input actually serialized as Variants)
  7. Merge ASTs if same entry statement
  8. Fix option inputs
  9. Fix variable blocks being loaded from name (need to load custom resources from BSD before loading script)
  10. Add old blocks back
  11. Remake pong game
  12. Fix scope
  13. Fix block highlighting
  14. Fix undo

To do:

  1. Replace my temporary block resource with Manuel's (for overloading). Make sure things like control blocks with multiple child snaps work. (Lets save this for another PR where we can modify the block resource for overloading).
  2. Reimplement unique local variables (Can also be another PR)
  3. Clean up

https://phabricator.endlessm.com/T35591

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch from 8ac71a0 to e5f9770 Compare July 23, 2024 00:53
@wnbaum
Copy link
Contributor Author

wnbaum commented Jul 23, 2024

Did some more work on this. If someone does want to go through the code you can but it's still pretty messy. You can now make block scripts again, and they save and load just fine. The old blocks are not ported yet though.

The idea of this is to have one central data structure, BlockAST that generates the code and stores the structure of block resources. The UI is generated from the AST, and the serialized scripts get saved from the AST. The AST is loaded from the serialized script as well. The serialized script is in the form of a "BlockNameTree" which is what it sounds like, a tree of named blocks (along with raw user input arguments) which should be the minimum amount of data we need to save for a block script.

So a little diagram would look like this:

UI <-> BlockAST <-> BlockNameTree
          |
          V
    Generated Code

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch 2 times, most recently from b214736 to 6bea874 Compare July 24, 2024 01:26
@wnbaum
Copy link
Contributor Author

wnbaum commented Jul 24, 2024

Sorry if I pinged everyone, I accidentally rebased 40 commits back... yikes! git reflog to the rescue :)

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch 5 times, most recently from 03d6930 to b16d57f Compare July 29, 2024 21:22
@dbnicholson
Copy link
Member

I want to make sure I understand the data model correctly. It seems like BlockNameTreeNode is a minimized version of BlockAST.ASTNode. The point of BlockNameTreeNode is to not serialize individual BlockResources (which are children of ASTNode) to the scene. In other words, ASTNode is a concrete instance of BlockNameTreeNode + BlockResource. Is that correct?

The relationship between BlockAST and BlockScriptData is a bit confusing. It feels like BlockScriptData is the AST root node, but I guess it's more like a wrapper around the various top level ASTs. Is that correct?

Am I wrong or is InstructionTree unused after this?

I have to admit that the names are slowing me down. Whenever I see some object that begins with Block, I have to dig around to figure out if it's a UI element, serialization element or code element.

Anyways, now that I've dug through most of this, I mostly like it. You can mostly see how the GDScript can be built from objects independently of the UI. I like that the generated script clearly comes out of the AST state rather than the UI state or the serialized state. The downside is that BlockCanvas is doing a lot of heavy lifting transitioning between the 3 states.

@dbnicholson
Copy link
Member

I see that all the SerializedBlock* classes are gone, too. One thing that I think would have made this easier to review is if you either used the existing classes or renamed them and then made changes.

@wnbaum
Copy link
Contributor Author

wnbaum commented Jul 29, 2024

I want to make sure I understand the data model correctly. It seems like BlockNameTreeNode is a minimized version of BlockAST.ASTNode. The point of BlockNameTreeNode is to not serialize individual BlockResources (which are children of ASTNode) to the scene. In other words, ASTNode is a concrete instance of BlockNameTreeNode + BlockResource. Is that correct?

ASTNode stores a BlockResource in its data field, and its children are other ASTNodes. BlockNameTreeNode is also a tree node class, but stores the name of the block as its data, and its children are other BlockNameTreeNodes. These BlockNameTreeNodes are what get saved directly to the BlockScriptData. When a BlockScriptData is loaded, an ASTNode is created for each corresponding BlockNameTreeNode. These loaded ASTNodes will now contain the actual BlockResource that the BlockNameTreeNode's block_name is referencing.

So again, an ASTNode is just a tree node that contains the BlockResource. The BlockResource is what defines the actual code that gets generated in the BlockAST later. These ASTNodes are created when we match the BlockNameTreeNode.block_name field to its corresponding BlockResource.

The relationship between BlockAST and BlockScriptData is a bit confusing. It feels like BlockScriptData is the AST root node, but I guess it's more like a wrapper around the various top level ASTs. Is that correct?

Correct. Just like we stored the SerializedBlockTrees as a field in BlockScriptData before, we now store BlockNameTrees. These BlockNameTrees are composed of BlockNameTreeNodes that store the bare minimum data they need to: the block name and whatever user input is specified. There is a BlockNameTree for each entry block, and they each have a position on the canvas.

Am I wrong or is InstructionTree unused after this?

Correct. I will remove it. The BlockAST does similar things for code generation.

I have to admit that the names are slowing me down. Whenever I see some object that begins with Block, I have to dig around to figure out if it's a UI element, serialization element or code element.

I definitely think some better class names would be helpful. I'll do this before I take this PR out of draft.

Anyways, now that I've dug through most of this, I mostly like it. You can mostly see how the GDScript can be built from objects independently of the UI. I like that the generated script clearly comes out of the AST state rather than the UI state or the serialized state. The downside is that BlockCanvas is doing a lot of heavy lifting transitioning between the 3 states.

Yes, there's certainly a lot of nice things this accomplishes! I agree that the BlockCanvas is doing too much. I can surely move some tree construction methods to their respective tree classes. I do think having these 3 states is nice though because it is more flexible for code changes in the future. Suppose we want to change the BlockNameTreeNode serialization to some sort of markup type language. Then we wouldn't even have to touch the UI or code generation, only the interface between the serialization and the BlockAST.

@wnbaum
Copy link
Contributor Author

wnbaum commented Jul 30, 2024

I see that all the SerializedBlock* classes are gone, too. One thing that I think would have made this easier to review is if you either used the existing classes or renamed them and then made changes.

Ah, very true. I didn't think about that being a way that would point you in the right direction when reviewing. Good to know! We should definitely discuss these decoupling changes tomorrow and either this will have been a good experiment or something I can clean up and merge :) Thanks for checking it out @dbnicholson!

@fire

This comment was marked as off-topic.

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch from b16d57f to 001246c Compare July 31, 2024 21:32
@wnbaum

This comment was marked as off-topic.

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch 6 times, most recently from 1a9e6c9 to 8002b87 Compare August 5, 2024 22:36
@wnbaum
Copy link
Contributor Author

wnbaum commented Aug 5, 2024

Rebased on top of Manuel's most recent changes! Working without errors - though I still have to port some stuff over to BlockCatalog, reimplement the focus highlighting, and just clean up in general. Will split into comprehensible commit messages as well.

Comment on lines 10 to 11
display_template = "Else if {condition: BOOL}"
code_template = "elif {condition}:"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wnbaum I wonder why you decided to make "else" or "else if" separate blocks.
Captura desde 2024-08-06 07-51-51

This can create scripts with errors:
Captura desde 2024-08-06 08-05-54

Is it because blocks with multiple slots are hard in this new architecture? In that case it is less flexible than what we had, going against the flexibility goal.

IMO the beauty of blocks is that they prevent parse and syntax errors.

Copy link
Contributor Author

@wnbaum wnbaum Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the main reason was ease of implementation. We could accomplish this with different BlockDefinition types, like StatementBlockDefinition and ControlBlockDefinition as well as a more complex abstract syntax tree. As you can see in the BlockDefinition class we have a block_type and a variant_type which lets us combine all the current block types into one data model. This is nice for simplicity, but only allows one statement for control blocks (unless we were to generalize the model even more).

Also, the AST is currently made up of nodes with children. Each entry block has children, and each control block has children, and the children are the sequence of blocks that are indented under the current statement. We would need to expand the AST, BlockNameTree, and UI generation functions to take into account multiple children groups per node. This wasn't an issue before since we just saved the paths to each block, and generated the code through the UI, which IMO was very ugly.

So it's definitely possible to do undo this regression, but it would add a lot more to this PR. Also, there are some benefits to having the control blocks being like this. If you wanted to convert an if/else statement to an if in the old version, you would have to take all the blocks out, switch out the control block, then put all the blocks back in. Whereas with this you can just chuck the else block. The only thing you would have to do to guarantee correct scripts is make sure else and elif blocks can only be attached to if and elif blocks.

Anyways, there's definitely arguments for both UI methods, so I thought it would be best to do the simplest implementation and then decide if it is really worth porting back later. IIRC @dbnicholson had similar thoughts when we talked about it. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this today. I agree with @dylanmccall and @wjt that we should make the block definition support 0 to N slots of children blocks. I'm fine to leave this for later too. Please mention this important behavior change in the commit message.

@manuq
Copy link
Contributor

manuq commented Aug 6, 2024

Rebased on top of Manuel's most recent changes! Working without errors - though I still have to port some stuff over to BlockCatalog, reimplement the focus highlighting, and just clean up in general. Will split into comprehensible commit messages as well.

That was a fast rebase @wnbaum, cool!

I gave it a try and everything works without errors.

I see all the resource IDs are changing when I change a single number argument:
Captura desde 2024-08-06 08-11-28

Do you think that regression can be fixed?

@wnbaum
Copy link
Contributor Author

wnbaum commented Aug 6, 2024

I see all the resource IDs are changing when I change a single number argument:

Do you think that regression can be fixed?

Yes! The way the serialized tree is reconstructed, it creates all new resources. This logic is a little different from yours I think because now the BlockDefinition is being stored in the UI rather than the BlockSerialization. That's why the logic didn't transfer over when I rebased. Anyways, you are correct that this is a regression and I will just check if the resource already exists by traversing the old serialized tree while creating the new one, and that should yield similar results.

@wnbaum wnbaum force-pushed the decouple-ui-from-codegen branch from 8002b87 to f4c82ed Compare August 6, 2024 15:54
@wnbaum
Copy link
Contributor Author

wnbaum commented Aug 6, 2024

Actually @manuq I'm thinking it might be better to wait and have the changes for keeping the resource IDs the same live in a separate PR. There's many ways to do this as I've found out today:

  1. My original idea was to traverse the old block name tree while creating the new block name tree, and reuse resources with same name. (Hard in this version because we need to preserve the data in the old tree to explore it, while overriding its properties to use in the new tree).
  2. Override _get_rid() with a hash of each block's tree position. This way RIDs are consistent. (Easier to implement but may not reuse some resources if positions are relocated).
  3. Use our own custom serialization that stores block names and parameters in a parseable and readable way (which I think I prefer as it will make diffs much more readable. After all, we are serializing code and code diffs are what we and git are used to).

The root of the problem is that there is no easy way to get Godot to consistently serialize resources in the exact same position and with the exact same IDs (other than actually storing them in the structure of the UI like main is doing currently, not decoupled).

It really comes down to how we want to serialize the AST. This PR supports flexibility in that we can choose how to generate the UI from the AST, or how we want to serialize it since the three states are "decoupled" from each other. Do you think it's worth it to try to implement one of those three methods I described now? Or just consider it later.

@dbnicholson dbnicholson force-pushed the decouple-ui-from-codegen branch from af51d0c to 3b395e6 Compare September 3, 2024 23:12
@dbnicholson
Copy link
Member

I rebased this on aa80f0c. I've pushed the 2 previous versions of this branch as decouple-ui-from-codegen-v1 and decouple-ui-from-codegen-v2. decouple-ui-from-codegen-v2 is the last branch that @wnbaum pushed.

Here's a high level view of the changes:

  • Some things got messed up in the rebase commits. Particularly with all the movement away from CategoryFactory to BlocksCatalog. I added some more commits that need a bit of cleanup to handle that.
  • There were 2 caches in BlockCanvas that I removed. One of them was a BlockDefinition cache that duplicated what was in BlocksCatalog. The other was a BlockCategory cache that's mostly unneeded. However, I think it did handle a custom block class providing get_custom_categories. That appears to be broken in main and I didn't try to fix it here. I think that needs the same treatment as in 768de50 but with a category cache in CategoryFactory.
  • Serialization of node arguments is broken. For example, in pong there's an on_body_entered block with a subsequent body.is_in_group('something') test in a conditional. The body is being lost and I don't really understand how that works. I'll keep poking at that tomorrow.

dbnicholson and others added 16 commits September 5, 2024 13:22
Normally a Variant of type String is quoted for code generation. That
breaks in some scenarios such as defining a method where the string
needs to be inserted verbatim. For that, use StringName under the
assumption that we won't use interned strings to represent quoted
strings later.
Automatically create parameter output block definitions based on any
entry blocks that use the `[param: TYPE]` bracket template. These will
be used later when converting UI blocks to be based on BlockDefinition.
These are being used in all states, and the common location is code
generation. While here, rename VariableResource to VariableDefinition to
better match BlockDefinition. Similarly, remove both from the global
class registry.
These classes will be used as standalone structures to generate code.
They will be used as an intermediate state between the UI and
serialization states.
These will be used to populate the picker rather than doing it in the UI
layer.
These will be used when the picker and category factory transition to
BlockDefinitions. After that, all BlockDefinition creation will happen
in code generation.
These will be used when converting to and from the code generation
classes so that you can go directly from the serialization to code.
Load block scripts by creating BlockASTs from BlockNameTrees, and then generate UI Blocks from them.

Save block scripts by getting the BlockASTs from the UI Blocks, and then saving them as BlockNameTrees.
Move the instruction tree tests to code generation tests.
@dbnicholson
Copy link
Member

Finally really ready for review again. It's definitely broken in the intermediate commits, but I added a few commits to smooth it out. It feels pretty working to me.

Copy link
Contributor

@manuq manuq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa that was a lot, thanks so much for taking care of this. I gave it a test and everything seems to work as before.

The Pong demo wasn't working because of a parse error with quoted function names: func 'goal_right'():, not verbatim. So I have appended a fixup commit. I only had to interact with the canvas without doing any change so the code was regenerated property.

I am a bit afraid that this drops all the efforts of having a small diff in the saved scene. Changing a single number in the canvas changes all the resources IDs. The effort was not only for the diff, reusing the existing resources was good for efficiency. I was hoping to keep that and "only" use the new AST for proper code generation. But I trust you that the other improvements are worth this regression.

description = "Define a method/function with following statements"
category = "Communication | Methods"
type = 1
variant_type = 0
display_template = "Define method {method_name: NIL}"
display_template = "Define method {method_name: STRING_NAME}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying what was the "__nil__" prefix about in the original PR and for providing a better workaround. Still seems like a workaround for verbatim input. And I can see how we can benefit from StringName in the short term. Like the work @dylanmccall is doing for having a drop-down of current animation names (which are StringNames) in AnimationPlayer. But we can use Strings in there for now at the expense of optimization, in order to have this big PR merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried TYPE_CALLABLE since that's closer to what we're actually doing in this case. You can use a dummy callable like Callable.create(null, 'my_method') and later get the name out with callable.get_method(). Unfortunately, godot only supports serializing an empty Callable(), so the name would be lost in the serialization.

I tried hard to think of different alternatives, but I couldn't think of anything if we are completely in the world that inputs are encoded as Variant and only that is used to determine the type.

Copy link
Contributor

@dylanmccall dylanmccall Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read through the commits here so sorry if I'm missing details, but I wonder if the thing we need in the end here is really more in the code template end of things? We need a way to say "use this parameter as written, but don't quote it or coerce it into anything new". Could communicate that like {{method_name}} instead of {method_name} in the code template, with nothing special in the display template.

I didn't touch that, but I was poking around here in #206 by removing the pretend OPTION parameter type and (as a result) using NIL as a type wherever we had used OPTION.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem isn't so much in the template side. We can put whatever we want there and add something special. The problem comes in the generation side because everything is introspected based on the variant type. Even right now where this all happens in ParameterInput.get_string, everything happens based on variant_type except for the oddball case for OPTION. Actually, that part looks pretty much the same. The only reason define_method worked before this is that EntryBlock is treated specially, and get_entry_statement doesn't go through the normal substitution path.

I guess a better way I hadn't thought of before is to use a dedicated class and then special case it within TYPE_OBJECT like OptionData is. E.g., FunctionParameter or something like that.

@@ -94,6 +94,40 @@ static func _setup_definitions_from_files():
_by_class_name[target][block_definition.name] = block_definition


static func _add_output_definitions(definitions: Array[BlockDefinition]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand correctly this is adding a block definition for each argument in the display_template of each entry block. Eg. in &"define_method" block it has display_template = "method_name: STRING_NAME}" so it will add a value block to the catalog with name &"define_method_method_name" and variant type StringName and scope "func {method_name}():".

If so, can we call this _add_argument_definitions() instead of "output"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually for the Area2D and RigidBody2D on_{entered,exited} blocks, too. I called them output definitions because they're associated to what the code currently calls ParameterOutput. Those are encoded in display_template as [param: TYPE]. On the other hand, ParameterInput is used when the template has {param: TYPE}.

Technically, this is the difference between a parameter and an argument. What we're trying to handle here are parameters - the things in the function declaration. What we normally use are arguments, the things used in a function call.

So, in the short term it would probably be better to call this _add_parameter_definitions and ignore the confusion between ParameterInput and ParameterOutput. In the longer term it would be better to change ParameterInput to Argument and ParameterOutput to Parameter. But I suspect there's a lot of loose "parameter" usage within the code.

extends Resource

@export var name: StringName
@export var arguments: Dictionary # String, ValueBlockSerialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: typed dictionaries is coming to Godot 4.4, yay.

extends Resource

@export var name: StringName
@export var arguments: Dictionary # String, ValueBlockSerialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: typed dictionaries is coming to Godot 4.4, yay.

Comment on lines +16 to +19
# FIXME Note: This used to be a NodePath. There is a bug in Godot 4.2 that causes the
# reference to not be set properly when the node is duplicated. Since we don't
# use the Node duplicate function anymore, this is okay.
# https://github.com/godotengine/godot/issues/82670
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -103,42 +86,8 @@ func disconnect_signals():
drag_started.disconnect(c.callable)


func update_resources(undo_redo: EditorUndoRedoManager):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the new architecture makes it imposible to reuse the resources? This probably makes the resource IDs change constantly. I was hoping that we were going to drop that from William's PR. That is, have a nice AST for very well proven code generation, but keep passing the resources around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it does regenerate IDs frequently. I hadn't paid attention to that part of the changes. I'll see if I can keep that part.

Copy link
Contributor

@dylanmccall dylanmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks fine to me. This particular set of changes might have worked out better if we'd scrapped the whole repo and iterated from a clean slate (which is effectively what the final result is), but here we are ;)

Thank you for working through this! These are really nice improvements. I'll rebase #206 on top of this so don't feel like you need to merge for that pull request's sake. I don't see anything that urgently needs fixing except if you can look into reusing resources, but I think it's fine if the main branch just has that issue for a while. We aren't making any promises about clean git history yet.

@manuq
Copy link
Contributor

manuq commented Sep 6, 2024

This all looks fine to me.

Merging then!

@manuq manuq merged commit d8dae7e into main Sep 6, 2024
3 checks passed
@manuq manuq deleted the decouple-ui-from-codegen branch September 6, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants