Skip to content

Python: remove EssaNodes #14777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Nov 13, 2023

This PR removes SSA nodes from the data flow graph. Specifically, for a definition and use such as

  x = expr
  y = x + 2

we used to have flow from expr to an SSA variable representing x and from that SSA variable to the use of x in the definition of y. Now we instead have flow from expr to the control flow node for x at line 1 and from there to the control flow node for x at line 2.

Specific changes:

  • EssaNode from the data flow layer no longer exists.
  • Several glue steps between EssaNodes and CfgNodes have been deleted.
  • Entry nodes are now admitted as CfgNodes in the data flow layer (they were filtered out before).
  • Entry nodes now have a new toString taking into account that the module name may be ambigous.
  • Some tests have been rewritten to accomodate the changes, but only python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll should have semantic changes.
  • Comments have been updated
  • Test output has been updated, but apart from python/ql/test/experimental/dataflow/basic/maximalFlows.expected only python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.

@yoff yoff force-pushed the python/remove-ssa-nodes-from-dataflow-graph branch 3 times, most recently from 0588b1b to 6874a15 Compare November 20, 2023 16:26
yoff added 2 commits November 20, 2023 21:35
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as
```python
  x = expr
  y = x + 2
```
we used to have flow from `expr` to an SSA variable representing x and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` at line 1 and from there to the control flow node for `x` at line 2.

Specific changes:
- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` taking into account that the module name may be ambigous.
- Some tests have been rewritten to accomodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected` only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
Rename variable to reflect larger scope

We had test results inside `os.py`, I suppose we have found a little extra flow.
@yoff yoff force-pushed the python/remove-ssa-nodes-from-dataflow-graph branch from 6874a15 to 421d4f3 Compare November 20, 2023 20:36
@yoff yoff marked this pull request as ready for review November 20, 2023 20:45
@yoff yoff requested a review from a team as a code owner November 20, 2023 20:45
@tausbn tausbn self-requested a review November 20, 2023 21:15
@hvitved
Copy link
Contributor

hvitved commented Nov 21, 2023

This PR removes SSA nodes from the data flow graph.

What is the motivation for removing SSA definitions from the data flow graph, and does this pertain to all SSA definitions (e.g. also phi nodes)? Removing phi (or phi-read) nodes can result in a quadratic blowup: #10927.

@yoff
Copy link
Contributor Author

yoff commented Nov 21, 2023

This PR removes SSA nodes from the data flow graph.

What is the motivation for removing SSA definitions from the data flow graph, and does this pertain to all SSA definitions (e.g. also phi nodes)? Removing phi (or phi-read) nodes can result in a quadratic blowup: #10927.

The motivation is to make the dataflow graph simpler and easier to understand. Part of the flow is computed via an SSA analysis, but you would not have to understand that part to read the dataflow graph.

I checked, and we already have those blow-ups (#14861, #14858). We should fix them by adding those phi-nodes, but under a different name, I think, that makes it clear what they are doing in the dataflow graph. Something like UndeterminedReadNode, CommonReadNode, ReadFanInNode, or something like that. It will still require an explanation somewhere, but that can be in terms of optimising the dataflow graph rather than involving the whole subject of SSA (although the relation to phi-nodes can be mentioned for the benefit of the initiated).

@RasmusWL
Copy link
Member

x = expr
y = x + 2

Now we instead have flow from expr to the control flow node for x at line 1 and from there to the control flow node for x at line 2.

In principle, couldn't we have flow from expr directly to the first use of x (on line 2)?

I guess a counter-example is our current code for iterable unpacking such as ((x,y), z) = ..., which wouldn't work well if we ignored the LHS of the assignment? 😅

@yoff
Copy link
Contributor Author

yoff commented Nov 21, 2023

x = expr
y = x + 2

Now we instead have flow from expr to the control flow node for x at line 1 and from there to the control flow node for x at line 2.

In principle, couldn't we have flow from expr directly to the first use of x (on line 2)?

I guess a counter-example is our current code for iterable unpacking such as ((x,y), z) = ..., which wouldn't work well if we ignored the LHS of the assignment? 😅

I think that we could skip the x node and handle iterable unpacking separately (the code already does this, we would just change definition flow). I am not sure what we would like to see in path explanations, though, and might like to make that decision separately, by hiding nodes. Imagine that expr is long and convoluted but there is a y buried in there. It might be nice to jump into that y from above and then out to the x before going any further.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks fine to me. Have two small NITs, otherwise I'll leave the rest of the review for Taus 😊

Comment on lines -316 to -334
or
// Async with var definition
// `async with f(42) as x:`
// nodeFrom is `x`, cfg node
// nodeTo is `x`, essa var
//
// This makes the cfg node the local source of the awaited value.
//
// We have this step in addition to the step above, to handle cases where the QL
// modeling of `f(42)` requires a `.getAwaited()` step (in API graphs) when not
// using `async with`, so you can do both:
// * `foo = await x.foo(); await foo.async_method(); foo.close()` and
// * `async with x.foo() as foo: await foo.async_method()`.
exists(With with, ControlFlowNode var |
nodeFrom.(CfgNode).getNode() = var and
nodeTo.(EssaNode).getVar().getDefinition().(WithDefinition).getDefiningNode() = var and
with.getOptionalVars() = var.getNode() and
with.isAsync()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little suspicious about this change, but looking over the changes I did in 1f93e5b, I honestly don't quite see how this bit was required in the first place 🤔

However, since python/ql/test/library-tests/frameworks/aiohttp/client_request.py is still passing, I think we're good 👍

@RasmusWL
Copy link
Member

I think that we could skip the x node and handle iterable unpacking separately (the code already does this, we would just change definition flow). I am not sure what we would like to see in path explanations, though, and might like to make that decision separately, by hiding nodes. Imagine that expr is long and convoluted but there is a y buried in there. It might be nice to jump into that y from above and then out to the x before going any further.

DOH, yes. I've added the code to force LHS to be included in the path explanations 🤦 Let's keep that for sure 👍

@hvitved
Copy link
Contributor

hvitved commented Nov 21, 2023

We should fix them by adding those phi-nodes, but under a different name, I think, that makes it clear what they are doing in the dataflow graph.

I am not sure I agree, as the SSA library already has perfectly good names for this. Also, (read) phi nodes should be hidden, so query writers will not have to worry about them.

Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
@yoff
Copy link
Contributor Author

yoff commented Nov 22, 2023

We should fix them by adding those phi-nodes, but under a different name, I think, that makes it clear what they are doing in the dataflow graph.

I am not sure I agree, as the SSA library already has perfectly good names for this. Also, (read) phi nodes should be hidden, so query writers will not have to worry about them.

For the nodes we hide, we can be more free about the naming, but I do not think the SSA names are perfectly good. Unless your mind is already in an SSA context, "phi" is not descriptive at all and unlikely to be helpful. I guess it depends on whether we assume our users to be very aware of dataflow being computed partly by an SSA transformation and knowing about phi-nodes. I think we cannot really have that assumption in general.

@RasmusWL
Copy link
Member

For the nodes we hide, we can be more free about the naming, but I do not think the SSA names are perfectly good. Unless your mind is already in an SSA context, "phi" is not descriptive at all and unlikely to be helpful. I guess it depends on whether we assume our users to be very aware of dataflow being computed partly by an SSA transformation and knowing about phi-nodes. I think we cannot really have that assumption in general.

I think we can have the assumption that the documentation of the internals of the dataflow library caters to internal developers, and they should know about SSA and phi nodes :)

tausbn
tausbn previously approved these changes Dec 4, 2023
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great! It was a bit hard to review with all of the simultaneous changes, though. Perhaps a better approach would have been to keep EssaNode around while moving over the individual uses one at a time (and recording the test changes along the way). That way it wouldn't have been a single massive commit with all of the changes.

I'm still wondering a bit about the new control flow nodes we're getting for scope entry definitions in modules. I think they're probably harmeless, though.

I have made a few small suggestions here and there for further cleanup, but nothing big.

To me, this looks like it's good to merge, modulo those small fixups, so I'm marking this as approved. 🙂

tainted_lambda = TTS_apply_lambda(lambda x: x, tracked) # $ tracked
tainted_lambda # $ MISSING: tracked
tainted_lambda # $ tracked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not entirely clear to me why this is now working (though I'm not one to look a gift horse in the mouth). 🤔

Comment on lines 18 to 19
// exclude things like `GSSA variable func`
exists(ref.asExpr()) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably not needed now? (Or at the very least, the comment should be updated.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed not needed, I removed it

@yoff yoff requested a review from tausbn December 4, 2023 16:51
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants