Python: remove EssaNodes #14777
This commit removes SSA nodes from the data flow graph. Specifically, for a definition and use such as

```python
x = expr
y = x + 2
```

we used to have flow from `expr` to an SSA variable representing `x`, and from that SSA variable to the use of `x` in the definition of `y`. Now we instead have flow from `expr` to the control flow node for `x` on line 1, and from there to the control flow node for `x` on line 2.

Specific changes:

- `EssaNode` from the data flow layer no longer exists.
- Several glue steps between `EssaNode`s and `CfgNode`s have been deleted.
- Entry nodes are now admitted as `CfgNodes` in the data flow layer (they were filtered out before).
- Entry nodes now have a new `toString` that takes into account that the module name may be ambiguous.
- Some tests have been rewritten to accommodate the changes, but only `python/ql/test/experimental/dataflow/basic/maximalFlowsConfig.qll` should have semantic changes.
- Comments have been updated.
- Test output has been updated, but apart from `python/ql/test/experimental/dataflow/basic/maximalFlows.expected`, only `python/ql/test/experimental/dataflow/typetracking-summaries/summaries.py` should have a semantic change. This is a bonus fix, probably meaning that something was never connected up correctly.
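To make the before/after concrete, here is a toy model of the two edge sets as plain Python data. It is only a sketch: the node labels (`essa:x`, `cfg:x@1`, `cfg:x@2`) and the `path` helper are hypothetical and are not part of the CodeQL Python libraries; the edges simply encode the flow described above for `x = expr` followed by `y = x + 2`.

```python
# Toy model (hypothetical labels, not the CodeQL API) of data-flow edges for:
#
#     x = expr      # line 1
#     y = x + 2     # line 2

# Before: flow went through an SSA variable node standing for `x`.
OLD_EDGES = {
    ("expr", "essa:x"),
    ("essa:x", "cfg:x@2"),
}

# After: only control flow nodes; the definition of `x` on line 1
# flows directly to the use of `x` on line 2.
NEW_EDGES = {
    ("expr", "cfg:x@1"),
    ("cfg:x@1", "cfg:x@2"),
}


def path(edges, source, sink):
    """Return one path from source to sink as a list of nodes, or None."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    stack = [[source]]
    seen = {source}
    while stack:
        p = stack.pop()
        if p[-1] == sink:
            return p
        for nxt in succ.get(p[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(p + [nxt])
    return None


print(path(OLD_EDGES, "expr", "cfg:x@2"))  # ['expr', 'essa:x', 'cfg:x@2']
print(path(NEW_EDGES, "expr", "cfg:x@2"))  # ['expr', 'cfg:x@1', 'cfg:x@2']
```

In the new model no node carries the `essa:` prefix, which mirrors the point that readers of the graph no longer need to know about the SSA layer.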
Rename variable to reflect larger scope

We had test results inside `os.py`; I suppose we have found a little extra flow.
What is the motivation for removing SSA definitions from the data flow graph, and does this pertain to all SSA definitions (e.g. also
The motivation is to make the dataflow graph simpler and easier to understand. Part of the flow is computed via an SSA analysis, but you should not have to understand that part to read the dataflow graph. I checked, and we already have those blow-ups (#14861, #14858). We should fix them by adding those phi nodes, but, I think, under a different name that makes clear what they are doing in the dataflow graph. Something like
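One way to see why missing join (phi) nodes cause a blow-up is to count edges. The sketch below assumes that every definition of `x` reaches every use and uses hypothetical node names; it illustrates the general counting argument only and does not reproduce #14861 or #14858.

```python
# Hypothetical source pattern: n definitions of `x` all reach m uses, e.g.
#
#     if c1:   x = a1
#     elif c2: x = a2
#     else:    x = a3           # n = 3 definitions
#     f(x); g(x); h(x); k(x)    # m = 4 uses


def edges_without_join(defs, uses):
    """Every definition gets a direct edge to every use: n * m edges."""
    return {(d, u) for d in defs for u in uses}


def edges_with_join(defs, uses, join="join:x"):
    """Definitions flow into one join node, which flows to each use: n + m edges."""
    return {(d, join) for d in defs} | {(join, u) for u in uses}


defs = [f"def:x@{i}" for i in range(1, 4)]
uses = [f"use:x@{name}" for name in ("f", "g", "h", "k")]

direct = edges_without_join(defs, uses)
joined = edges_with_join(defs, uses)
print(len(direct), len(joined))  # 12 7

# Both graphs relate the same (definition, use) pairs.
via_join = {(d, u) for (d, j) in joined if j == "join:x"
            for (j2, u) in joined if j2 == "join:x"}
print(via_join == direct)  # True
```

The gap grows quickly: with 10 definitions and 10 uses the direct encoding needs 100 edges, while a single join node needs 20.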
In principle, couldn't we have flow from I guess a counter-example is our current code for iterable unpacking such as
I think that we could skip the
Overall looks fine to me. I have two small nits; otherwise I'll leave the rest of the review for Taus 😊
python/ql/test/experimental/dataflow/coverage/argumentRoutingTest.ql
python/ql/test/experimental/dataflow/coverage/argumentRoutingTest.ql
```ql
or
// Async with var definition
// `async with f(42) as x:`
// nodeFrom is `x`, cfg node
// nodeTo is `x`, essa var
//
// This makes the cfg node the local source of the awaited value.
//
// We have this step in addition to the step above, to handle cases where the QL
// modeling of `f(42)` requires a `.getAwaited()` step (in API graphs) when not
// using `async with`, so you can do both:
// * `foo = await x.foo(); await foo.async_method(); foo.close()` and
// * `async with x.foo() as foo: await foo.async_method()`.
exists(With with, ControlFlowNode var |
  nodeFrom.(CfgNode).getNode() = var and
  nodeTo.(EssaNode).getVar().getDefinition().(WithDefinition).getDefiningNode() = var and
  with.getOptionalVars() = var.getNode() and
  with.isAsync()
)
```
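The QL comment distinguishes two Python usage patterns that should both be modeled. A minimal runnable sketch of an API supporting both follows; it assumes a hypothetical `X.foo()` that returns an object which is both awaitable and an async context manager. All class names here are illustrative and do not come from the PR or from any framework model.

```python
import asyncio


class Resource:
    """Hypothetical resource with an async method and a synchronous close()."""

    def __init__(self):
        self.closed = False

    async def async_method(self):
        return "called"

    def close(self):
        self.closed = True


class _ResourceCM:
    """Hypothetical wrapper that is both awaitable and an async context manager."""

    def __init__(self):
        self._resource = Resource()

    def __await__(self):
        # `await x.foo()` evaluates to the Resource itself.
        return self._get().__await__()

    async def _get(self):
        return self._resource

    async def __aenter__(self):
        # `async with x.foo() as foo` binds the same Resource to `foo`.
        return self._resource

    async def __aexit__(self, exc_type, exc, tb):
        self._resource.close()
        return False  # do not suppress exceptions


class X:
    def foo(self):
        return _ResourceCM()


async def main():
    x = X()
    # Pattern 1: await, use, then close explicitly.
    foo = await x.foo()
    r1 = await foo.async_method()
    foo.close()
    # Pattern 2: `async with` binds the awaited value directly.
    async with x.foo() as foo2:
        r2 = await foo2.async_method()
    return r1, r2, foo.closed, foo2.closed


print(asyncio.run(main()))  # ('called', 'called', True, True)
```

In both patterns the name `foo` ends up bound to the same kind of value, which is why the modeling treats the `async with` target as the local source of the awaited value.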
I was a little suspicious about this change, but looking over the changes I made in 1f93e5b, I honestly don't quite see how this bit was required in the first place 🤔
However, since `python/ql/test/library-tests/frameworks/aiohttp/client_request.py` is still passing, I think we're good 👍
DOH, yes. I've added the code to force the LHS to be included in the path explanations 🤦 Let's keep that for sure 👍
I am not sure I agree, as the SSA library already has perfectly good names for this. Also, (read) phi nodes should be hidden, so query writers will not have to worry about them.
Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
For the nodes we hide, we can be freer with the naming, but I do not think the SSA names are perfectly good. Unless your mind is already in an SSA context, "phi" is not descriptive at all and unlikely to be helpful. I guess it depends on whether we assume our users to be very aware of dataflow being computed partly by an SSA transformation, and to know about phi nodes. I think we cannot really make that assumption in general.
I think we can assume that the documentation of the internals of the dataflow library caters to internal developers, and they should know about SSA and phi nodes :)
…emove-ssa-nodes-from-dataflow-graph
Overall this looks great! It was a bit hard to review with all of the simultaneous changes, though. Perhaps a better approach would have been to keep `EssaNode` around while moving over the individual uses one at a time (and recording the test changes along the way). That way it wouldn't have been a single massive commit with all of the changes.
I'm still wondering a bit about the new control flow nodes we're getting for scope entry definitions in modules. I think they're probably harmless, though.
I have made a few small suggestions here and there for further cleanup, but nothing big.
To me, this looks like it's good to merge, modulo those small fixups, so I'm marking this as approved. 🙂
python/ql/test/experimental/dataflow/strange-essaflow/testFlow.ql
```diff
 tainted_lambda = TTS_apply_lambda(lambda x: x, tracked) # $ tracked
-tainted_lambda # $ MISSING: tracked
+tainted_lambda # $ tracked
```
It's not entirely clear to me why this is now working (though I'm not one to look a gift horse in the mouth). 🤔
```ql
// exclude things like `GSSA variable func`
exists(ref.asExpr()) and
```
Presumably not needed now? (Or at the very least, the comment should be updated.)
Indeed, it is not needed; I removed it.
Co-authored-by: Taus <[email protected]>