python: enable summaries from model #12581
c3e6819 to efa36d2
This requires a change to the shared interface: making `getNodeFromPath` public. This is because Python is doing its own thing and identifying call-backs.
(but no summaries yet)
efa36d2 to 2296410
`base` is already the `CallNode` we want.
and add summaries
Nice 💪
Although a bit of a nitpick, can we please change `Foo` in the package name to `foo`? It just stands out as very non-standard.
I think it's a shame that the data-flow tests and taint-tracking tests are almost identical.
For taint-tracking we've mostly used `InlineTaintTest.qll`, so maybe we can use that alongside the data-flow test, and only have a single `TestSummaries.qll` file?
I looked a bit at where the existing NormalTaintTest was used, and the only case is https://github.com/github/codeql/blob/11c89adbe3b238e3a142256175f0003e19d7972b/python/ql/test/experimental/dataflow/summaries/summaries.py -- I think that could also benefit from being rewritten to have BOTH dataflow and taint-tracking tests, instead of only having taint tests.
also change `Foo` -> `foo`
I tried putting it in one file now. I agree we should consolidate all our summary tests at some point.
Besides the two code comments, I also have a stylistic recommendation for your Python code.
- put trailing commas after last argument (means diff for adding a new argument in the future is a bit more clean)
- do not indent closing parentheses (
Both follow the black formatter (although it would try to put both arguments to `ensure_tainted` on one line 😮💨).
Specifically, it would change your code like this:
 ensure_tainted(
     tainted_list_el, # $ tainted
-    tainted_list_el[0] # $ tainted
-    )
+    tainted_list_el[0], # $ tainted
+)
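The two style points can be seen in a runnable form below. This is only a sketch: the `ensure_tainted` defined here is a hypothetical stand-in so the call executes on its own (the real helper comes from the test harness), and `tainted_list_el` is given a placeholder value.

```python
# Hypothetical stand-in for the test helper, only so the call below runs.
# It simply reports how many arguments it received.
def ensure_tainted(*args):
    return len(args)


# Placeholder value; in the real test this would come from a modelled call.
tainted_list_el = ["x"]

# Trailing comma after the last argument, and the closing parenthesis
# aligned with the statement that opened the call (not indented).
count = ensure_tainted(
    tainted_list_el,  # $ tainted
    tainted_list_el[0],  # $ tainted
)
```

With the trailing comma in place, adding a third argument later touches only one line in the diff.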
tainted = MS_identity(TAINTED_STRING)
ensure_tainted(tainted) # $ tainted
Since you have just shown that we have data-flow, and we know all dataflow steps are also taint-flow steps, I think it's fine to only check taint-flow for the cases where there is NOT dataflow.
I think that will make the test-file a bit easier to read as well.
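The argument can be illustrated with a toy model, assuming flow is represented as a set of directed edges. The node names and the `reachable` helper are made up for this sketch and have nothing to do with the actual CodeQL implementation; the point is only that when taint steps are a superset of value-preserving steps, anything reached by data-flow is also reached by taint-flow.

```python
def reachable(edges, start):
    """Return every node reachable from `start` by following `edges`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Hypothetical value-preserving (data-flow) steps.
value_steps = {("source", "identity_result")}
# Hypothetical additional steps that only taint tracking follows.
extra_taint_steps = {("source", "list_containing_source")}
# Taint tracking uses every data-flow step plus the extra ones.
taint_steps = value_steps | extra_taint_steps

value_reach = reachable(value_steps, "source")
taint_reach = reachable(taint_steps, "source")

# Nodes where a separate taint check adds information.
only_taint = taint_reach - value_reach
```

In this model, `only_taint` contains exactly the nodes that data-flow does not reach, which are the only places a taint assertion tells us something new.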
Good point. I tried to make it more consistent now by not checking taint when dataflow is already established; when we do check taint, we check both the collection and the expected element.
Thinking some more about this, I think these tests should be about whether you can write flow summaries in CSV files that do the right thing.
Having all these extra taint steps is closer to what our current taint-tracking does in the end, but from my point of view it doesn't help us achieve the goal of the tests.
I removed the `TAINTED_LIST` parts locally, but thought it was a bit too controversial to just commit directly to your PR. Instead I've put the commits here: yoff#79 (if you agree, we can add these to main later on 👍)
python/ql/test/experimental/dataflow/model-summaries/model_summaries.py
…maries.py Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
- do not test taint flow when dataflow is established
- test taint of both the collection and the expected element
Minor comment, otherwise LGTM
javascript/ql/lib/semmle/javascript/frameworks/data/internal/ApiGraphModels.qll
ruby/ql/lib/codeql/ruby/frameworks/data/internal/ApiGraphModels.qll
python/ql/lib/semmle/python/frameworks/data/internal/ApiGraphModels.qll
Co-authored-by: Asger F <[email protected]>
LGTM once the tests pass.
There seem to be some test expectations that need updating, probably due to some of the recent changes to inline test expectations.
Approved for now, tests can be adjusted later 👍
This requires a change to the shared interface: making `getNodeFromPath` public. This is because Python is doing its own thing and identifying call-backs.
I am unsure if this constitutes a feature yet, or if we should add a CSV parser first?
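To make the question concrete, here is a rough sketch of what splitting a semicolon-separated summary row into named columns could look like. Everything in it is an assumption for illustration: the six-column layout, the field names, the `parse_summary_row` helper, and the example row are not taken from this PR and may not match the format the shared library actually accepts.

```python
from typing import NamedTuple


class SummaryRow(NamedTuple):
    # Assumed column layout; field names are hypothetical.
    package: str
    type_name: str
    path: str
    input_spec: str
    output_spec: str
    kind: str


def parse_summary_row(row: str) -> SummaryRow:
    """Split one semicolon-separated row into its (assumed) six columns."""
    parts = [part.strip() for part in row.split(";")]
    if len(parts) != len(SummaryRow._fields):
        raise ValueError(
            f"expected {len(SummaryRow._fields)} columns, got {len(parts)}: {row!r}"
        )
    return SummaryRow(*parts)


# Hypothetical row describing an identity-like function in package `foo`.
row = parse_summary_row("foo;;Member[MS_identity];Argument[0];ReturnValue;value")
```

A row with the wrong number of columns raises `ValueError` instead of being silently misread, which is one reason a dedicated parser might be worth having before calling this a feature.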