-
Notifications
You must be signed in to change notification settings - Fork 278
Add a "validate" argument option to JSONSerializer #1775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a "validate" argument option to JSONSerializer #1775
Conversation
Pull Request Test Coverage Report for Build 1841787060Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see. Is there a SSLIB bug to implement eq for signature? JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True? |
Exactly. The
Does not look like it. @MVrachev, would you mind creating a ticket and referencing it in a TODO comment in
Agreed. |
5d5a0b8
to
16e46c1
Compare
I rebased and updated this pr
I removed
I proposed a pr with that: secure-systems-lab/securesystemslib#383
I added a simple test for that and changed one of the places where we instantiate a JSONSerializer to use validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool stuff, @MVrachev! Please address my comments and we should be able to soon merge.
Regarding tests, I do think it would be good to add a few to test_metadata_serialization.py
that call serialize
with self.validate
being True
and a metadata_ob
that triggers the different validation errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structure looks good to me. I left a few more comments in source.
It's good to solve issue #1788 inside this pr as well. |
16e46c1
to
dde1290
Compare
I rebased this pr on top of the latest
I know the changes look huge, but that's mostly from |
2a39784
to
fb713d8
Compare
7ea6946
to
91f6981
Compare
I had to rebase a couple of times because GitHub showed changes from previous commits. |
91f6981
to
dbc4bb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my earlier comments and adding all these tests. I noticed a few more things. Please address and ping me for another round...
tuf/api/metadata.py
Outdated
|
||
# Iterate over self.signatures and other.signatures at the same time | ||
# as the signatures in the file format are ordered. | ||
keyids = zip(self.signatures.keys(), other.signatures.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why zipping two lists you just checked to be equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the check if self.signatures.keys() != other.signatures.keys():
we know that self
and other
have the same keys, but we don't know if they follow the same order.
When I zip them I make sure I traverse both of them following their corresponding insertion order.
If the self.signatures
and other.signatures
orders are different, then when I check (a couple of lines below) self_keyid != other_keyid
will return False.
I tried without zipping, but it fails the test test_md_eq_signatures_reversed_order
inside test_metadata_eq_.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.
but doesn't that check actually tell us that the order is correct: List eq does check the order?
Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures
since now you know the order is the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, good point, because keys()
doesn't return a list anymore in Python3 but rather something set-like and thus unordered.
But still, you're comparing keyids twice here. You could instead only check the lengths at first, i.e. len(self.signatures.keys()) != len(other.signatures.keys())
, or just pass strict=True
to zip
, which does the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could just check self.signatures != other.signatures
Not until secure-systems-lab/securesystemslib#383 is merged and released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could just check self.signatures != other.signatures
Not until secure-systems-lab/securesystemslib#383 is merged and released.
... maybe we should just go ahead and do that, before spending more thoughts on zip
:D
dbc4bb6
to
f9344f8
Compare
tuf/api/metadata.py
Outdated
|
||
# Iterate over self.signatures and other.signatures at the same time | ||
# as the signatures in the file format are ordered. | ||
keyids = zip(self.signatures.keys(), other.signatures.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.
but doesn't that check actually tell us that the order is correct: List eq does check the order?
Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures
since now you know the order is the same
I closed it by mistake. |
f9344f8
to
b4a6c7e
Compare
I updated the pr after unrecognized fields support was added in securesystemslib version 0.22.0
I think I addressed all comments by @jku and @lukpueh. It's ready for another round of reviews. |
b4a6c7e
to
16d907e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your efforts here, @MVrachev! It's getting there. I have a few more comments. This time I also took a closer look at the new test module.
tests/test_metadata_eq_.py
Outdated
self.assertEqual(md, md_2) | ||
|
||
setattr(md_2, self.case_name, test_case_data) | ||
self.assertNotEqual(md, md_2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you use this pattern a lot in this module (everywhere where you use the run_sub_tests_with_dataset
decorator):
for $DATA in $DATA_SET
1. load container
2. assert container not equal to empty string
3. copy container and assert equal
4. patch container copy with $DATA and assert unequal to original
I think this gives us good test coverage, but maybe you can restructure the pattern a bit? More specifically,
steps 1-3 are don't really need to be repeated for every $DATA
in $DATA_SET
, when only step 4 varies.
I think it should look something like this:
1. load container
2. assert container not equal to empty string
3. copy container and assert equal
4. for $DATA in $DATA_SET
patch container copy with $DATA and assert unequal to original
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment applies in other tests that use that decorator below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we are redoing steps 1-3 maany times and that it's probably to optimize them and I do agree with the second pseudocode and it should work.
There are two disadvantages to opting-out from the decorator:
- we don't receive an exact message which of the $DATA caused the effect as we do by using the decorator and a subtest
- If one of the $DATA_SET cases fails then the other will do as well.
In a discussion with @lukpueh we remembered that the asserts provided msg
option meaning that if an assert fails then the custom message will be appended. This will resolve 1 as a problem. For example, if we want to add Failed case: {case}
to the message the result message of a failing equal looks like this:
<tuf.api.metadata.Metadata object at 0x7fdf6db05f00> != 'bbaba' : Failed case: signed
meaning we still see a log of the given argument.
So, the only downside to using a for loop here is 2 which I think is okay given that if for one of the attributes the test fails most likely it will fail for the others as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still use the decorator. I think it makes the tests even cleaner (despite the repetitive first item in the data set rows). Here's an example how I aggregated the common parts of your test_metadata_eq_
and test_signed_eq_
. You should be able to use these for most of your other test_*_eq functions as well.
md_data: utils.DataSet = {
"snapshot signed": snapshot, # test_metadata_eq_ (copy_and_simple_assert)
"snapshot spec_version": snapshot["signed"], # test_signed_eq_ (copy_and_simple_assert)
# add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
}
@utils.run_sub_tests_with_dataset(md_data)
def test_metadata__eq__(self, md: Any) -> None:
self.assertNotEqual(md, "")
md_2 = copy.deepcopy(md)
self.assertEqual(obj, md)
md_attrs_data: utils.DataSet = {
"snapshot signed": (snapshot, "signed", None), # test_metadata_eq_ (iteration 1)
"snapshot signatures": (snapshot, "signatures", None), # test_metadata_eq_ (iteration 2)
"snapshot version": (snapshot["signed"], "version", -1), # test_signed_eq_ (iteration 1)
"snapshot spec_version": (snapshot["signed"], "spec_version", "0.0.0"), # test_signed_eq_ (iteration 2)
# add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
}
@utils.run_sub_tests_with_dataset(md_attrs_data)
def test_metadata_attrs__eq__(self, test_case_data: Tuple) -> None:
md, attr, val = test_case_data
md_2 = copy.deepcopy(md)
setattr(md_2, attr, val)
self.assertNotEqual(md, md_2)
NOTE: I didn't include the initialization for the first argument (md
), if you want to access what's in cls.metadata
, you might need to define the test data sets in setUpClass
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I think the way the tests are defined now I think is simpler than what you suggested.
We will have one big dataset with many cases instead of separating the attributes into logically separated datasets.
Additionally, the dataset won't be so beautiful when working Key
, Role,
DelegatedRole
, TargetFile
and MetaFile
data. In order to reuse it we will have to define it inside setupClass
, so the data can be reused.
The only real benefit I see is that the file will be smaller.
What do you think @jku?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have problems figuring out if a specific test is even useful, eq() is such a special case. I don't have a strong opinion on the style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to say that your original idea of using the decorator was quite good, albeit with some restructuring. Apologies for causing any confusion. I'm really also fine either way. Thanks for your persistent efforts, @MVrachev!
|
||
self.assertEqual(md, md_2) | ||
|
||
def test_md_eq_special_signatures_tests(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can trust securesystemslib
to properly implement and test Signatures.__eq__
. So I suggest to remove this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I agree with this one.
Here we are testing the following cases:
- Test that metadata objects with different signatures are not equal.
- Test that metadata objects with empty signatures are equal
- Metadata objects with different signatures types are not equal.
which doesn't test the same thing as https://github.com/secure-systems-lab/securesystemslib/blob/075043e46b1d017fc332cf44a2038558b3e246d8/tests/test_signer.py#L75,
d644836
to
9329a4e
Compare
I rebased and made the following major changes:
Thanks for the reviews, now it should be in a good shape I hope. :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- it's a lot of new code but I think it's worth it and the eq() implementations are now simple
- I admit I am not able to evaluate if the tests are useful:
eq()
is really tricky. But I trust Martins judgement here: LGTM - We only use this feature in one place in tests ATM, I wonder if we should do it more?
Anyway, looks good to me, we can keep improving this after the fact -- e.g. this will be very nice to have for the static tests (#1806). Thanks for your work.
If you have looked at the potential places (any |
By adding __eq__ we can compare that two objects are equal. That will be useful when adding validation API call. One bug I have found during testing is that I don't check if the type of "other" in the __eq__ implementations are the expected ones. I assumed that when comparing "root == obj" if "obj" is None that automatically the result will be false. Later after a mypy warning, I realized we should implement the __eq__ methods to accept "Any" type as other and we should check manually that "other" is the expected type. Signed-off-by: Martin Vrachev <[email protected]>
Test the "__eq__" implementation for all classes defined in tuf/api/metadata.py The tests are many but simple. The idea is to test each of the metadata classes one by one and with this to make sure there are no possible cases missed. Signed-off-by: Martin Vrachev <[email protected]>
If the "validation" argument is set then when serializing the metadata object will be validated. Signed-off-by: Martin Vrachev <[email protected]>
After we have dropped OrderedDict in theupdateframework@e3b267e we are relying on python3.7+ default behavior to preserve the insertion order, but there is one caveat. When comparing dictionaries the order is still irrelevant compared to OrderedDict. For example: >>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)]) False >>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)]) True There are two special attributes, defined in the specification, where the order makes a difference when comparing two objects: - Metadata.signatures - Targets.delegations.roles. We want to make sure that the order in those two cases makes a difference when comparing two objects and that's why those changes are required inside two __eq__ implementations. Signed-off-by: Martin Vrachev <[email protected]>
9329a4e
to
6ea5372
Compare
When I was writing the tests I wanted to make sure we are:
While writing the tests I used coverall to check which lines in the
Well, we don't actually use validation in one test, but in multiple, because it's in the helper function |
Fixes #1696, #1788
Description of the changes being introduced by the pull request:
We can say we are almost done with Metadata API validation during the initialization of
Signed
objects as summarized here: #1140 (comment).What we didn't focus on is validation when serializing the Metadata objects by calling
Metadata.to_dict()
.That option can be useful for package managers who import "TUF" into their lifecycle by giving them an assurance that when there are multiple changes of a metadata object then serializing this object to a file and then back to an instance from a file is possible.
I have added a separate API call
Metadata.validate()
that could be called to give this assurance.This function can also be used by passing a
validate
argument when creating aJsonSerializer
.By default, this argument is False.
Please verify and check that the pull request fulfills the following
requirements: