Add a "validate" argument option to JSONSerializer #1775

MVrachev · 2022-01-17T17:22:21Z

Description of the changes being introduced by the pull request:

We can say we are almost done with Metadata API validation during the initialization of Signed objects as summarized here: #1140 (comment).

What we didn't focus on is validation when serializing the Metadata objects by calling Metadata.to_dict().

That option can be useful for package managers that integrate "TUF" into their lifecycle: after a metadata object has been changed several times, it assures them that the object can be serialized to a file and then loaded back from that file into an instance.
I have added a separate API call, Metadata.validate(), that can be called to get this assurance.

The same check can also be enabled by passing a validate argument when creating a JSONSerializer.
By default, this argument is False.

Please verify and check that the pull request fulfills the following
requirements:

The code follows the Code Style Guidelines
Tests have been added for the bug fix or new feature
Docs have been added for the bug fix or new feature

coveralls · 2022-01-17T17:26:25Z

Pull Request Test Coverage Report for Build 1841787060

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

85 of 87 (97.7%) changed or added relevant lines in 2 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.03%) to 98.323%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
tuf/api/metadata.py	76	77	98.7%
tuf/api/serialization/json.py	9	10	90.0%

Files with Coverage Reduction	New Missed Lines	%
tuf/api/metadata.py	2	98.92%

Totals
Change from base Build 1840943084:	-0.03%
Covered Lines:	1184
Relevant Lines:	1200

💛 - Coveralls

jku · 2022-01-18T07:49:58Z

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

Is there a SSLIB bug to implement eq for signature?

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

lukpueh · 2022-01-18T10:01:19Z

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

Exactly. The validate implementation is too JSONDe/Serializer-specific for the Metadata class. Plus Jussi's 2x-to_dict argument.

Is there a SSLIB bug to implement eq for signature?

Does not look like it. @MVrachev, would you mind creating a ticket and referencing it in a TODO comment in Metadata.__eq__()?

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

Agreed.

tuf/api/metadata.py

MVrachev · 2022-01-18T13:55:54Z

I rebased and updated this pr

What I was trying to say earlier (and I think Lukas as well) is that the validation implementation should be in the JSONSerializer. Otherwise you end up serializing twice as this PR does. Metadata.validate() does not need to exist as far as I can see.

I removed Metadata.validate() and moved it into JSONSerializer.serialize().

Is there a SSLIB bug to implement eq for signature?

I proposed a pr with that: secure-systems-lab/securesystemslib#383

JSONSerializer.serialize() is never tested when validate=True. I suppose we want most of our tests to use validate=True?

I added a simple test for that and changed one of the places where we instantiate a JSONSerializer to use validation.
In many of the places where we call to_bytes, we do so because we want to raise another specific error, so it didn't make sense to use JSONSerializer with validate=True there.
I also don't think it makes sense to comprehensively test this part with all of the classes and their attributes: when calling JSONSerializer.serialize with validation turned on, we are essentially relying on the classes' initialization validation, which is already tested extensively in test_trusted_metadata_set.py.

lukpueh

Cool stuff, @MVrachev! Please address my comments and we should be able to soon merge.

Regarding tests, I do think it would be good to add a few to test_metadata_serialization.py that call serialize with self.validate being True and a metadata_ob that triggers the different validation errors.

tuf/api/metadata.py

tuf/api/serialization/json.py

tests/test_api.py

lukpueh · 2022-01-20T13:50:08Z

I just merged #1783. Please make sure that we don't run into dict order problems you kindly pointed out in e3b267e!

jku

Structure looks good to me. I left a few more comments in source.

tuf/api/metadata.py

tuf/api/serialization/json.py

MVrachev · 2022-01-26T10:25:35Z

It's good to solve issue #1788 inside this pr as well.

MVrachev · 2022-02-07T11:31:19Z

I rebased this pr on top of the latest develop changes and made the following changes:

As suggested by Jussi, removed the unrecognized_fields check from all __eq__ implementations in the Signed class derivatives, as it's already checked in Signed.__eq__.
Inside Metadata.__eq__ I realized there is no sense in testing if self.signatures is None or other.signatures is None or if both of them are None.
All of those checks can be replaced by type(self.signatures) == type(other.signatures) == dict.
I rewrote the tests. I created a separate test_metadata_eq_ file where, similarly to test_metadata_serialization.py, I used table testing to write mostly simple tests. I checked that all cases are covered with local coverall runs.
I made sure that the dictionary order is taken into account when comparing signatures or delegated roles (see Dict comparisons insensitive to order #1788).
Made sure only SerializationError will be thrown by JSONSerializer.serialize().
Changed the documentation regarding the validate argument.

I know the changes look huge, but that's mostly from test_metadata_eq which should be simple to read.

MVrachev · 2022-02-07T11:46:49Z

I had to rebase a couple of times because GitHub showed changes from previous commits.

lukpueh

Thanks for addressing my earlier comments and adding all these tests. I noticed a few more things. Please address and ping me for another round...

tests/test_metadata_eq_.py

tuf/api/metadata.py

lukpueh · 2022-02-08T09:44:51Z

tuf/api/metadata.py

+
+        # Iterate over self.signatures and other.signatures at the same time
+        # as the signatures in the file format are ordered.
+        keyids = zip(self.signatures.keys(), other.signatures.keys())


Why zipping two lists you just checked to be equal?

After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

When I zip them, I make sure I traverse both of them following their corresponding insertion order.
If the orders of self.signatures and other.signatures differ, then the check self_keyid != other_keyid (a couple of lines below) will be true for some pair, and the objects compare as unequal.

I tried without zipping, but it fails the test test_md_eq_signatures_reversed_order inside test_metadata_eq_.py.

After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

but doesn't that check actually tell us that the order is correct: List eq does check the order?

Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures since now you know the order is the same

Ah right, good point, because keys() doesn't return a list anymore in Python3 but rather something set-like and thus unordered.

But still, you're comparing keyids twice here. You could instead only check the lengths at first, i.e. len(self.signatures.keys()) != len(other.signatures.keys()), or just pass strict=True to zip, which does the same thing.

could just check self.signatures != other.signatures

Not until secure-systems-lab/securesystemslib#383 is merged and released.

could just check self.signatures != other.signatures

Not until secure-systems-lab/securesystemslib#383 is merged and released.

... maybe we should just go ahead and do that, before spending more thoughts on zip :D
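
The ordering question in this thread comes down to how Python compares dicts and dict views. A minimal illustration, using placeholder keyids and signature strings rather than real TUF data:

```python
# Two dicts with the same entries inserted in opposite order.
# The keyids and signature values are placeholders for illustration.
sigs_a = {"keyid1": "sig1", "keyid2": "sig2"}
sigs_b = {"keyid2": "sig2", "keyid1": "sig1"}

# Plain dict equality and keys() views ignore insertion order.
same_dict = sigs_a == sigs_b
same_keys_view = sigs_a.keys() == sigs_b.keys()

# Converting to lists makes the comparison order-sensitive.
same_key_order = list(sigs_a) == list(sigs_b)
same_item_order = list(sigs_a.items()) == list(sigs_b.items())

print(same_dict, same_keys_view, same_key_order, same_item_order)
# prints: True True False False
```

So a keys() comparison alone says nothing about order, while comparing list(d.items()) checks keys, values and order in one step.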

tuf/api/metadata.py

tuf/api/serialization/json.py

tuf/api/metadata.py

jku · 2022-02-08T12:35:56Z

tuf/api/metadata.py

+
+        # Iterate over self.signatures and other.signatures at the same time
+        # as the signatures in the file format are ordered.
+        keyids = zip(self.signatures.keys(), other.signatures.keys())


After the check if self.signatures.keys() != other.signatures.keys(): we know that self and other have the same keys, but we don't know if they follow the same order.

but doesn't that check actually tell us that the order is correct: List eq does check the order?

Then that would mean you don't need the loop at all -- could just check self.signatures != other.signatures since now you know the order is the same

tuf/api/serialization/json.py

MVrachev · 2022-02-08T14:21:32Z

I closed it by mistake.

MVrachev · 2022-02-14T13:12:48Z

I updated the pr after unrecognized fields support was added in securesystemslib version 0.22.0
The changes include:

I added Metadata.unrecognized_fields inside Metadata.__eq__
I changed the way we check the dictionary order of Metadata.signatures and Delegations.roles
I changed how we actually do validation inside JSONSerializer.serialize() by using json_bytes and calling JSONDeserializer.deserialize() to create a new metadata object for comparison.
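
The round-trip idea described in the last item can be sketched roughly as follows. This is not the code from this PR: ToyMetadata, ToySerializer and the local SerializationError are hypothetical stand-ins built only on the standard library, so the example runs without tuf installed.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


class SerializationError(Exception):
    """Stand-in for the single error type a serializer exposes."""


@dataclass
class ToyMetadata:
    """Hypothetical, heavily simplified metadata container."""

    version: int
    signatures: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validation happens only at initialization time.
        if self.version <= 0:
            raise ValueError("version must be positive")

    def to_dict(self) -> Dict[str, Any]:
        return {"version": self.version, "signatures": dict(self.signatures)}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyMetadata":
        return cls(data["version"], data["signatures"])


class ToySerializer:
    def __init__(self, validate: bool = False) -> None:
        self.validate = validate

    def serialize(self, obj: ToyMetadata) -> bytes:
        try:
            json_bytes = json.dumps(obj.to_dict()).encode("utf-8")
            if self.validate:
                # Deserialize what was just produced and compare with the input.
                round_trip = ToyMetadata.from_dict(json.loads(json_bytes))
                if round_trip != obj:
                    raise ValueError("round-tripped object differs from original")
        except Exception as e:
            # Callers only ever see SerializationError.
            raise SerializationError("failed to serialize") from e
        return json_bytes


md = ToyMetadata(version=1)
md.version = -1  # later modification that bypasses __post_init__

unchecked = ToySerializer().serialize(md)  # succeeds, but cannot be loaded back
try:
    ToySerializer(validate=True).serialize(md)
    validation_failed = False
except SerializationError:
    validation_failed = True
```

Because the check reuses the bytes that were just produced, the object is converted to a dict only once, and the initialization-time validation of the classes does the actual work.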

I think I addressed all comments by @jku and @lukpueh. It's ready for another round of reviews.

tuf/api/metadata.py

lukpueh

Thanks for your efforts here, @MVrachev! It's getting there. I have a few more comments. This time I also took a closer look at the new test module.

tuf/api/metadata.py

tuf/api/serialization/json.py

tests/test_metadata_eq_.py

lukpueh · 2022-02-16T15:14:55Z

tests/test_metadata_eq_.py

+        self.assertEqual(md, md_2)
+
+        setattr(md_2, self.case_name, test_case_data)
+        self.assertNotEqual(md, md_2)


I see you use this pattern a lot in this module (everywhere where you use the run_sub_tests_with_dataset decorator):

for $DATA in $DATA_SET
    1. load container
    2. assert container not equal to empty string
    3. copy container and assert equal
    4. patch container copy with $DATA and assert unequal to original

I think this gives us good test coverage, but maybe you can restructure the pattern a bit? More specifically,
steps 1-3 don't really need to be repeated for every $DATA in $DATA_SET, when only step 4 varies.

I think it should look something like this:

1. load container
2. assert container not equal to empty string
3. copy container and assert equal
4. for $DATA in $DATA_SET
    patch container copy with $DATA and assert unequal to original

WDYT?

Same comment applies in other tests that use that decorator below.

I agree that we are redoing steps 1-3 many times and that it's probably good to optimize them. I also agree with the second pseudocode; it should work.
There are two disadvantages to opting out of the decorator:

1. We don't receive an exact message about which of the $DATA caused the failure, as we do when using the decorator and a subtest.

2. If one of the $DATA_SET cases fails, then the others will fail as well.

In a discussion with @lukpueh we remembered that the assert methods accept a msg option: if an assert fails, the custom message is appended to the failure output. This resolves 1 as a problem. For example, if we add Failed case: {case} to the message, the resulting message of a failing assertEqual looks like this:
<tuf.api.metadata.Metadata object at 0x7fdf6db05f00> != 'bbaba' : Failed case: signed
meaning we still see a log of the given argument.

So, the only downside to using a for loop here is 2, which I think is okay given that if the test fails for one of the attributes, most likely it will fail for the others as well.
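
A rough sketch of the for-loop variant with a custom msg, assuming a hypothetical Container class in place of the real metadata classes:

```python
import copy
import unittest


class Container:
    """Hypothetical stand-in for a metadata class with two attributes."""

    def __init__(self, signed: str, signatures: dict) -> None:
        self.signed = signed
        self.signatures = signatures

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Container):
            return False
        return (
            self.signed == other.signed
            and self.signatures == other.signatures
        )


class TestContainerEq(unittest.TestCase):
    def test_eq(self) -> None:
        obj = Container("payload", {"keyid": "sig"})
        # Steps 1-3 run once.
        self.assertNotEqual(obj, "")
        obj_2 = copy.deepcopy(obj)
        self.assertEqual(obj, obj_2)

        # Step 4 runs per attribute; msg identifies a failing case.
        for attr, value in [("signed", None), ("signatures", None)]:
            setattr(obj_2, attr, value)
            self.assertNotEqual(obj, obj_2, f"Failed case: {attr}")
            # Restore the attribute so each case is tested in isolation.
            setattr(obj_2, attr, getattr(obj, attr))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestContainerEq)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first failing assert still ends the loop, which is disadvantage 2 above; the msg argument only addresses disadvantage 1.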

We can still use the decorator. I think it makes the tests even cleaner (despite the repetitive first item in the data set rows). Here's an example how I aggregated the common parts of your test_metadata_eq_ and test_signed_eq_. You should be able to use these for most of your other test_*_eq functions as well.

    md_data: utils.DataSet = {
        "snapshot signed": snapshot,  # test_metadata_eq_ (copy_and_simple_assert)
        "snapshot spec_version": snapshot["signed"],  # test_signed_eq_ (copy_and_simple_assert)
        # add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
    }

    @utils.run_sub_tests_with_dataset(md_data)
    def test_metadata__eq__(self, md: Any) -> None:
        self.assertNotEqual(md, "")
        md_2 = copy.deepcopy(md)
        self.assertEqual(md, md_2)

    md_attrs_data: utils.DataSet = {
        "snapshot signed": (snapshot, "signed", None),  # test_metadata_eq_ (iteration 1)
        "snapshot signatures": (snapshot, "signatures", None),  # test_metadata_eq_ (iteration 2)
        "snapshot version": (snapshot["signed"], "version", -1),  # test_signed_eq_ (iteration 1)
        "snapshot spec_version": (snapshot["signed"], "spec_version", "0.0.0"),  # test_signed_eq_ (iteration 2)
        # add data for test_key_eq_, test_role_eq_, test_root_eq_, test_metafile_eq_, ...
    }

    @utils.run_sub_tests_with_dataset(md_attrs_data)
    def test_metadata_attrs__eq__(self, test_case_data: Tuple) -> None:
        md, attr, val = test_case_data
        md_2 = copy.deepcopy(md)
        setattr(md_2, attr, val)
        self.assertNotEqual(md, md_2)

NOTE: I didn't include the initialization for the first argument (md), if you want to access what's in cls.metadata, you might need to define the test data sets in setUpClass as well.

Honestly, I think the way the tests are defined now is simpler than what you suggested.
We would have one big dataset with many cases instead of separating the attributes into logically separated datasets.

Additionally, the dataset won't be so beautiful when working with Key, Role, DelegatedRole, TargetFile and MetaFile data. In order to reuse it, we would have to define it inside setUpClass.

The only real benefit I see is that the file will be smaller.

What do you think @jku?

I honestly have problems figuring out if a specific test is even useful, eq() is such a special case. I don't have a strong opinion on the style

Just wanted to say that your original idea of using the decorator was quite good, albeit with some restructuring. Apologies for causing any confusion. I'm really also fine either way. Thanks for your persistent efforts, @MVrachev!

tests/test_metadata_eq_.py

lukpueh · 2022-02-16T15:59:17Z

tests/test_metadata_eq_.py

+
+        self.assertEqual(md, md_2)
+
+    def test_md_eq_special_signatures_tests(self) -> None:


I think we can trust securesystemslib to properly implement and test Signatures.__eq__. So I suggest to remove this test.

I don't think I agree with this one.
Here we are testing the following cases:

Test that metadata objects with different signatures are not equal.

Test that metadata objects with empty signatures are equal

Metadata objects with different signatures types are not equal.

which doesn't test the same thing as https://github.com/secure-systems-lab/securesystemslib/blob/075043e46b1d017fc332cf44a2038558b3e246d8/tests/test_signer.py#L75,

tests/test_metadata_eq_.py

MVrachev · 2022-02-17T19:25:44Z

I rebased and made the following major changes:

simplified the tests a lot by following @lukpueh's suggestion: using a for loop instead of the decorator and adding a custom message that shows which attribute an error comes from.
simplified the tests related to the order of signatures and delegated roles by using reversed() (but because Python 3.7 doesn't support reversed() on dictionaries, I had to cast to a list first).
simplified the verification that objects with the same signatures, but in a different order, are different. The same simplification was done for delegated roles.
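
For reference, reversing a dict's insertion order in a way that also works on Python 3.7 might look like this (the keyids and values are placeholders, not test data from this PR):

```python
# Placeholder signature mapping; insertion order is keyid1, then keyid2.
signatures = {"keyid1": "sig1", "keyid2": "sig2"}

# reversed() accepts dicts and dict views only from Python 3.8 on,
# so the items are copied into a list first to stay 3.7-compatible.
reversed_signatures = dict(reversed(list(signatures.items())))

print(list(reversed_signatures))  # prints: ['keyid2', 'keyid1']
```

The two dicts still compare equal with ==, which is why the __eq__ implementations need an explicit order check.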

Thanks for the reviews, now it should be in a good shape I hope. :D

jku

it's a lot of new code but I think it's worth it and the eq() implementations are now simple
I admit I am not able to evaluate if the tests are useful: eq() is really tricky. But I trust Martin's judgement here: LGTM
We only use this feature in one place in tests ATM, I wonder if we should do it more?

Anyway, looks good to me, we can keep improving this after the fact -- e.g. this will be very nice to have for the static tests (#1806). Thanks for your work.

tuf/api/metadata.py

jku · 2022-02-25T08:15:08Z

We only use this feature in one place in tests ATM, I wonder if we should do it more?

If you have looked at the potential places (any Metadata.to_files/to_bytes() calls) already, and decided not to use validate please leave a comment so we know.

By adding __eq__ we can check whether two objects are equal. That will be useful when adding a validation API call.

One bug I found during testing is that I didn't check whether the type of "other" in the __eq__ implementations is the expected one. I assumed that when comparing "root == obj", if "obj" is None the result will automatically be False. Later, after a mypy warning, I realized we should implement the __eq__ methods to accept "Any" as the type of "other" and manually check that "other" is the expected type.

Signed-off-by: Martin Vrachev <[email protected]>
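
The type check described in this commit message can be sketched like this, assuming a hypothetical ToyKey class rather than any class from tuf/api/metadata.py:

```python
from typing import Any


class ToyKey:
    """Hypothetical class, used only to illustrate the type check."""

    def __init__(self, keyid: str) -> None:
        self.keyid = keyid

    def __eq__(self, other: Any) -> bool:
        # Without this check, comparing with None or a string would raise
        # AttributeError when accessing other.keyid.
        if not isinstance(other, ToyKey):
            return False
        return self.keyid == other.keyid


key = ToyKey("abc")
equal_to_same = key == ToyKey("abc")
equal_to_none = key == None  # deliberately exercising __eq__ with None
equal_to_str = key == "abc"

print(equal_to_same, equal_to_none, equal_to_str)
# prints: True False False
```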

Test the "__eq__" implementation for all classes defined in tuf/api/metadata.py The tests are many but simple. The idea is to test each of the metadata classes one by one and with this to make sure there are no possible cases missed. Signed-off-by: Martin Vrachev <[email protected]>

If the "validation" argument is set then when serializing the metadata object will be validated. Signed-off-by: Martin Vrachev <[email protected]>

After we have dropped OrderedDict in theupdateframework@e3b267e we are relying on python3.7+ default behavior to preserve the insertion order, but there is one caveat. When comparing dictionaries the order is still irrelevant, unlike with OrderedDict. For example:

>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)])
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True

There are two special attributes, defined in the specification, where the order makes a difference when comparing two objects:
- Metadata.signatures
- Targets.delegations.roles

We want to make sure that the order in those two cases makes a difference when comparing two objects and that's why those changes are required inside two __eq__ implementations.

Signed-off-by: Martin Vrachev <[email protected]>

MVrachev · 2022-02-28T12:57:57Z

it's a lot of new code but I think it's worth it and the eq() implementations are now simple

I admit I am not able to evaluate if the tests are useful: eq() is really tricky. But I trust Martin's judgement here: LGTM

When I was writing the tests I wanted to make sure we are:

calling each of the classes with its corresponding __eq__ implementation
comparing an object to a different object (in our case with an empty string)
changing each of the attributes for each of the classes and then calling __eq__.
That way we check each of the == comparisons in each of the __eq__ functions.

While writing the tests I used coverall to check which lines in the __eq__ implementations were tested.

We only use this feature in one place in tests ATM, I wonder if we should do it more?

Anyway, looks good to me, we can keep improving this after the fact -- e.g. this will be very nice to have for the static tests (#1806). Thanks for your work.

If you have looked at the potential places (any Metadata.to_files/to_bytes() calls) already, and decided not to use validate please leave a comment so we know.

Well, we don't actually use validation in one test, but in multiple, because it's in the helper function modify_metadata inside tests/test_trusted_metadata_set.py.
I didn't want to include this feature in multiple tests as I thought it can slow them down.
Let's keep it that way for now and if we decide we can always change that.

lukpueh reviewed Jan 18, 2022

MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from 5d5a0b8 to 16e46c1 Compare January 18, 2022 13:48

MVrachev changed the title ~~Add a "validate" API call~~ Add a "validate" argument option to JSONSerializer Jan 18, 2022

lukpueh requested changes Jan 20, 2022

jku requested changes Jan 24, 2022

MVrachev mentioned this pull request Jan 26, 2022

Dict comparisons insensitive to order #1788

Closed

MVrachev force-pushed the validation-during-serialization branch from 16e46c1 to dde1290 Compare February 7, 2022 11:19

MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from 2a39784 to fb713d8 Compare February 7, 2022 11:34

MVrachev requested review from lukpueh and jku February 7, 2022 11:36

MVrachev force-pushed the validation-during-serialization branch 3 times, most recently from 7ea6946 to 91f6981 Compare February 7, 2022 11:44

MVrachev force-pushed the validation-during-serialization branch from 91f6981 to dbc4bb6 Compare February 7, 2022 11:50

lukpueh requested changes Feb 8, 2022

MVrachev force-pushed the validation-during-serialization branch from dbc4bb6 to f9344f8 Compare February 8, 2022 12:28

jku reviewed Feb 8, 2022

MVrachev closed this Feb 8, 2022

MVrachev deleted the validation-during-serialization branch February 8, 2022 14:20

MVrachev restored the validation-during-serialization branch February 8, 2022 14:21

MVrachev reopened this Feb 8, 2022

MVrachev force-pushed the validation-during-serialization branch from f9344f8 to b4a6c7e Compare February 14, 2022 13:09

MVrachev requested review from lukpueh and jku February 14, 2022 13:12

jku reviewed Feb 14, 2022

MVrachev force-pushed the validation-during-serialization branch from b4a6c7e to 16d907e Compare February 14, 2022 15:06

jku mentioned this pull request Feb 16, 2022

repository_tool: Root pre-verification during metadata writes misses possible verification failures #883

Closed

lukpueh requested changes Feb 16, 2022

MVrachev force-pushed the validation-during-serialization branch 2 times, most recently from d644836 to 9329a4e Compare February 17, 2022 19:21

MVrachev requested review from lukpueh and jku February 17, 2022 19:25

jku approved these changes Feb 25, 2022

lukpueh approved these changes Feb 25, 2022

MVrachev added 4 commits February 28, 2022 14:42

Add "validation" arg in JSONSerializer

a17ceda

If the "validation" argument is set then when serializing the metadata object will be validated. Signed-off-by: Martin Vrachev <[email protected]>

MVrachev force-pushed the validation-during-serialization branch from 9329a4e to 6ea5372 Compare February 28, 2022 12:42

jku merged commit a74f7a1 into theupdateframework:develop Feb 28, 2022

MVrachev deleted the validation-during-serialization branch March 1, 2022 15:32

lukpueh mentioned this pull request Nov 2, 2022

Sort roles dict by keys when serializing #2161

Closed

