Skip to content

Fix handling consistent targets same as legacy updater #1591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 12, 2021

Conversation

MVrachev
Copy link
Collaborator

@MVrachev MVrachev commented Sep 24, 2021

Note: This pr is dependent on pr #1592.

Fixes #1576

Description of the changes being introduced by the pull request:

The definition of consistent targets in the spec is ambiguous:
"consistent target files should be written to non-volatile storage
as digest.filename.ext"
Additionally, the specification describes consistent targets when the
client builds the download URL as follows:
"The filename is of the form HASH.FILENAME.EXT".
The issue is about how we interpreted those quotes.
The legacy updater has decided this means a target path a/b will translate to a download url path a/{HASH}.b.
The ngclient however translates the target path a/b to a download URL path {HASH}.a/b.

We decided we want to follow the same approach taken from the legacy
updater and thus change how we construct the consistent targets.
Additionally, we want to make sure we test for cases when the TARGETPATH
is an empty string or points to a directory.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@MVrachev MVrachev changed the title Consistent targets Fix handling consistent targets same as legacy updater Sep 24, 2021
@coveralls
Copy link

coveralls commented Sep 24, 2021

Pull Request Test Coverage Report for Build 1290255166

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+1.0%) to 98.251%

Files with Coverage Reduction New Missed Lines %
tuf/ngclient/updater.py 4 94.23%
Totals Coverage Status
Change from base Build 1290053983: 1.0%
Covered Lines: 3748
Relevant Lines: 3781

💛 - Coveralls

@MVrachev
Copy link
Collaborator Author

While working on this one multiple questions appeared:

  1. If a targetpath contains directories, then do we want to create the missing directories?
    There is a separate issue about this and I left a comment there: ngclient: downloading targets fails if targetpath includes subdirectories #1571 (comment)
  2. How can we create better error messages from the asserts in the for cycles in the unit tests?
    Once one of the asserts failed and I wasn't sure which case failed.
    Do we want to use decorators as we did in test_metadata_serialization.py?
  3. Are there other cases I should cover for TARGETPATH inside the for loops?
  4. Do we want to validate the destination directory in download_target?
    I think we do need to do that. Are the added checks sufficient?
  5. Don't we want to filter the target_path input for get_one_valid_targetinfo as well?
  6. Are there other file write errors that we should document in the Raises for download_target
    or we can delete one of the TODO items in the Raises section?

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the errors you raise both look like things we could handle earlier: the target path originally comes from the client so if we're validating, let's validate where it really is input -- get_one_valid_targetinfo.

The commit also uses os.path to form a URL. It may be possible that Python has enough workarounds to make that work in practice in Windows but I don't like it. os.path is for local file paths

@jku
Copy link
Member

jku commented Sep 27, 2021

While working on this one multiple questions appeared:

  1. If a targetpath contains directories, then do we want to create the missing directories?
    There is a separate issue about this and I left a comment there: ngclient: downloading targets fails if targetpath includes subdirectories #1571 (comment)

I'll try to have a PR for that ASAP now that the blocker is merged. But feel free to include the directory creating commit until then

  1. How can we create better error messages from the asserts in the for cycles in the unit tests?
    Once one of the asserts failed and I wasn't sure which case failed.
    Do we want to use decorators as we did in test_metadata_serialization.py?

Yeah, that's one of the core advantages of proper table testing, we probably do.

  1. Do we want to validate the destination directory in download_target?
    I think we do need to do that. Are the added checks sufficient?

I think we maybe don't need to validate that it exists or can be written to -- I think we can just mention OSError in the docstring. But should test that something reasonable really happens on those cases

I already mentioned the target path checks (if we do them) should IMO happen much earlier.

  1. Don't we want to filter the target_path input for get_one_valid_targetinfo as well?

👍 new issue might be appropriate

@MVrachev
Copy link
Collaborator Author

  1. Don't we want to filter the target_path input for get_one_valid_targetinfo as well?

👍 new issue might be appropriate

Created an issue for that: #1594.

@MVrachev MVrachev force-pushed the consistent-targets branch 2 times, most recently from 2c78cdd to 7620478 Compare September 30, 2021 08:34
@MVrachev
Copy link
Collaborator Author

MVrachev commented Sep 30, 2021

This pr is ready for review after the merge of #1587 and #1592.

  1. How can we create better error messages from the asserts in the for cycles in the unit tests?
    Once one of the asserts failed and I wasn't sure which case failed.
    Do we want to use decorators as we did in test_metadata_serialization.py?

Yeah, that's one of the core advantages of proper table testing, we probably do.

I experimented with a new commit where I used the same decorator as test_metadata_serialization.py.
IMO it looks good and the error messages are a lot better.

  1. Do we want to validate the destination directory in download_target?
    I think we do need to do that. Are the added checks sufficient?

I think we maybe don't need to validate that it exists or can be written to -- I think we can just mention OSError in the docstring. But should test that something reasonable really happens on those cases

I already mentioned the target path checks (if we do them) should IMO happen much earlier.

I won't focus on validation in this pr.
This should be resolved in #1594.

The commit also uses os.path to form a URL. It may be possible that Python has enough workarounds to make that work in practice in Windows but I don't like it. os.path is for local file paths

I changed that and instead used a manual file path and URL creation.

@jku can you answer my last question too:

  1. Are there other file write errors that we should document in the Raises for download_target
    or we can delete one of the TODO items in the Raises section?

@MVrachev MVrachev marked this pull request as ready for review September 30, 2021 08:42
@MVrachev MVrachev requested a review from jku September 30, 2021 08:42
Copy link
Contributor

@sechkova sechkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good but I have some reservations for the tests that I put in the comments. Also there isn't really a test for the hash prefix implementation anywhere, the tests rely on the assumption that the default workflow uses consistent targets and no errors were raised.

@MVrachev MVrachev force-pushed the consistent-targets branch from 7620478 to c645353 Compare October 1, 2021 16:22
@MVrachev
Copy link
Collaborator Author

MVrachev commented Oct 2, 2021

Also there isn't really a test for the hash prefix implementation anywhere, the tests rely on the assumption that the default workflow uses consistent targets and no errors were raised.

I will think about how can I enhance the test and leave a comment when I am ready.

@jku
Copy link
Member

jku commented Oct 4, 2021

6. Are there other file write errors that we should document in the Raises for download_target
or we can delete one of the TODO items in the Raises section?

Sorry I don't understand the question? I don't see any write errors documented currently.

Can file writes fail? Yes. Does our code raise exceptions in those cases? probably: someone needs to read the code and find out

@jku
Copy link
Member

jku commented Oct 4, 2021

Also there isn't really a test for the hash prefix implementation anywhere, the tests rely on the assumption that the default workflow uses consistent targets and no errors were raised.

I think this might be fine: we definitely want the default setup to always be consistent_snapshot (test could assert that this is the case, or just add a comment that it's expected).

But I have been thinking that RepositorySimulator could easily have some tracking on what gets requested/downloaded... so a test could then assert that e.g. a specific url was used. This sounds like a potential improvement but maybe a sidetrack for this issue?

@MVrachev MVrachev force-pushed the consistent-targets branch from c645353 to 919eae4 Compare October 4, 2021 15:41
@MVrachev
Copy link
Collaborator Author

MVrachev commented Oct 4, 2021

Updated the pr with the following changes:

  • moved the test case for targetpath with directories in a separate function
  • moved the decorator in tests/utils.py and used it in both test_metadata_serialization.py and test_updater_with_simulator.py
  • added checks that consistent_snapshotandprefix_targets_with_hash` are used before downloading a target in the test and a check inside the repo_simulator.

@MVrachev MVrachev requested a review from sechkova October 4, 2021 15:45
@jku jku mentioned this pull request Oct 6, 2021
3 tasks
@MVrachev MVrachev force-pushed the consistent-targets branch 2 times, most recently from c0b93f5 to d42a199 Compare October 6, 2021 10:26
@MVrachev MVrachev requested a review from sechkova October 6, 2021 10:27
@MVrachev MVrachev force-pushed the consistent-targets branch from d42a199 to b51c099 Compare October 6, 2021 13:32
@MVrachev MVrachev requested a review from sechkova October 6, 2021 13:33
Copy link
Contributor

@sechkova sechkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok now.

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

Some observations

  • The targets testing is still lacking in volume: we should have vastly more variations on this... but let's do that on it's own issue, as result of Design test strategy #1579
  • I think all of the different DataSet types are probably unneeded now -- it was useful in the very crowded decorator implementation but now the different DataSets might confuse more than help 🤷

left two more comments but they are nits as well

@MVrachev
Copy link
Collaborator Author

MVrachev commented Oct 7, 2021

  • I think all of the different DataSet types are probably unneeded now -- it was useful in the very crowded decorator implementation but now the different DataSets might confuse more than help 🤷

Yea, maybe you are right, given that as you have pointed out in one of your comments I should create a TargetData instance if I want to be correct with the annotations.
Additionally, what is the content of the specific target data becomes clear when I unfold it into specific variables.
I will remove TargetData and the specific DataType altogether.

The definition of consistent targets in the spec is ambiguous:
"consistent target files should be written to non-volatile storage
as digest.filename.ext"
Additionally, the specification describes consistent targets when the
client builds the download URL as follows:
"The filename is of the form HASH.FILENAME.EXT".
The issue is about how we interpreted those quotes.
The legacy updater has decided this means a target path "a/b" will
translate to a download url path "a/{HASH}.b".
The ngclient however translates the target path "a/b" to a download url
path "{HASH}.a/b".

We decided we want to follow the same approach taken from the legacy
updater and thus change how we construct the consistent targets.
Additionally, we want to make sure we test for cases when the TARGETPATH
is an empty string or points to a directory.

Signed-off-by: Martin Vrachev <[email protected]>
Generalize the decorator used in test_metadata_serialization.py and
move it inside tests/utils.py, so it can be reused in other similar
situations.

Signed-off-by: Martin Vrachev <[email protected]>
Reuse the decorator defined in tests/utils.py in order
to receive more helpful messages when an assertion
fails in test_tragets().

Signed-off-by: Martin Vrachev <[email protected]>
Make sure that hash prefixes are added when downloading a target
through the repository simulator.

Signed-off-by: Martin Vrachev <[email protected]>
@MVrachev MVrachev force-pushed the consistent-targets branch from b51c099 to c3e746a Compare October 7, 2021 12:32
@MVrachev MVrachev requested review from sechkova and jku October 7, 2021 12:33
@sechkova sechkova merged commit 88245f1 into theupdateframework:develop Oct 12, 2021
@MVrachev MVrachev deleted the consistent-targets branch October 13, 2021 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ngclient: consistent targets handled differently than legacy updater
4 participants