-
Notifications
You must be signed in to change notification settings - Fork 278
Abstract out the network IO #1250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The new GitHub actions have been harsh on me :)) Will review the errors after the holiday break. |
Fixed the linter issues, however according to Windows "Download was too slow." 🙄 |
Fixed a small issue and force-pushed but (un)fortunately I see no errors while testing on a local windows VM. |
First thing, apologize for the spam and using the CI as a test server but I could not reproduce the Windows tests failure locally. The problem can be summarized in two parts:
While 1. can quickly be fixed as suggested by @jku in #1257, 2. is related to the network abstraction api which may need some discussion. The main point is, given all the uncertainty around protection against slow retrieval attack and its recent removal from the specification, where should we implement the slow retrieval check:
My choice is in favour of point 3 but I would like to hear more opinions before going ahead. |
Thanks for the detailed comment, @sechkova (and of course for the patch, which I haven't looked at yet). Here are my thoughts:
|
I agree with @lukpueh: (1) makes more sense to me, depending on the philosophical motivation for the fetcher. Are we simply trying to fetch bytes or do something more? |
I lean towards option 3 (mostly because I do not believe our slow retrieval protection works without false positives -- Teodoras test failures are a strange example of this IMO -- and I would like to not have these false positives in pip)
I originally suggested the same thing, as this might also help with the massive bias that the current implementation has. Example of the bias: Imagine a download of 10 chunks where 9 chunks transfer at "normal internet speeds" but one is slow:
However, solving the above issue issue properly is no doubt complex -- probably the best we could do is special case the first chunk handling, just giving it a few extra seconds... I don't see why this is better than letting the fetcher handle it. |
Oh additional reason for choice 3 (that Teodora kind of mentioned): the patch will probably be cleaner as it will be easier to just move existing code from download.py to Fetcher instead of developing similar but not quite same code. |
The issue is that the first call to the fetch() generator also creates the connection because there is no separate initialization function... now that I said it aloud, I think it has to be possible to initialize the generator separately. I'll look this up. Thanks. |
I am in favour of option three, both because we don't have a specification for slow retrieval attacks, and because we are not confident about the effectiveness of the current implementation of slow retrieval attacks.
I didn't receive any spam about CI failures, so I think no need to apologise here 😄 |
ebef202
to
e8be76b
Compare
Thanks all for giving your opinion and especially @jku for the ideas, most of which I think are included in the latest changes. |
tuf/fetcher.py
Outdated
|
||
# Check response status. | ||
response.raise_for_status() | ||
response = session.get(url, stream=True, timeout=self.SOCKET_TIMEOUT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As there's no longer a context manager in use for response
, do we need to call some cleanup method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also maybe double check what happens when connect times out. I expect that's a ReadTimeoutError which should be handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some cleanup in 77900af.
Connect time out will raise an exception that will be captured in NoWorkingMirrorError
. This is the same as the current behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it except for the fact that it rewrites download.py completely :) I understand it's mostly an indent change but I was really hoping that wasn't needed.
I have not yet reviewed download.py changes -- I will do that but I'll leave an early review for this question: What compromises would we need to make to keep download.py changes to a minimum?
tuf/fetcher.py
Outdated
|
||
|
||
|
||
class FetcherSettings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the construct? I can see it should work but what is the advantage of a separate FetcherSettings and RequestsFetcher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to separate FetcherSettings
so that other FetcherInterface
implementations can reuse them (by inheriting) or replace them with their own. But given that we cannot remove them from tuf.settings
just like that, I dropped this change too.
tuf/settings.py
Outdated
# Set a timeout value in seconds (float) for non-blocking socket operations. | ||
SOCKET_TIMEOUT = 4 #seconds | ||
|
||
# The maximum chunk of data, in bytes, we would download in every round. | ||
CHUNK_SIZE = 400000 #bytes | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are settings that unfortunately maybe used by someone so I don't think we should just silently delete them without deprecation.
I wonder if the new implementation should still use these as the default values for its properties... and possibly mark these as deprecated, explaining that these values are now fetcher specific and that the default RequestsFetcher exposes the same properties (but that those properties might not be considered API(?) and those changing them should consider making their own fetcher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally forgot about settings
being part of the API. I dropped the commit and just used tuf.settings
in RequestsFetcher
.
I suggest we revise tuf.settings
in general and then apply the decisions here.
tuf/client/updater.py
Outdated
fetcher: | ||
A conctrete 'FetcherInterface' implementation. If not provided, | ||
tuf.fetcher.RequestsFetcher is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo on concrete. Also it's lacking a sentence on what fetcher does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in b0e8acb
a749a0d
to
abac87f
Compare
Reworked some changes after discussion with @jku. Download.py should be more appealing for a review now! |
tests/test_fetcher.py
Outdated
self.temp_file.seek(0) | ||
temp_file_data = self.temp_file.read().decode('utf-8') | ||
self.assertEqual(self.file_contents, temp_file_data) | ||
self.assertEqual(self.file_length, len(temp_file_data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to check that when you know that the self.file_contents
is equal to temp_file_data
from the previous assertEqual
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Thanks for noting, seems like I'll have to take another look on the test cases.
tests/test_fetcher.py
Outdated
|
||
# Test incorrect URL parsing | ||
def test_url_parsing(self): | ||
with self.assertRaises(tuf.exceptions.URLParsingError) as cm: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly will happen if this test fails? What about when it succeeds?
I just haven't seen the usage of a context manager with an exception...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with self.assertRaises(ExpectedError):
function_under_test(argument1, argument2)
is functionally the same as
self.assertRaises(ExpectedError, function_under_test, argument1, argument2)
but more readable especially with longer names and more arguments, and also more extensible (as you can have more than one line in the block as test_url_parsing() does).
The as cm
seems unneeded (but harmless) in this case as it's not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but more readable especially with longer names and more arguments, and also more extensible (as you can have more than one line in the block as test_url_parsing() does).
This is what I've meant when writing it but I guess I had plans to use cm
that didn't happen ...
Would it be possible to ensure the new files (tuf/client/fetcher.py and tuf/requests_fetcher.py) adhere to the new coding style? Of particular note is the docstring format. |
The docstrings are updated. I haven't applied the rules from the new coding style which cause linter errors. Since the changes in the PR are meant to work with the existing code they follow the old style. Let's review the coding style during the transition to TUF 1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, which you may choose to address or not, and a couple of questions. I will be happy to merge this in its current state once we've released v0.17.0 (#1284).
Thank you for the solid work on this!
tests/test_fetcher.py
Outdated
temp_file_data = self.temp_file.read().decode('utf-8') | ||
self.assertEqual(self.file_contents, temp_file_data) | ||
# Double check that we download more than one chunk | ||
self.assertGreater(chunks_count, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is unnecessary? We are checking for the expected number of chunks below, what additional value does this provide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked it a bit.
Does this line express my intention better:
https://github.com/theupdateframework/tuf/blob/c794e0b917ba79327277f2846eda0c9a005b6730/tests/test_fetcher.py#L122-L123
The goal is to make sure that the calculation of expected_chunks
is correct and results in more than one chunk downloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I had confused variables. Apologies. 🤦
The locality of the check makes the intention clearer, I don't think we should be using assert()
over self.assert*
here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did change the variable to make the test clearer 🙄
Here it is, using the unittest's "assert" 50b3b19
The new class FetcherInterface defines an interface for abstract network download which can be implemented for a variety of network libraries and configurations. Signed-off-by: Teodora Sechkova <[email protected]>
A concrete implementation of FetcherInterface based on the Requests library. Signed-off-by: Teodora Sechkova <[email protected]>
Abstract the network IO. Move the network operations from tuf.download to the RequestsFercher class which is TUF's implementation of the abstract FetcherInterface. Signed-off-by: Teodora Sechkova <[email protected]>
Initialize Updater with an external implementation of FetcherInterface. If not provided, tuf.fetcher.RequestsFetcher is used. Signed-off-by: Teodora Sechkova <[email protected]>
Pass RequestsFetcher object to tuf.download functions. Signed-off-by: Teodora Sechkova <[email protected]>
Add unit test for requests_fetcher.py Signed-off-by: Teodora Sechkova <[email protected]>
- Update RequestsFetcher.fetch to return a generator object. - Update _download_file to skip connection time when calculating average download speed. - Write chunk to temp file before exiting the fetcher loop on error. Signed-off-by: Teodora Sechkova <[email protected]>
On Windows (Github Actions) the lookup for 'localhost' takes 1 second. This is because: - Windows retries connect() with a timeout - the machine has IPv6 and IPv4 but Testserver only binds the port on IPv4 - the test clients connect to 'localhost' Since socketserver.TCPServer does not seem to support IPv6 before 3.8, just replace 'localhost' with '127.0.0.1' in client-side URLs. See theupdateframework#1257 Signed-off-by: Teodora Sechkova <[email protected]>
Use '>=' as the defensive version of the equality check. Add a comment describing the need of a chunks() generator. Signed-off-by: Teodora Sechkova <[email protected]>
Addressed the latest comments and rebased on the latest changes from the development branch. |
Add test cases to test_fetcher and test_download that decrease default chunk size and download data in more than one chunk. Small code-style improvements. Signed-off-by: Teodora Sechkova <[email protected]>
Calls to safe_download and unsafe_download leave a temporary file unclosed. Signed-off-by: Teodora Sechkova <[email protected]>
A custom error is required so that updater is able to special case 403 & 404 status codes. Rewrite the test case a bit to be more readable. Signed-off-by: Jussi Kukkonen <[email protected]>
* Move FetcherInterface to tuf/client/ directory: This way everything inside that directory is clearly part of client API, and everything outside _may_ be more of an implementation detail (settings is still an unfortunate exception) * Keep RequestsFetcher in tuf/ for same reasons: it's just the default implementation, not explicitly part of client API An even clearer division would be if we moved all the client specific implementation details (download.py, mirrors.py, requests_fetcher.py) to tuf/client/_internal/ but that's a larger change... Signed-off-by: Jussi Kukkonen <[email protected]>
Use a common test level constant for defining the host address forming the download URL on the client side. Signed-off-by: Teodora Sechkova <[email protected]>
Ensure that the newly added files' docstrings adhere to the recently adopted code style guideline (theupdateframework#1232). Small code style improvements in comments and imports. Signed-off-by: Teodora Sechkova <[email protected]>
Had another look, I agree with my previous self: LGTM. Notes:
There's an approval already so I'll merge this soon if no complaints appear. |
Thanks for all the efforts you put into this PR, @sechkova and @jku! And kudos to the reviewers, @joshuagl and @MVrachev. ❤️ I won't add any meaningful comments here, but based on a quick glance through the diff and all the other comments and discussions above, I'm sure the approval is well justified. :) |
Description of the changes being introduced by the pull request:
Based on the discussion in #1213 and mainly the idea of @jku, implement an abstract
FetcherInterface
calss anda concrete
RequestsFetcher
class .The idea is to extract the network IO related code out of
tuf.download
and put it inRequestsFetcher
class which is also the defaultFetcherInterface
implementation which TUF uses if not given an external one.RequestsFetcher
implements only one public method 'fetch
' which returns a bytes iterator. All file handling and verificationlogic remains in
tuf.download
which is now wrapper in aFileDownload
class for convenience.Please verify and check that the pull request fulfills the following
requirements: