Skip to content

get-url: allow weak etag #4372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 13, 2020
Merged

get-url: allow weak etag #4372

merged 1 commit into from
Aug 13, 2020

Conversation

pared
Copy link
Contributor

@pared pared commented Aug 10, 2020

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Related to #4131

Comment on lines -139 to -144
if etag.startswith("W/"):
raise DvcException(
"Weak ETags are not supported."
" (Etag: '{etag}', URL: '{url}')".format(etag=etag, url=url)
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared, what about imports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thats right, I changed the approach.

@pared pared changed the title tree: http: dont raise on weak etag [WIP] get-url: allow weak e-tag Aug 11, 2020
@pared pared changed the title [WIP] get-url: allow weak e-tag [WIP] get-url: allow weak etag Aug 11, 2020
@pared pared changed the title [WIP] get-url: allow weak etag get-url: allow weak etag Aug 11, 2020
@@ -16,6 +16,5 @@ def get_url(url, out=None):

(dep,) = dependency.loads_from(None, [url])
(out,) = output.loads_from(None, [out], use_cache=False)
dep.save()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that upon get-url there is no point in saving the dependency. We do not have any operations related to objects that are obtained using this method. Saving dependency makes sense in case of import, when later we reuse the information, for example to update the asset.

Copy link
Collaborator

@skshetry skshetry Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixes #4251

EDIT: added to the description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes us handle missing dependencies in a unified way though. Without it the user will get something random from the internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop, you mean, for example what happens upon downloading non-existient asset? Yeah that's true, but isn't get-url convenience method so that user does not have to work with other tools like, wget, curl? Treating it as normal dependency can limit its functionality, like in #4131. If we want to unify the handling of errors, maybe we should check if dependency exist and throw from inside the get_url method?

Copy link
Contributor

@efiop efiop Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared That would be a yet another place for duplicating the logic. Currently, we have unified save() and we have ununified tree exceptions, that might vary from FileNotFound for local to something completely odd in gdrive + some unexpected behaviours. In any case, this is clearly out of scope for this weak-etag PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, potentially unified tree exceptions is all we need, but I'm not 100% sure in this particular case. The fact right now is that download() will raise random exceptions, which are confusing. So using save() for now just provides a familiar import-like and -d like errors, which is good enough for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared I wonder if we can simply allow weak etags for import-url and run -d as well. We clearly can't use weak etags for caching, but it is good enough for dependencies. Or am I missing something?

Copy link
Contributor Author

@pared pared Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accoding to w3

Strong validators are usable in any context. Weak validators are only usable in contexts 
that do not depend on exact equality of an entity. For example, either kind is usable for
a conditional GET of a full entity. However, only a strong validator is usable for a sub-range
retrieval, since otherwise the client might end up with an internally inconsistent entity.

In case of get-url and import-url we are relying on full entity, and not byte to byte response of server. It seems to me that we can use weak etags.

EDIT:
It seems to me that we can remove the weak check.
As to save/check of the dependency:
I would leave check as it is in this change for now, and try to solve #4251 as standalone task.
On one hand, if we remove the check, we get ambiguous errors, depending on URL we provide.
On the other hand, if we leave this check, we can also get strange behavior, for example another issue from #4131,
where server responds 403 on HEAD and 200 on GET which causes us to throw, even though the asset can be GET... None of this is part of etag problem. So let's handle weak etags here, and handle #4251 separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared So if we can use weak etags, could we just remove that check in this PR? The check() changes work, but they won't work for import-url, so we will get different errors, which is not very nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rolled back to allowing weak etag, #4251 is no longer fixed by this issue. cc @skshetry

@pared pared requested a review from skshetry August 11, 2020 12:00
@pared pared requested a review from efiop August 13, 2020 08:17
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@efiop efiop merged commit 9c772f5 into iterative:master Aug 13, 2020
@jorgeorpinel
Copy link
Contributor

I unmarked the docs check box since "strong etag" is mentioned in:

Seems like these (and possibly other) docs need an update per this?

@pared
Copy link
Contributor Author

pared commented Aug 27, 2020

@jorgeorpinel that is a good point, sorry for omitting that, I will create issue and assign myself.

@jorgeorpinel
Copy link
Contributor

Np, just checking πŸ™‚

@pared pared deleted the 4131_weak_etag branch January 4, 2022 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants