-
Notifications
You must be signed in to change notification settings - Fork 1.2k
get-url: allow weak etag #4372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get-url: allow weak etag #4372
Conversation
if etag.startswith("W/"): | ||
raise DvcException( | ||
"Weak ETags are not supported." | ||
" (Etag: '{etag}', URL: '{url}')".format(etag=etag, url=url) | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pared, what about imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thats right, I changed the approach.
dvc/repo/get_url.py
Outdated
@@ -16,6 +16,5 @@ def get_url(url, out=None): | |||
|
|||
(dep,) = dependency.loads_from(None, [url]) | |||
(out,) = output.loads_from(None, [out], use_cache=False) | |||
dep.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that upon get-url
there is no point in saving the dependency. We do not have any operations related to objects that are obtained using this method. Saving dependency makes sense in case of import
, when later we reuse the information, for example to update the asset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also fixes #4251
EDIT: added to the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes us handle missing dependencies in a unified way though. Without it the user will get something random from the internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop, you mean, for example what happens upon downloading non-existient asset? Yeah that's true, but isn't get-url
convenience method so that user does not have to work with other tools like, wget
, curl
? Treating it as normal dependency can limit its functionality, like in #4131. If we want to unify the handling of errors, maybe we should check if dependency exist and throw from inside the get_url
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pared That would be a yet another place for duplicating the logic. Currently, we have unified save() and we have ununified tree exceptions, that might vary from FileNotFound for local to something completely odd in gdrive + some unexpected behaviours. In any case, this is clearly out of scope for this weak-etag PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, potentially unified tree exceptions is all we need, but I'm not 100% sure in this particular case. The fact right now is that download()
will raise random exceptions, which are confusing. So using save()
for now just provides a familiar import-like and -d
like errors, which is good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pared I wonder if we can simply allow weak etags for import-url and run -d
as well. We clearly can't use weak etags for caching, but it is good enough for dependencies. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accoding to w3
Strong validators are usable in any context. Weak validators are only usable in contexts
that do not depend on exact equality of an entity. For example, either kind is usable for
a conditional GET of a full entity. However, only a strong validator is usable for a sub-range
retrieval, since otherwise the client might end up with an internally inconsistent entity.
In case of get-url
and import-url
we are relying on full entity, and not byte to byte response of server. It seems to me that we can use weak etags.
EDIT:
It seems to me that we can remove the weak
check.
As to save
/check
of the dependency:
I would leave check
as it is in this change for now, and try to solve #4251 as standalone task.
On one hand, if we remove the check
, we get ambiguous errors, depending on URL we provide.
On the other hand, if we leave this check, we can also get strange behavior, for example another issue from #4131,
where server responds 403 on HEAD
and 200 on GET which causes us to throw, even though the asset can be GET
... None of this is part of etag
problem. So let's handle weak etags here, and handle #4251 separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pared So if we can use weak etags, could we just remove that check in this PR? The check()
changes work, but they won't work for import-url
, so we will get different errors, which is not very nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I unmarked the docs check box since "strong etag" is mentioned in:
Seems like these (and possibly other) docs need an update per this? |
@jorgeorpinel that is a good point, sorry for omitting that, I will create issue and assign myself. |
Np, just checking π |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. π
Related to #4131