fix HttpResource.resolve() with preprocessing #5669
@@ -31,19 +32,16 @@ def __init__(
     *,
     file_name: str,
     sha256: Optional[str] = None,
-    decompress: bool = False,
-    extract: bool = False,
+    preprocess: Optional[Union[Literal["decompress", "extract"], Callable[[pathlib.Path], pathlib.Path]]] = None,
We could also go for an enum here, but not sure if that would be overkill.
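For comparison, a minimal sketch of what the enum variant might look like. The `Preprocess` name and its members are hypothetical and not part of this PR; the only point is that `enum.Enum` rejects unknown values on its own.

```python
import enum


class Preprocess(enum.Enum):
    # Hypothetical enum mirroring the two string shortcuts from the signature above.
    DECOMPRESS = "decompress"
    EXTRACT = "extract"


# Lookup by value returns the matching member ...
mode = Preprocess("decompress")

# ... and an unknown value raises ValueError without extra validation code.
try:
    Preprocess("unzip")
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```

The trade-off is that callers have to import the enum, whereas the `Literal` strings need no extra import.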
💊 CI failures summary and remediations
As of commit 8f07a4e (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 3 ongoing upstream failures: These were probably caused by upstream breakages that are not fixed yet.
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Thanks for the PR @pmeier
If I remember correctly, the way redirections are done is fairly hacky. Would it be worth thinking of a robust way of doing these?
Yes, but not now. The thing that doesn't work is redirecting of mirrors and we only have a single dataset with mirrors. In general nothing has changed from the last time we evaluated this. I think we should wait for your investigations on how we want to load data from remote sources, e.g. do we use |
Approving to unblock but if we're going to revisit everything anyway, perhaps there are more minimal changes that would work too (and that would be easier to review)? Not sure how tied the proposed changes are to the actual fix here.
…er/vision into datasets/resolve-preprocess
IMO, the changes are fairly minimal. As stated in my top comment, have a look at #5667 for an alternative solution. The gist from there is that we internally track whether |
Summary:
* fix HttpResource.resolve() with preprocess set
* fix README
* add safe guard for invalid str inputs

(Note: this ignores all push blocking failures!)

Reviewed By: datumbox
Differential Revision: D35216797
fbshipit-source-id: c0c2fee98d5a7ade1b6870b11f396632539eb994
#5282 removed the `preprocess` flag from `OnlineResource`. In #5584 and #5667 it was discovered that this broke the use case if either `decompress` or `extract` is set on a `HttpResource` and the URL redirects. @YosuaMichael provided a solution in #5667, but I think it will be much better to have a more general solution.

This PR removes the `decompress` and `extract` parameters and re-adds `preprocess`. It can be any `Callable[[pathlib.Path], pathlib.Path]`. To keep the old convenience, one can also pass `preprocess="decompress"` or `preprocess="extract"`.