Port SBU dataset #5683
Conversation
💊 CI failures summary and remediations

As of commit 91bd94f (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
@pmeier mind checking this PR and letting me know if I'm headed in the right direction?
Argh, I knew there was a reason I marked SBU in the tracker issue. When I gave you the go, I just saw that there was something to download and assumed I was mistaken before. That is on me, my bad.
In general your solution looks really good and works, but it has one major downside: we need to re-download every image on every iteration. While we plan to support streaming datasets from the internet, we are not there yet. Thus, we need to download everything. That could be achieved with an OnDiskCacheHolder, but that would mean we would only download at runtime. All current datasets download everything upfront, and I would keep it that way for now.
My solution is to put a custom preprocess method onto the resource and download everything there.
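To illustrate the suggestion above, here is a minimal sketch of a resource whose preprocess step downloads every listed photo up front, so iterating over the dataset never touches the network. The class and file names (`SBUResource`, the URL-list layout) are illustrative assumptions, not torchvision's actual prototype API.

```python
import pathlib
import urllib.request


class SBUResource:
    """Hypothetical resource with a custom ``preprocess`` that downloads
    all photos upfront, mirroring the suggestion in the review comment."""

    def __init__(self, url_file: str) -> None:
        self.url_file = url_file

    def preprocess(self, root: pathlib.Path) -> None:
        images_dir = root / "images"
        images_dir.mkdir(parents=True, exist_ok=True)
        for line in pathlib.Path(self.url_file).read_text().splitlines():
            url = line.strip()
            if not url:
                continue
            target = images_dir / url.rsplit("/", 1)[-1]
            if not target.exists():  # skip images already on disk
                urllib.request.urlretrieve(url, str(target))
```

Because the download happens once in `preprocess`, a rerun is a no-op for files already on disk, which matches the "download everything upfront" behavior of the other datasets.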
Looking good, thanks @lezwon! I have one simplification comment and one larger change for the mock data generation.
with open(dataset_folder.joinpath(photo_urls_file), "w") as url_file, open(
    dataset_folder.joinpath(photo_captions_file), "w"
) as caption_file:
    urls = [f"https://via.placeholder.com/{random.randint(100, 1000)}.jpg" for _ in range(num_samples)]
This is a really cool idea and I'm definitely going to use this website for other things in the future 🚀 Unfortunately, we cannot have an actual download during mock data generation for two reasons:
- Downloading these images takes quite some time, and we want the tests to be fast.
- Meta's internal test systems do not have access to the internet and would thus fail here.
I propose I send a patch for the test suite that allows us to generate only the already-preprocessed files. Thus, we only add an SBUCaptionedPhotoDataset that already includes test images. I'll ping you on the PR.
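The proposal above amounts to writing the "already preprocessed" files straight to disk instead of fetching placeholder images. A dependency-free sketch follows; the file names (`photo_urls.txt`, `photo_captions.txt`) and the fake-JPEG bytes are illustrative assumptions (real torchvision mock data would use PIL to emit valid images), not the actual patch from #5706.

```python
import pathlib


def generate_mock_sbu(root: pathlib.Path, num_samples: int = 3) -> None:
    """Hypothetical mock-data generator: writes images, a URL list, and a
    caption list directly, so no network access is needed during tests."""
    images_dir = root / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    urls, captions = [], []
    for idx in range(num_samples):
        name = f"{idx:04d}.jpg"
        # Dummy bytes standing in for a real JPEG; enough for file-level tests.
        (images_dir / name).write_bytes(b"\xff\xd8\xff" + bytes([idx]))
        urls.append(f"https://example.com/{name}")  # never actually fetched
        captions.append(f"mock caption {idx}")
    (root / "photo_urls.txt").write_text("\n".join(urls) + "\n")
    (root / "photo_captions.txt").write_text("\n".join(captions) + "\n")
```

Since everything is generated locally and deterministically, the tests stay fast and also run on machines without internet access.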
See #5706.
@pmeier I'll wait for the PR to get merged, right? I can make the necessary changes after it.
Yes, sorry for the delay. I'll try to get it merged soon.
Co-authored-by: Philip Meier <[email protected]>
Force-pushed from 34b9775 to 91bd94f
Co-authored-by: Nicolas Hug <[email protected]>
fixes #5349