cache dependencies in prototype tests on a daily basis #5929

Open · wants to merge 19 commits into main

Conversation

@pmeier (Collaborator) commented May 2, 2022

Addresses #5914 (comment). Timings:

OS        w/o cache   w/ cache
Linux     0m 45s      0m 32s
Windows   2m 16s      2m 4s
macOS     1m 8s       0m 48s

There are some gains, but they are insignificant compared to the overall runtime of the workflow. This is due to two factors:

  • The largest download is PyTorch core. However, the download is blazingly fast; AFAIK GitHub Actions runs on AWS, and since the wheels are already stored in S3, this makes sense.
  • The installation time of the wheels dwarfs the download time. Since this PR currently only caches the downloaded files, we cannot change that.

I'll try to cache the environment instead and report the timings again.

@pmeier pmeier changed the title Cache prototype tests cache third-party dependencies in prototype tests May 2, 2022
@pmeier (Collaborator, Author) commented May 3, 2022

By caching the environment, we now get these results:

OS        w/o cache   w/ cache
Linux     0m 54s      0m 10s
Windows   2m 30s      1m 3s
macOS     1m 27s      0m 31s

Now we get a significant relative decrease, but in relation to the total runtime of the workflow, it is still not much 🤷

Given that we can write the cache key generation in Python, IMO it is simple enough to keep. No strong opinion though.
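
For illustration, a minimal standalone sketch of such a key-generation step (hypothetical: the key name and the GITHUB_OUTPUT plumbing are assumptions rather than what this PR does, and the exact hour cutoff is discussed in the review comments below):

import datetime
import os

# Roll the cache key over once per day, keyed on the UTC date.
today = datetime.datetime.utcnow()
yesterday = today - datetime.timedelta(days=1)
cache_date = f"{today if today.hour > 10 else yesterday:%Y%m%d}"

# Expose the key to later workflow steps via the GITHUB_OUTPUT file
# (only set when running inside a GitHub Actions job).
with open(os.environ["GITHUB_OUTPUT"], "a") as fh:
    fh.write(f"cache-key=prototype-env-{cache_date}\n")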

@@ -1,3303 +0,0 @@
version: 2.1
@pmeier (Collaborator, Author):

Will reinstate this file when the PR is otherwise good to go. With it deleted, I can debug the GitHub Actions CI without triggering a full CircleCI build on every commit.


today = datetime.datetime.utcnow()
yesterday = today - datetime.timedelta(1)
cache_date = f"{today if today.hour > 10 else yesterday:%Y%m%d}"
@pmeier (Collaborator, Author):

This makes sure we cache the current nightly builds correctly. We will have a "constant" environment every day, starting at 11:00 UTC.

@pmeier (Collaborator, Author) commented May 3, 2022:

Actually, torchdata only starts its nightly workflows at 11:00 UTC. Thus, we should only cache after they are done:

Suggested change
- cache_date = f"{today if today.hour > 10 else yesterday:%Y%m%d}"
+ cache_date = f"{today if today.hour >= 12 else yesterday:%Y%m%d}"
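
A quick standalone check of the rollover behaviour with the >= 12 cutoff (made-up timestamps, purely for illustration):

import datetime

# Runs before 12:00 UTC reuse yesterday's date; later runs switch to today's.
for now in (datetime.datetime(2022, 5, 3, 11, 59),
            datetime.datetime(2022, 5, 3, 12, 0)):
    yesterday = now - datetime.timedelta(days=1)
    cache_date = f"{now if now.hour >= 12 else yesterday:%Y%m%d}"
    print(now.isoformat(), "->", cache_date)
# prints: 2022-05-03T11:59:00 -> 20220502
# prints: 2022-05-03T12:00:00 -> 20220503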

- name: Run prototype tests
  shell: bash
  run: pytest --durations=20 test/test_prototype_*.py
# - name: Run prototype tests
@pmeier (Collaborator, Author):

I'll reinstate this if the PR is otherwise good to go.

@pmeier pmeier requested review from datumbox and vfdev-5 May 3, 2022 12:31
@pmeier pmeier marked this pull request as ready for review May 3, 2022 12:31
@pmeier pmeier changed the title cache third-party dependencies in prototype tests cache dependencies in prototype tests on a daily basis May 12, 2022
@datumbox datumbox removed their request for review May 12, 2022 09:48
@datumbox (Contributor):

No strong opinions on my side. Tiny time savings and small number of lines introduced.

@NicolasHug, do you have an opinion?

@vfdev-5 (Collaborator) commented May 12, 2022

@pmeier, following your commits and the CI times:

w/o cache: 2e27808 -> GHA: https://github.com/pytorch/vision/runs/6272567930
w/ cache: f516b21 -> GHA: https://github.com/pytorch/vision/runs/6272688394

I see somewhat different times:

Without cache:

Ubuntu: 8m 1s (https://github.com/pytorch/vision/runs/6272567930)

  • Install PTH nightly: 15s
  • Install TV: 6m 13s
  • Install opt deps: 33s
  • Install test reqs: 3s

Windows: 8m 28s (https://github.com/pytorch/vision/runs/6272567996?check_suite_focus=true)

  • Install PTH nightly: 1m 16s
  • Install TV: 3m 29s
  • Install opt deps: 46s
  • Install test reqs: 5s

macOS: 9m 49s (https://github.com/pytorch/vision/runs/6272568056?check_suite_focus=true)

  • Install PTH nightly: 33s
  • Install TV: 7m 43s
  • Install opt deps: 45s
  • Install test reqs: 3s

With cache:

Ubuntu: 4m 24s (https://github.com/pytorch/vision/runs/6272688394?check_suite_focus=true)

  • Install PTH nightly: 0s
  • Install TV: 4m 7s
  • Install opt deps: 1s
  • Install test reqs: 1s

Windows: 4m 6s (https://github.com/pytorch/vision/runs/6272688479?check_suite_focus=true)

  • Install PTH nightly: 2s
  • Install TV: 2m 38s
  • Install opt deps: 2s
  • Install test reqs: 2s

macOS: 8m 0s (https://github.com/pytorch/vision/runs/6272688565?check_suite_focus=true)

  • Install PTH nightly: 2s
  • Install TV: 7m 17s
  • Install opt deps: 3s
  • Install test reqs: 2s

Am I misinterpreting something?

@pmeier (Collaborator, Author) commented May 12, 2022

> Am I misinterpreting something?

  1. For the runs with cache, you also need to add the time for restoring the cache.
  2. I did not include the build of torchvision, since it fluctuates quite a bit between runs. Still, some dependencies are installed during that step, so the speedup is a little higher than what I reported.

@NicolasHug (Member):

> @NicolasHug, do you have an opinion?

My 2 cents: even the tiniest change can backfire and cause problems. CI and dependencies are very touchy areas, so in general I prefer not to change anything unless it's really worth it. Here there are some gains, but as you noted, they're not incredibly high, and the prototype test job is far from being a bottleneck in our CI anyway.
