ci: Limit scope of unittest to one python version #5479
Conversation
Limits the scope of unit testing to one Python version for both macOS and Windows. These workflows are particularly expensive and take a long time, so running them on every PR / every push is wasteful, considering that the extra signal from testing multiple Python versions is probably negligible.

Similar to pytorch/audio#2256
Context: pytorch/audio#2254

Signed-off-by: Eli Uriegas <[email protected]>
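For illustration, a minimal sketch of what limiting the matrix could look like in a Python-based CI config generator; the names `PYTHON_VERSIONS`, `UNITTEST_PYTHON_VERSION`, and `unittest_workflows` are hypothetical, not torchvision's actual generator API:

```python
# Hypothetical sketch: restrict the expensive per-PR unit-test jobs to a single
# Python version while leaving the rest of the matrix to other job types.

PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]

# Only this version gets the per-PR unit-test jobs; the full matrix is still
# exercised elsewhere (e.g. by the binary/smoke-test jobs).
UNITTEST_PYTHON_VERSION = "3.8"


def unittest_workflows():
    """Generate the unit-test job entries for macOS and Windows."""
    jobs = []
    for os_type in ("macos", "windows"):
        for python_version in PYTHON_VERSIONS:
            if python_version != UNITTEST_PYTHON_VERSION:
                continue  # skip the rest of the Python matrix for unit tests
            jobs.append(
                {
                    f"unittest_{os_type}_cpu": {
                        "name": f"unittest_{os_type}_cpu_py{python_version}",
                        "python_version": python_version,
                    }
                }
            )
    return jobs


if __name__ == "__main__":
    for job in unittest_workflows():
        print(job)
```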
💊 CI failures summary and remediations

As of commit bc12c36 (more details on the Dr. CI page):
1 failure not recognized by patterns:
This comment was automatically generated by Dr. CI (expand for details). Please report bugs/suggestions to the (internal) Dr. CI Users group.
@seemethere Thanks for the PR.
Though I agree that cost saving is very important, the proposed changes seem very restrictive. For example, we just lost the Windows GPU unit tests; was this accidental? Moreover, we lost the ability to check for issues caused by dependencies for specific OS and Python version combinations. Such issues are extremely common at the moment (see, for example, this from yesterday: #5451).
I know that your team has on its roadmap a project to standardize and harden the CI infra across domains. I'm certain that this will help us reduce the breakages. Shall we review the unit-test jobs once this is completed? Happy to chat offline and summarize here later if that's easier. Thanks!
Losing the Windows GPU unit tests is unintentional; I have no idea how this PR actually led to that. As for identifying regressions between OSes and specific Python versions, we've found that most of those can be covered by the binary tests and don't need to be caught in jobs that can run for 2+ hours (or at least those were the numbers I observed for torchaudio). Also, for pytorch core we don't run unit tests on all Python versions, precisely because we rarely find a difference in signal between them. I'll see about re-adding the Windows GPU tests.
Signed-off-by: Eli Uriegas <[email protected]>
I think the keyword here is "most". The binary tests will indeed capture dependency issues related to conflicts. Unfortunately, they won't capture changes in behaviour such as the ones we often observe in vision. Perhaps the solution is to restructure the tests and run, across the different OS/Python/CUDA configurations, only those tests that can be affected. This would reduce the cost without reducing the quality of the checks that we run. We are happy to consider adding this as a project on our roadmap once the standardization of the infra is done across domains. Doing this prematurely, before fixing our CI and infra code, would expose us to unnecessary risk.
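For illustration (not torchvision's actual test layout), a minimal sketch of how configuration-sensitive tests could be tagged with a custom pytest marker so that only they run across the full OS/Python/CUDA matrix; the marker name `config_sensitive` and the test names are hypothetical:

```python
# Hypothetical sketch: tag only configuration-sensitive tests so the extra
# matrix jobs can run `pytest -m config_sensitive`, while the default
# configuration runs the whole suite.
import pytest
import torch

# Marker assumed to be registered in pytest.ini / setup.cfg:
#   markers = config_sensitive: tests affected by the OS/Python/CUDA combination
config_sensitive = pytest.mark.config_sensitive


@config_sensitive
def test_cuda_op_matches_cpu():
    # Placeholder for an op whose behaviour has differed across platform/CUDA
    # combinations in the past; skipped where CUDA is unavailable.
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    x = torch.rand(4, 5)
    assert torch.allclose(x.cuda().cpu(), x)


def test_pure_python_logic():
    # Configuration-independent logic: one OS/Python combination is enough,
    # so no marker is applied.
    assert sorted([3, 1, 2]) == [1, 2, 3]
```

The expensive matrix entries would then select only the marked tests (`pytest -m config_sensitive`), reducing cost without dropping the checks that actually depend on the configuration.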
Hey @datumbox,
As I understand this PR, the tests will still be run on

Also, the standardization is not going to happen until H2 2022 at the earliest. There's a lot of wasted resources between now and then if this PR isn't merged.
@bigfootjon Thanks for the input. As you can see at #5574 (merged less than an hour ago), we are actively working to improve the performance of our CI and reduce the duration of our tests.

I don't know how fair the comparison between TorchVision and PyTorch core is, given that core has moved a large number of its tests to GitHub Actions and is still FBcode-first. TorchVision is GitHub-first and uses almost exclusively CircleCI. I'm a bit surprised that the value of the matrix tests is questioned, given that @seemethere's team has helped us time and time again to resolve weird bugs that pop up on a single platform/CUDA combination. The binary jobs often don't catch all the issues caused by dependencies, hence running these tests allows us to catch problems in time.

Concerning the delay on the standardization work, my understanding was that you were looking into doing this in 2022H1. It's a pity if it's delayed by so much, as our combined efforts can indeed help reduce the costs.

In any case, as part of TorchVision's 2022H1 roadmap, we plan to improve our model testing infrastructure, and this will allow us to reduce the costs. TorchVision has over 130 pretrained model variants (some of which are pretty big and thus require beefy machines), which contributes to the increased durations. After this work is concluded, we will be able to selectively run across all matrices only those tests that can be affected by the configuration, which will hopefully bring down the cost.