Make test precision stricter for Classification #6380
Conversation
LGTM, thanks
@@ -841,7 +841,7 @@ def test_video_model(model_fn, dev):
     # RNG always on CPU, to ensure x in cuda tests is bitwise identical to x in cpu tests
     x = torch.rand(input_shape).to(device=dev)
     out = model(x)
-    _assert_expected(out.cpu(), model_name, prec=0.1)
+    _assert_expected(out.cpu(), model_name, prec=1e-5)
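For reference, prec is the tolerance used when comparing the model output against the stored expected tensor. A minimal sketch of what such a check amounts to, assuming the helper delegates to a standard element-wise tolerance comparison (assert_expected_sketch is a hypothetical name; the real _assert_expected helper in test_models.py may differ in detail):

import torch

def assert_expected_sketch(actual: torch.Tensor, expected: torch.Tensor, prec: float) -> None:
    # Hypothetical stand-in for _assert_expected: fail if any element of
    # `actual` deviates from the stored `expected` tensor by more than the
    # given tolerance (applied here as both the relative and absolute bound).
    torch.testing.assert_close(actual, expected, rtol=prec, atol=prec)

# Ordinary floating-point noise (well below 1e-5) still passes:
expected = torch.tensor([0.10, 0.25, 0.65])
assert_expected_sketch(expected + 1e-7, expected, prec=1e-5)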
FYI @datumbox I suspect that this change might cause flakiness in our internal tests.
D38824237 shows that pytorch_vision_gpu-buck_2 is failing: https://www.internalfb.com/intern/testinfra/diagnostics/844425187765605.562950021904293.1660819094/
I don't see that on the other diffs (yet?), so I assume it's flaky.
Thanks for the heads up. I would recommend turning off the internal test and relying on GitHub for verifying the models. Historically, test_models.py at Meta has been a source of flakiness, so feel free to turn off anything that breaks. Let me know if you need any help from me.
Edit: Hmm, actually that specific model is internal-only, and I wonder whether its expected file is produced properly. The previous precision value was so high that it was effectively doing nothing. Looping in @jdsgomes in case he wants to have a look at this.
Looking into it; it could be that the expected files are wrong.
Summary:
* Make test precision stricter for Classification
* Update classification threshold.
* Update quantized classification threshold.

Reviewed By: datumbox
Differential Revision: D38824223
fbshipit-source-id: aa5adbf9fa7d55c0343c97cbe162c40a7ca0f984
Currently we are using extremely loose precision thresholds on some of our classification tests (image, quantized, and video). Especially on video, the value is so high that it doesn't capture breaking changes in the code. This PR updates the values to stricter thresholds without causing flakiness.
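To illustrate the motivation with made-up numbers (not outputs from the actual test suite): model outputs are typically of order one, so a tolerance of 0.1 lets sizeable regressions slip through, whereas 1e-5 catches them while still leaving headroom for ordinary floating-point noise.

import torch

expected = torch.tensor([1.23, -0.47, 0.05])  # illustrative model outputs
regressed = expected * 1.05                   # every element off by 5%

# The old, loose tolerance accepts the clearly changed output...
print(torch.allclose(regressed, expected, rtol=0.0, atol=0.1))   # True
# ...while the stricter 1e-5 tolerance rejects it.
print(torch.allclose(regressed, expected, rtol=0.0, atol=1e-5))  # False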