Remove workarounds in tests that are no longer flaky due to stable sort #4970

Closed

jdsgomes opened this issue Nov 22, 2021 · 1 comment

@jdsgomes
Contributor

In the #4767 PR we enabled stable sort in the NMS op kernel. This can potentially resolve some of the flakiness seen in the tests, for instance:

vision/test/test_models.py

Lines 658 to 672 in f093d08

# Unfortunately detection models are flaky due to the unstable sort
# in NMS. If matching across all outputs fails, use the same approach
# as in NMSTester.test_nms_cuda to see if this is caused by duplicate
# scores.
expected_file = _get_expected_file(model_name)
expected = torch.load(expected_file)
torch.testing.assert_close(
    output[0]["scores"], expected[0]["scores"], rtol=prec, atol=prec, check_device=False, check_dtype=False
)
# Note: Fmassa proposed turning off NMS by adapting the threshold
# and then using the Hungarian algorithm as in DETR to find the
# best match between output and expected boxes and eliminate some
# of the flakiness. Worth exploring.
return False # Partial validation performed
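
The first comment in that snippet points at duplicate scores as the suspected culprit. Here is a minimal, self-contained illustration of the mechanism (the boxes and scores below are made up, not taken from the test suite): when two overlapping boxes have identical scores, an unstable sort can order them either way, so NMS may keep either one, and positional comparisons against stored expected values become flaky. A stable sort breaks ties by original order, which is deterministic.

```python
import torch
from torchvision.ops import nms

# Two heavily overlapping boxes (x1, y1, x2, y2) with *identical* scores,
# plus a disjoint third box. All values are illustrative.
boxes = torch.tensor(
    [
        [0.0, 0.0, 10.0, 10.0],    # box A
        [0.5, 0.5, 10.5, 10.5],    # box B: IoU with A is ~0.82, same score
        [20.0, 20.0, 30.0, 30.0],  # box C: disjoint
    ]
)
scores = torch.tensor([0.9, 0.9, 0.8])

# NMS sorts by score first. With an unstable sort the A/B tie can break
# either way, so either index 0 or index 1 survives suppression.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]) or tensor([1, 2]), depending on tie-breaking

# A stable sort resolves ties by original position, so the order (and
# hence the NMS result) is the same on every run:
order = torch.sort(scores, descending=True, stable=True).indices
print(order)  # tensor([0, 1, 2]) every time
```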

We should investigate whether this is the case, and potentially adjust the expected values of the tests if required.
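
The Hungarian-matching idea mentioned in the snippet's trailing note could look roughly like the sketch below. This is an assumption about how such a check might be written, not code from the test suite: instead of comparing outputs positionally, match predicted boxes to expected boxes one-to-one by IoU and assert that every matched pair overlaps strongly, which makes the comparison insensitive to tie-induced reordering. The helper name `assert_boxes_match` and the `min_iou` threshold are hypothetical.

```python
import torch
from scipy.optimize import linear_sum_assignment
from torchvision.ops import box_iou


def assert_boxes_match(out_boxes, exp_boxes, min_iou=0.9):
    # Cost matrix: low cost for pairs with high IoU. Inputs are (N, 4)
    # and (M, 4) tensors in (x1, y1, x2, y2) format.
    cost = 1.0 - box_iou(out_boxes, exp_boxes)
    # Hungarian algorithm: minimum-cost one-to-one assignment.
    row_ind, col_ind = linear_sum_assignment(cost.numpy())
    row = torch.as_tensor(row_ind)
    col = torch.as_tensor(col_ind)
    # Every matched pair must overlap strongly, regardless of the order
    # in which NMS happened to emit the boxes.
    matched_iou = 1.0 - cost[row, col]
    assert (matched_iou >= min_iou).all(), f"worst IoU: {matched_iou.min().item():.3f}"
```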

cc @NicolasHug @pmeier @datumbox

@datumbox
Contributor

datumbox commented Mar 9, 2022

Unfortunately, switching to a stable sort is not enough to eliminate the flakiness of the tests. The flakiness will be resolved by no longer initializing the models with random weights and no longer using random noise as input. These changes will be addressed later in the half.
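
A minimal sketch of the direction described above, with all specifics assumed (the model choice, the checkpoint path, and the fixed input are illustrative, not the actual change): load fixed weights instead of relying on a fresh random initialization, and feed a deterministic tensor instead of newly sampled noise on every run.

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, num_classes=2
)
# Hypothetical checkpoint: fixed weights instead of whatever the random
# initializer happened to produce on this run.
model.load_state_dict(torch.load("expected_weights.pth"))
model.eval()

# A fixed, image-like input instead of torch.rand(...) noise, so
# consecutive runs see identical activations.
x = [torch.ones(3, 224, 224)]
with torch.no_grad():
    out = model(x)
```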

My investigation at #5573 (comment) shows that we can't remove the workarounds at the moment, so I'll close the issue as nofix.
