Prevent tests from leaking their respective RNG #4497

Merged
8 commits merged into pytorch:main on Sep 29, 2021

Conversation

NicolasHug
Member

@NicolasHug NicolasHug commented Sep 28, 2021

This PR adds a new autouse pytest fixture to prevent each test from leaking its RNG consumption to other tests.

We've had strange errors and major headaches because of this in the past #3032 (comment)

After some thought and discussion with @pmeier and @vfdev-5, I don't think there's a better alternative than relying on an autouse fixture. It's a bit implicit, but that's the best we can do considering that pytorch only offers manual_seed() to control the RNG, which leaks to the entire program execution.

cc @pmeier
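To illustrate the kind of leak this fixture guards against, here is a minimal sketch using Python's stdlib `random` module as a stand-in for torch's global RNG (`torch.manual_seed()` behaves analogously; the test names are invented for illustration):

```python
import random

def test_that_seeds():
    # This test pins the global RNG for its own purposes.
    random.seed(12)
    random.random()

def test_that_consumes_randomness():
    # Without a save/restore fixture, this test observes whatever
    # state the previous test left in the global RNG.
    return random.random()

test_that_seeds()
leaked = test_that_consumes_randomness()

# Replaying the first test's seeding and consumption reproduces the
# exact value the second test saw: its "random" data was leaked.
random.seed(12)
random.random()
assert leaked == random.random()
```

The second test silently depends on the first one having run before it, which is exactly the cross-test coupling the autouse fixture removes.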

@@ -714,6 +714,7 @@ def test_random_apply(device):
@pytest.mark.parametrize('channels', [1, 3])
def test_gaussian_blur(device, channels, meth_kwargs):
tol = 1.0 + 1e-10
torch.manual_seed(12)
Member Author

I had to add this to prevent the test from failing (see https://app.circleci.com/pipelines/github/pytorch/vision/10830/workflows/946973f0-bad1-48ed-aaf3-f3720ab5f56a/jobs/828462).

If anything, this shows that the PR is working as expected and that this test is a bit flaky and sensitive to the _create_data() call.

To confirm I parametrized it over 100 random seeds, and I got 8 failures over 1000+ test instances. Each time, just 1 pixel was off. Considering the low failure rate, I think it's fine to keep the manual seeding here.
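The seed sweep described above can be sketched roughly as follows. This is a hypothetical stand-in: `_op_under_test`, the data size, and the tolerance are invented for illustration (the real test compared gaussian_blur outputs against tol), and the stdlib `random` module stands in for torch's RNG:

```python
import random

def _op_under_test(data):
    # Invented stand-in for the real comparison: returns True when
    # the result is within a (made-up) tolerance.
    return abs(sum(data) / len(data) - 0.5) < 0.45

# With pytest this would be @pytest.mark.parametrize('seed', range(100)).
failures = 0
for seed in range(100):
    random.seed(seed)
    data = [random.random() for _ in range(50)]  # stand-in for _create_data()
    if not _op_under_test(data):
        failures += 1

# Counting failures across many seeds measures how flaky the comparison
# is; a low failure rate justified pinning a single known-good seed.
print(failures)
```

In the real run, 8 failures out of 1000+ instances (each off by a single pixel) was judged low enough to keep the manual seeding.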

Comment on lines +87 to +88
@pytest.fixture(autouse=True)
def prevent_leaking_rng():
Member Author

This is pretty much the same as freeze_rng_state() https://github.com/pytorch/vision/blob/main/test/common_utils.py#L86

autouse=True means that this fixture will be used by every single test automatically.

@NicolasHug NicolasHug changed the title NOMRG Add autouse fixture to save and reset RNG in tests Add autouse fixture to save and reset RNG in tests Sep 29, 2021
Collaborator

@vfdev-5 vfdev-5 left a comment

Cool ! LGTM

@NicolasHug NicolasHug changed the title Add autouse fixture to save and reset RNG in tests Prevent tests from leaking their respective RNG Sep 29, 2021
@pmeier
Collaborator

pmeier commented Sep 29, 2021

Nicolas and I had an offline discussion about this a while back and I wanted to also give my opinion here:

IMO, this is not needed. There should only be two types of tests here:

  1. The test uses random data.
  2. The test uses fixed data, which we obtain by setting the random seed and then drawing "random" data.

In case 1. we don't need a fixed RNG state, because it shouldn't matter. If it does, the test actually belongs to category 2. and should be fixed.

In case 2. the RNG state doesn't matter since we overwrite the seed anyway.

@NicolasHug Is there an actual issue that this will fix?

Member

@fmassa fmassa left a comment

LGTM, thanks!

PyTorch does provide a generator that we can pass to (some) functions, but last time I checked it didn't work for CUDA (though that was a long time ago). If we had better support for passing a generator in PyTorch (as numpy has), we could use it, but IIRC that's not the case yet.

Your solution seems ok with me, thanks!

@fmassa fmassa merged commit 13bd09d into pytorch:main Sep 29, 2021
@NicolasHug
Member Author

NicolasHug commented Sep 29, 2021

@pmeier I think that #3032 (comment) and the very fact that test_gaussian_blur() broke clearly show that this is in fact needed.

IMO, this is not needed. There should be only two types of tests for this purpose

While you're right in principle, the reality is unfortunately much less simple. We have tests whose RNG is not entirely controlled and in fact depends on other tests; test_gaussian_blur() was one of them. There are likely others (and if they're flaky they'll pop up soon, which is a good thing).

@NicolasHug
Member Author

Also, and possibly most importantly:

  1. The test uses random data.

Before this PR, we had no such tests: every test was using a pre-defined RNG state leaked by whatever tests ran before it. Almost no test relied on a global pytorch RNG that would change across executions; the only ones that did were those run before the first test that called manual_seed().

@pmeier
Collaborator

pmeier commented Sep 29, 2021

Maybe I misunderstood the purpose. From your comments I get that you want this to detect flaky tests that we don't realize are there, because they pass until the order of execution changes and they get different "random" data. Is that correct?

@NicolasHug
Member Author

detect flaky tests that we don't realize are there, because they pass until the order of execution changes and they get different "random" data. Is that correct?

This is indeed a positive consequence of the changes, but that's not the main intent.

The main intent is that no test x should ever depend on any other test y. I agree with you that in an ideal world we would never write such tests, but in reality we often do, whether we know it or not, mainly because it's not always obvious where randomness is involved.

facebook-github-bot pushed a commit that referenced this pull request Sep 30, 2021
Summary:
* Add autouse fixture to save and reset RNG in tests

* Add other RNG generators

* delete freeze_rng_state

* Hopefully fix GaussianBlur test

* Alternative fix, probably better

* revert changes to test_models

Reviewed By: datumbox

Differential Revision: D31270915

fbshipit-source-id: e60273bc90985aaac5e0aaa20cfbfdb86639b8b8

Co-authored-by: Francisco Massa <[email protected]>
cyyever pushed a commit to cyyever/vision that referenced this pull request Nov 16, 2021