CHORE: Explicit random seed for tests #3048
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #3048   +/-   ##
=======================================
  Coverage   54.92%   54.92%
=======================================
  Files          90       90
  Lines       12862    12862
=======================================
  Hits         7064     7064
  Misses       5798     5798
=======================================

see 2 files with indirect coverage changes
Thanks @connortann for the PR. I think setting the seeds is a good idea in general. I do have one concern (which I raised in #2960 as well), which is that the nature of the failures we are encountering is not of the … Right now, it's not entirely clear to me why this is happening, so it's particularly concerning because it could mean there are edge cases that our algorithms are not considering, thus resulting in additivity failures. (I'm leaning towards this conclusion because there are many open issues here talking about this very problem.) In some ways, it's a good thing that we're getting the occasional errors: it gives us concrete examples upon which we can debug.
I completely agree. So, I suggest we don't close issue #2960, but keep it open until we can diagnose the root cause. However, in the meantime it's probably preferable for this issue not to affect other unrelated PRs. Perhaps setting the random seed will also be helpful for diagnosing the issue: we could determine a value of the random seed that reliably causes the tests to fail.
But the only way for us to produce these failures at the moment is via the tests running in CI (where we don't fix the random seed), right? If we fix the random seed now, it would make it harder for us to identify which tests we need to focus our attention on (to fix the additivity issues).
The impact isn't that bad; we just need to re-run the CI. It's annoying, but I see the aforementioned problem as a bigger concern than the annoyance of having to re-run CI.
I don't think that's quite right: if you wanted to work on the additivity issue, you could create a PR that sets the random seed to a number that does reliably cause a failure. That's an improvement on the current state, as you will be able to determine when the issue is actually fixed.
That's not quite what I meant. Let me rephrase: are we able to confidently list all of the existing tests that fail additivity checks for particular random seeds, and also the random seed that generates each failure? If we can't, I don't think #2960 should be closed. So far, I've listed one test (the xgboost test) with one random seed in the original issue to reproduce the problem. My point is that we don't have full answers to the above question, and the only way to get them is to leave the tests running as they are in CI (how else would we know which tests are "flaky"?), unless we introduce the hypothesis library into our testing.
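For reference, a property-based variant of that idea with the hypothesis library could look roughly like the sketch below; the test name and the property being checked are stand-ins for illustration, not tests from this repository.

```python
# Rough sketch: let hypothesis drive the seed rather than relying on
# whatever randomness CI happens to produce. On failure, hypothesis reports
# the falsifying seed, which makes the flaky case reproducible.
import numpy as np
from hypothesis import given, settings, strategies as st


@given(seed=st.integers(min_value=0, max_value=2**32 - 1))
@settings(max_examples=25, deadline=None)
def test_property_holds_for_any_seed(seed):
    rng = np.random.RandomState(seed)
    x = rng.randn(50)
    # Placeholder property; a real test would assert an explainer's additivity.
    assert np.isclose(x.sum(), np.sum(x))
```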
I see what you mean, I'm with you. I'll rejig this PR with a slightly different aim then: to ensure that each test with randomness accepts a random seed, and that the seed is printed to the pytest logs if the test fails.
I had a go at implementing the above. Hopefully that is the best of both worlds: a different seed will be used in each run by default, but it will be easy to fix the seed to reproduce a given failure. My suggestion is to use the new random_seed fixture:

```python
def test_foobar(random_seed):
    assert False
```

If the test fails, the seed will be printed by pytest.
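For illustration, a fixture along these lines could live in tests/conftest.py. This is a minimal sketch of the idea rather than the exact code in this PR; it relies on pytest only showing captured stdout for failing tests, so the seed surfaces precisely when a reproduction is needed.

```python
# Minimal sketch of a possible `random_seed` fixture; not necessarily the
# implementation merged in this PR.
import random

import numpy as np
import pytest


@pytest.fixture()
def random_seed():
    """Pick a fresh seed per test and record it in the captured output."""
    seed = np.random.randint(0, 2**31 - 1)
    print(f"random_seed fixture: using seed {seed}")
    # Seed the global generators too, so code under test that draws from
    # them is also reproducible for this seed.
    random.seed(seed)
    np.random.seed(seed)
    return seed
```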
Found a flaky failure:
EDIT: This approach I can get on board with :)
There are a few more test files that I think we should also cover in this PR (I just did a grep for np.random.seed in the project):

- tests/explainers/test_linear.py
- tests/explainers/test_tree.py
- and a couple more tests in tests/explainers/test_kernel.py (like test_linear() in that file)
It failed again with
I noticed in … However, it's probably wise to set the global seed explicitly for reproducibility, as the implicit default expectation is that unit tests are deterministic and reproducible. I added a … Then, for tests that explicitly wish to use fuzzing, the …
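To illustrate the "deterministic by default" half of that, the global seed could be pinned once per test via an autouse fixture, with fuzzing-style tests opting in to the varying seed by requesting random_seed instead. The snippet below is a sketch under that assumption, not a quote from the PR:

```python
# Sketch: pin the global NumPy seed before every test so the suite is
# deterministic by default; tests that want fuzzing request `random_seed`.
import numpy as np
import pytest


@pytest.fixture(autouse=True)
def _default_global_seed():
    # Reset the global generator so test order and earlier draws cannot
    # change a test's inputs.
    np.random.seed(0)
```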
I made a few further updates after examining some failures:
I've re-run the tests a few times; I'll keep noting any flaky issues in #2960.
FYI I've re-run the suite a few times to try to identify other flaky failures and seeds. Things seem to be passing consistently now: I've run four sets (of 8 parallel runs) without a failure 🎉
Improves the use of randomness in the test suite, both for reproducibility and to mitigate the occurrence of flaky tests.
Key changes are in conftest.py:

- Use a RandomState in each test rather than the global random state.

Should help address #2960, improving the reproducibility of any test failures.
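As a sketch of what the per-test RandomState change looks like in practice (the test below is hypothetical rather than one of the shap tests):

```python
import numpy as np


# Before: seeding the module-level generator mutates state shared by every test.
def test_example_old():
    np.random.seed(0)
    X = np.random.randn(100, 5)
    assert X.shape == (100, 5)


# After: the test owns an isolated RandomState seeded by the fixture, so a
# failure can be reproduced by fixing `random_seed` to the reported value.
def test_example_new(random_seed):
    rs = np.random.RandomState(random_seed)
    X = rs.randn(100, 5)
    assert X.shape == (100, 5)
```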