ci: Add test retry logic for flaky tests #9218
Conversation
Thanks for opening this pull request!
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##            alpha    #9218   +/-   ##
=======================================
  Coverage   93.76%   93.76%
=======================================
  Files         184      184
  Lines       14715    14715
=======================================
  Hits        13797    13797
  Misses        918      918

☔ View full report in Codecov by Sentry.
This reverts commit f221852.
Could you explain how this works?
Override the Jasmine Spec class and replace the original test function with one that retries the original test function on failure.
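The retry idea described in that comment can be sketched as a standalone helper. This is a hypothetical illustration of the mechanism, not the PR's actual code; the name `retryOnFailure` and the attempt limit are made up:

```javascript
// Minimal sketch of the retry idea: wrap a test function so that a failure
// triggers a re-run, up to a fixed number of attempts. In the PR this
// wrapping happens inside the overridden Jasmine Spec class, so individual
// specs don't need to change.
async function retryOnFailure(testFn, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // If the test resolves, return immediately; no further attempts.
      return await testFn();
    } catch (e) {
      lastError = e; // remember the most recent failure
    }
  }
  // All attempts failed; surface the last error as the spec failure.
  throw lastError;
}
```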
How are flaky tests identified, so they get fixed at some point? What's the purpose of […]? What's the implication of "Identify test by name"? What happens if 2 tests have the same name?
From this point forward, any test that randomly fails for no reason is a flaky test. If you find such a test, add it to flakyTest.json.
Any test with that name will get retried if it fails. The only implication is if the name gets changed, and if the name gets changed, hopefully they are fixing the flaky test. I could set it up to auto-retry any failed test; the only problem with that is local development, but I can set it up to retry on CI only. @mtrezza You are the main reviewer, so what's easier for you? Do you currently keep track of randomly failing tests, or just rerun every time?
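Since tests are identified by name, the list file plausibly holds a flat array of spec names. A hypothetical example of what a flakyTest.json entry could look like (the spec names below are invented for illustration, not taken from the repository):

```json
[
  "ParseLiveQuery can handle a flaky reconnect",
  "ParseFile download range requests are served correctly"
]
```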
If we undertake a targeted effort to fix flaky tests, we track them via an issue, like we did in #7180. There is currently no such open issue where we actively track them, but I don't think that's necessary, as we have significantly fewer flaky tests today than we had back then, also thanks to your efforts. Why don't we use test IDs to identify flaky tests? If we see a test is flaky, we assign an ID via […]
I tried that already. I can't get the ID, as it happens before the test spec is run, and there is no way to pass it through.
For example, in https://github.com/parse-community/parse-server/actions/runs/9983595053/job/27591447103 I would identify this flaky test:
So I'll assign ID […] and I would add that ID to the flaky list JSON, right? -- Edit: just read your previous comment; that's too bad. Maybe we can modify the […]
You would add […]
It's idempotency where we simulate a TTL index:
Haven't looked at this for a while, but I vaguely remember that I've set it to that value on purpose; it has something to do with the comment above.
At what point should we turn on the test randomizer again?
Jasmine 5.0.0 released running specs in parallel, which would require the randomizer to be turned on. To upgrade Jasmine we would have to give the test suite some TLC, as we are mixing async and done().
Looks good
But we had the randomizer turned off for the flakiness investigation, right? So if we have now fixed a lot of these flaky tests, plus we have this retry logic, shouldn't we turn it back on?
Yeah, we can turn it back on in a separate PR.
Isn't turning on the randomizer part of this retry strategy? It would also be interesting to see how this tool behaves with the randomizer turned on. Before we merge this, we should find a process for how to use this tool: what are the criteria for adding a test to the list, and how do we deal with flaky tests once they are obscured by this tool? For example, there is flakiness that isn't related to a specific test but is a result of previous tests. So we may end up adding more and more tests to the flaky list without really solving anything.
Is there any way of knowing how many retries are required for the flaky tests to pass? Some stats would be interesting, to see how efficient the approach is.
@mtrezza It now outputs how many times a flaky test was retried before passing.
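Such stats can be produced by recording the attempt count per test name and printing a summary at the end of the run. The sketch below is hypothetical; `runWithRetries`, `retryCounts`, and `reportRetries` are illustrative names, not the PR's actual output format:

```javascript
// Hypothetical sketch: count how many retries each flaky test needed
// before passing, and print a summary after the suite finishes.
const retryCounts = new Map();

async function runWithRetries(name, testFn, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await testFn();
      // Record only tests that actually needed at least one retry.
      if (attempt > 1) retryCounts.set(name, attempt - 1);
      return;
    } catch (e) {
      if (attempt === maxAttempts) throw e; // out of attempts: real failure
    }
  }
}

function reportRetries() {
  for (const [name, retries] of retryCounts) {
    console.log(`Flaky test "${name}" passed after ${retries} retry(ies)`);
  }
}
```

A reporter like this makes it easy to spot tests that should be removed from the list (zero retries for many runs) or escalated to a real fix (frequent retries).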
Looks good to me. As I mentioned earlier, I'm still somewhat unsure how this tool is intended to be used. A flaky test may well indicate a bug and not just be a test issue, so we'd need to define when to add a test to the list. Adding a test to the list for retrying only makes sense if we work on fixing it once it's added; otherwise we are effectively weakening our CI quality. The annoyance of a failing CI is what drove fixing flaky tests in the past. This tool creates the convenience of hiding the flakiness, but that may undermine the drive to fix it. So roughly, the rules could be: […]
@dplewis How should we treat this test issue? It seems that after shutting down LiveQuery, the subsequent tests all failed. This looks like an issue with the test logic, so we wouldn't add anything to the flaky list, right?
🎉 This change has been released in version 7.3.0-alpha.7 |
🎉 This change has been released in version 7.3.0-beta.1 |
🎉 This change has been released in version 7.3.0 |
Pull Request
Issue
Flaky tests require a lot more effort to merge PRs. Tests would have to be re-run until the flaky tests pass, or they could be ignored if you know which tests are flaky. Ideally, every flaky test should be fixed as it is found, which hasn't been the case.
Closes: #8654
Approach
[…] `this` […] from test suite `RegexVulnerabilities.spec`