Implement # sage.doctest: flaky marker #39539
base: develop
Documentation preview for this PR (built with commit e9b1b35; changes) is ready! 🎉 |
@@ -1,3 +1,4 @@ | |||
# sage.doctest: flaky |
From the log it looks like this was correctly retried once. But it still timed out…? (What's the probability that this fails twice?)
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
As in the title. I don't think there's any advantage in running these tests again. There's only a very small risk that whoever fixes the bug forgets to delete the marker, but that seems like a non-issue (whichever pull request fixes it should also remove the `# not tested`). At least, that holds for the tests that don't segfault or hang. (For those that do, the only solution I can think of is sagemath#39539.) Side note: not sure what a good solution to this is. Maybe we can do sagemath#39470 instead? (But then it doesn't apply to meson…)
URL: sagemath#39664 Reported by: user202729 Reviewer(s):
sagemathgh-40814: Rerun plural and singular/function on failure
This pull request:
* adds a new option `--all-except` to `sage -t` (does what you expect)
* modifies `ci-meson.yml` to work around sagemath#29528, the cause of which is still unknown. (I also tried porting an old pull request that purportedly fixes the issue at sagemath#39628, but the result is even worse.)
Controlling this in bash seems easier than sagemath#39539, for now. I suspect testing these files separately will make it stop failing, however (it doesn't really matter; the bug remains). sagemath#40729 (comment) contains a traceback, but I don't think it is of much help. (Thoughts? Is `--all --exclude=a --exclude=b` better?)
URL: sagemath#40814 Reported by: user202729 Reviewer(s): Tobias Diez
Now that #40814 is merged, what's the plan to go forward here? |
I don't know; it depends on whether tests fail. (I hope they don't, but…?) On the other hand, the other solution won't work for someone who is not running the CI.
There are enough doctests that randomly fail and it would be nice to have a solution for this. It's annoying locally as well, but the main headache is CI. We could add essentially every file that has known flaky tests to the list in #40814. But I got the impression you would like to keep that list reserved for tests that fail with segfaults/timeouts which are hard to catch in python/doctest module. So your plan is to use this PR here for "normal" flaky tests? |
I mean, normal flaky test can also be tested with say
ugly but works. (or migrate to pytest, where you probably gain some pytest marker thing (https://pytest-rerunfailures.readthedocs.io/stable/mark.html?), but lose preparsing and need to explicitly import, and make the test far away from the code…) |
The intention is to avoid the annoying failing doctests.
Background on timed out tests
Basically, one of the problems with occasional timeouts is the following: sometimes malloc needs to hold a lock while doing something. If a signal arrives while it is holding the lock and the signal handler calls malloc again, that call tries to acquire the same lock and deadlocks there.
A workaround is to unlock the malloc lock inside the signal handler — but how do you know which lock it is?
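The deadlock pattern described above can be imitated at the Python level. This is only an analogy, not glibc's actual code: `allocator_lock` is a hypothetical stand-in for malloc's internal lock, and the handler uses a timeout so that the blocked acquisition can be observed instead of hanging forever. It assumes a POSIX system (for `SIGUSR1`), Python 3.8 or later, and that `demo()` is called from the main thread.

```python
import signal
import threading

# Hypothetical stand-in for malloc's internal, non-reentrant lock.
allocator_lock = threading.Lock()
handler_results = []


def handler(signum, frame):
    # A real handler that calls malloc would block forever if the lock is
    # already held by the interrupted code. The timeout lets us observe that.
    acquired = allocator_lock.acquire(timeout=0.2)
    handler_results.append(acquired)
    if acquired:
        allocator_lock.release()


def demo():
    """Deliver a signal once while the lock is held and once after release."""
    handler_results.clear()
    old_handler = signal.signal(signal.SIGUSR1, handler)
    try:
        with allocator_lock:  # "inside malloc"
            signal.raise_signal(signal.SIGUSR1)
            for _ in range(1000):  # give the interpreter a chance to run the handler
                pass
        signal.raise_signal(signal.SIGUSR1)  # lock is free now
        for _ in range(1000):
            pass
    finally:
        signal.signal(signal.SIGUSR1, old_handler)
    return list(handler_results)


if __name__ == "__main__":
    print(demo())
```

The first delivery happens while the interrupted code still holds the lock, so the handler cannot take it; the second delivery, after release, succeeds. Without the timeout, the first case would hang, which is the behaviour the timed-out doctests are suspected of hitting.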
I can't reproduce this on my machine (in fact, on my machine, setting a gdb breakpoint in `__lll_lock_wait_private` doesn't even hit it during the computation), so I can't figure out a way to fix it.
Workaround
Files starting with `# sage.doctest: flaky` will be run one more time if they time out. The same applies to segmentation faults. E.g. `plural.pyx` → #39098
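The retry rule above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `has_flaky_marker`, `run_with_retry`, the outcome strings, and the choice to scan the first few lines of the file are all assumptions made for the example.

```python
import re

# Matches a whole-line file-level marker such as "# sage.doctest: flaky".
FLAKY_MARKER = re.compile(r"^\s*#\s*sage\.doctest:\s*flaky\s*$")


def has_flaky_marker(source, max_header_lines=10):
    """Return True if one of the first lines of the file is the flaky marker.

    Scanning only a short header is an assumption of this sketch.
    """
    for line in source.splitlines()[:max_header_lines]:
        if FLAKY_MARKER.match(line):
            return True
    return False


def run_with_retry(run_once, flaky):
    """Run a file's doctests, retrying once on timeout or crash if flaky.

    run_once is a hypothetical callable returning one of
    "ok", "failed", "timeout" or "killed".
    """
    outcome = run_once()
    if flaky and outcome in ("timeout", "killed"):
        outcome = run_once()  # exactly one extra attempt
    return outcome
```

In this sketch an ordinary doctest failure (`"failed"`) is not retried; only timeouts and crashes are, which matches the description above.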