user202729
Contributor

@user202729 user202729 commented Feb 17, 2025

The intention is to avoid the annoying failing doctests.

Background on timed out tests

Basically, one of the problems behind the occasional timeouts is the following: malloc sometimes needs to hold a lock while doing its work. If a signal arrives while malloc is holding the lock, the next call to malloc tries to acquire the same lock again and deadlocks there.

A workaround is to unlock the malloc lock inside the signal handler — but how do you know which lock it is?

I can't reproduce this on my machine (in fact, on my machine a gdb breakpoint set on __lll_lock_wait_private is never hit during the computation), so I can't figure out a way to fix it.
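
The failure mode described in this section can be imitated in pure Python. This is only an analogy and not glibc's actual behaviour: `allocator_lock` stands in for malloc's internal lock, `SIGUSR1` stands in for whatever signal interrupts the computation, and `simulate_signal_during_allocation` is a made-up name. The handler uses a non-blocking acquire so that the sketch reports the would-be deadlock instead of hanging. It assumes a POSIX system, Python 3.8 or later, and that it runs in the main thread.

```python
import signal
import threading
import time


def simulate_signal_during_allocation():
    """Return False if the handler found the lock held, True if it got the lock,
    or None if the handler never ran."""
    allocator_lock = threading.Lock()  # hypothetical stand-in for malloc's lock
    observed = []

    def handler(signum, frame):
        # A blocking acquire() here would never return: the lock is held by
        # the very code this handler interrupted.
        acquired = allocator_lock.acquire(blocking=False)
        observed.append(acquired)
        if acquired:
            allocator_lock.release()

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        with allocator_lock:  # "malloc" is inside its critical section
            signal.raise_signal(signal.SIGUSR1)
            # Spin briefly so the interpreter gets a chance to run the handler.
            deadline = time.monotonic() + 5.0
            while not observed and time.monotonic() < deadline:
                pass
    finally:
        # Restore whatever handler was installed before (if Python knows it).
        if previous is not None:
            signal.signal(signal.SIGUSR1, previous)
    return observed[0] if observed else None
```

With a blocking acquire in the handler, the call would hang in the same way the doctests are suspected to.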

Workaround

Files starting with # sage.doctest: flaky will be run one more time if they time out. The same applies to segmentation faults. E.g. plural.pyx #39098
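
A minimal sketch of the retry rule described above, not the code in this PR: `is_marked_flaky`, `run_with_one_retry`, and the outcome labels `"timeout"` and `"segfault"` are hypothetical names chosen for illustration, and `run_once` is any callable that runs the file's doctests and returns an outcome label.

```python
FLAKY_MARKER = "# sage.doctest: flaky"
RETRYABLE_OUTCOMES = {"timeout", "segfault"}  # hypothetical outcome labels


def is_marked_flaky(source_text):
    """True if the file's text starts with the flaky marker."""
    return source_text.startswith(FLAKY_MARKER)


def run_with_one_retry(source_text, run_once):
    """Call run_once(); repeat it a single time for a marked file
    whose first run timed out or crashed."""
    outcome = run_once()
    if outcome in RETRYABLE_OUTCOMES and is_marked_flaky(source_text):
        outcome = run_once()
    return outcome
```

In this sketch a second timeout or crash is reported as is; nothing is retried more than once, and unmarked files are never retried.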

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

github-actions bot commented Feb 17, 2025

Documentation preview for this PR (built with commit e9b1b35; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@@ -1,3 +1,4 @@
# sage.doctest: flaky
Contributor Author

From the log it looks like this was correctly retried once. But it still timed out…? (What's the probability that this fails twice?)

@user202729 user202729 mentioned this pull request Feb 23, 2025
5 tasks
@user202729 user202729 marked this pull request as draft March 18, 2025 05:58
@user202729 user202729 assigned user202729 and unassigned user202729 Mar 18, 2025
vbraun pushed a commit to vbraun/sage that referenced this pull request Mar 19, 2025
sagemathgh-39664: Add some 'not tested' marks to avoid CI failure
    
As in the title. I don't think there's any advantage in running the test
again.

There's only a very small risk that whoever fixes the test forgets to delete
the marker, but that seems like a non-issue (whichever pull request fixes
it should also remove the `# not tested`).

At least for those that don't segfault or hang. (For those
that do, the only solution I can think of is
sagemath#39539 )

Side note: not sure what's a good solution to this. Maybe we can do
sagemath#39470 instead? (but then it
doesn't apply to meson…)

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#39664
Reported by: user202729
Reviewer(s):
vbraun pushed a commit to vbraun/sage that referenced this pull request Sep 24, 2025
sagemathgh-40814: Rerun plural and singular/function on failure
    
This pull request:

* adds a new feature `--all-except` to `sage -t` (does what you expect)
* modifies `ci-meson.yml` to work around
sagemath#29528 , the cause of which is
still unknown. (I also tried porting an old pull request that purportedly
fixes the issue at sagemath#39628, but the
result is even worse.)

Controlling this in bash seems easier than
sagemath#39539 , for now. I suspect testing
these files separately will make it stop failing, however (doesn't really
matter, the bug remains).

sagemath#40729 (comment)
contains a traceback, but I don't think it is of much help.

(Thoughts? Is `--all --exclude=a --exclude=b` better?)

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [ ] The title is concise and informative.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#40814
Reported by: user202729
Reviewer(s): Tobias Diez
vbraun pushed a commit to vbraun/sage that referenced this pull request Sep 27, 2025
sagemathgh-40814: Rerun plural and singular/function on failure
    
@tobiasdiez
Contributor

Now that #40814 is merged, what's the plan to go forward here?

@user202729
Contributor Author

I don't know, depends on if tests fail. (like, I hope tests don't fail but…?)

On the other hand, the other solution won't work for someone who is not running the CI.

@tobiasdiez
Contributor

There are enough doctests that randomly fail and it would be nice to have a solution for this. It's annoying locally as well, but the main headache is CI.

We could add essentially every file that has known flaky tests to the list in #40814. But I got the impression you would like to keep that list reserved for tests that fail with segfaults/timeouts which are hard to catch in python/doctest module. So your plan is to use this PR here for "normal" flaky tests?

@user202729
Contributor Author

user202729 commented Sep 30, 2025

I mean, normal flaky tests can also be tested with, say,

sage: for i in range(5):
....:     whatever

Ugly, but it works. (Or migrate to pytest, where you probably gain some pytest marker (https://pytest-rerunfailures.readthedocs.io/stable/mark.html?), but lose preparsing, need explicit imports, and end up with the test far away from the code…)
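
Spelled out in plain Python, the loop idea might look like the sketch below. `passes_within` and `make_flaky_check` are hypothetical helpers, not part of Sage or its doctest framework; `make_flaky_check` only exists to give a deterministic stand-in for a test that fails a few times before succeeding.

```python
def passes_within(check, attempts=5):
    """Return True as soon as check() succeeds, trying at most `attempts` times."""
    for _ in range(attempts):
        if check():
            return True
    return False


def make_flaky_check(failures_before_success):
    """Build a deterministic stand-in for a flaky test:
    it fails a fixed number of times, then succeeds."""
    calls = {"count": 0}

    def check():
        calls["count"] += 1
        return calls["count"] > failures_before_success

    return check
```

For instance, `passes_within(make_flaky_check(2))` succeeds on the third attempt, while a check that fails five times in a row exhausts the default budget. Unlike a rerun at the file level, this only helps when the failure is an ordinary wrong result; a hang or crash inside `check` still takes the whole process down.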
