numexpr is not thread-safe #80
Comments
What if we put a global mutex around the evaluate call? Of course, this is non-optimal, especially if the thread holding the lock gets context-switched out when its timeslice expires, but one global mutex would be far better than the current state (which simply crashes if two threads call evaluate at the same time). Thoughts? I can submit a PR if people are OK with this solution. |
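As a rough caller-side illustration of the global-mutex idea (the wrapper name below is a placeholder, not the proposed PR itself):

```python
import threading
import numexpr as ne

# One process-wide lock: only one thread at a time may enter numexpr's VM.
_NE_LOCK = threading.Lock()

def evaluate_locked(expr, local_dict=None, out=None):
    # Coarse-grained, but it prevents two threads from racing on the
    # global thread-parameter state inside the extension module.
    with _NE_LOCK:
        return ne.evaluate(expr, local_dict=local_dict, out=out)
```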
I am +1 for what you are proposing. The best solution would be to create a context structure to host all the variables that are useful for a thread, create one of these per thread, and make each thread use its own structure. But I agree that your proposal is better than the current situation. |
yes, I was just about to work on it! I came to GH to clone the repo, and bam, saw the commit. Will check it out. |
I think I am getting this now. It is happening in one of my test cases, and it only happens if I run that test case after a few others. I am not 100% sure yet whether this is related to numexpr. Also, any advice on tracking this down would be helpful. I am thinking about building a Docker container and compiling Python from scratch with the debug flags so that I can run it under gdb. From the command line:
In PyCharm:
|
On 21/01/2016 at 10:21, jennolsen84 wrote:
You can already run Python under gdb and get a full C backtrace. You can also use the faulthandler module on Python 3. |
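For reference, a minimal way to use both (the script name below is a placeholder):

```python
# Put this at the very top of the failing test or script (Python 3):
import faulthandler
faulthandler.enable()   # dumps the Python-level traceback on SIGSEGV/SIGFPE

# For the C-level backtrace, run the same script under gdb, for example:
#   gdb --args python my_failing_test.py
#   (gdb) run
#   (gdb) bt    # after the crash, print the native call stack
```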
vals is a 2D ndarray. Here is what the threads looked like:
Top few frames of thread 23:
I am still looking, but wanted to share what I found so far |
Perhaps we should put the mutex on the Python side... right here: Line 732 in a7937fd
This will also prevent races on |
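Roughly, the shape of that change might be as follows; this is a toy stand-in (the cache name, the use of Python's own compile/eval, and the function signature are all illustrative, not numexpr's real internals):

```python
import threading

# One module-level lock guards both the compiled-expression cache and the
# evaluation itself, so concurrent callers are fully serialized.
_evaluate_lock = threading.Lock()
_expression_cache = {}   # stand-in for numexpr's compiled-expression cache

def evaluate(expr, local_dict):
    with _evaluate_lock:
        code = _expression_cache.get(expr)
        if code is None:
            code = compile(expr, "<expr>", "eval")   # stand-in for numexpr's compiler
            _expression_cache[expr] = code
        return eval(code, {}, dict(local_dict))      # stand-in for the call into the VM
```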
That's what I did in the PR, and it seems to have fixed the crash |
FYI, I am still getting some more crashes, and I am not 100% sure they're related. The call stack is in numexpr, and they didn't happen before... Strangely, these happen somewhat randomly (and not on each run), even though the same data is used every time. Here is a stack trace:
|
It's an FPU exception, and it seems to happen in (the division by zero can occur because of a race condition overwriting the shape's memory) |
(actually, that would be the iterator's internal copy of the shape) |
Unfortunately, I am unable to reproduce the crash I saw earlier today. I tried a few times. I will try to write a torture test to see if I can reproduce it: it would call numexpr thousands of times from a thread pool, sometimes using the out= parameter and sometimes letting numexpr allocate the result. If someone can think of a better test, please let me know. Otherwise, I think the PR should be accepted anyway; it does help things. |
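A stress test of that shape could look roughly like this (the expression, array sizes, worker count, and iteration count are arbitrary choices for illustration):

```python
import numpy as np
import numexpr as ne
from concurrent.futures import ThreadPoolExecutor

a = np.random.rand(100_000)
b = np.random.rand(100_000)

def worker(i):
    # Alternate between a preallocated out= buffer and letting numexpr
    # allocate the result array itself.
    if i % 2:
        out = np.empty_like(a)
        ne.evaluate("a * b + 2.5", out=out)
        return out
    return ne.evaluate("a * b + 2.5")

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, range(5000)))

# Every call should produce the same answer if nothing raced.
expected = a * b + 2.5
assert all(np.allclose(r, expected) for r in results)
```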
I think the crash a few comments above happened after I botched the install. Now I am unable to reproduce the FPU exception. So, after adding the fat lock, things are good. |
From [email protected] on April 30, 2012 23:46:07
Calling numexpr.evaluate from two different threads at the same time causes a segfault, unless numexpr.set_num_threads(1) was called prior.
Looking at the code in vm_engine_iter_parallel, it looks to me like it uses a global variable to store thread information, and so chokes when two different threads call that function at the same time.
See attached file for a small code sample that reproduces this problem. And here's what I see when I actually run that script from the shell:
{{{
OK
}}}
Attachment: numexprCore.py
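The attached script is not reproduced above; a minimal script in the same spirit (names, sizes, and the expression here are guesses, not the contents of the original numexprCore.py) would be:

```python
import threading
import numpy as np
import numexpr as ne

x = np.arange(1_000_000, dtype=np.float64)

def run():
    for _ in range(100):
        ne.evaluate("x * 2 + 1")

# Two threads entering the parallel VM concurrently is enough to trigger
# the segfault; calling ne.set_num_threads(1) first avoids it.
threads = [threading.Thread(target=run) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("OK")
```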
Original issue: http://code.google.com/p/numexpr/issues/detail?id=80