Segfault on dgemm_oncopy_HASWELL triggered by numpy.matmul inside a docker container (v0.3.13.dev) #3135
Comments
The Java runtime being involved suggests to me that you might simply be exceeding the default stack size imposed by it. Please try setting `_JAVA_OPTIONS="-Xss4096k"` in the environment and see if that helps.
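A minimal sketch of what that could look like from the Python side, assuming the JVM is started by a child of your Python process and therefore inherits its environment (setting the variable in the shell or Dockerfile works just as well):

```python
import os

# Sketch only: raise the JVM thread stack size to 4 MB. This must be in the
# environment before any Java-backed library (e.g. an HDFS client) starts a JVM.
os.environ["_JAVA_OPTIONS"] = "-Xss4096k"
```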
Interesting! Thanks for the prompt reply! I'll give it a try and post what I find.
As it turns out, it's not that straightforward for me to change the JRE setting, since the docker container is set up and maintained by others (and then provided to users like me as a compute platform). Just for my understanding though: why should JRE settings (like stack size) even be relevant in this case? To clarify, the code in question doesn't involve Java at all, and after I enabled faulthandler, this is what I saw,
and the JRE no longer showed up in the log. Maybe the JRE is a red herring? It seems to me that the segfault might well have originated from Python, and with the Python faulthandler disabled, the JRE was just the first to "catch" and log it?
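For anyone following along, enabling the handler is a one-liner (or, equivalently, setting `PYTHONFAULTHANDLER=1` in the environment):

```python
import faulthandler

# Print the Python-level traceback to stderr when the process receives a
# fatal signal such as SIGSEGV, instead of leaving the crash report to
# whatever other runtime (e.g. the JRE) intercepts it first.
faulthandler.enable()
```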
Maybe, maybe not. Why does the JRE feature in your context at all? Are you perhaps loading Java-based libraries like libhdfs (#2821)?
Ah, I get what you're saying. We do use fastparquet to load parquet files from HDFS! As a temporary workaround, I've set `OPENBLAS_NUM_THREADS=1`. I'll look into this a bit further and post updates.
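A rough sketch of that workaround, for reference (the variable has to be set before NumPy/OpenBLAS is loaded):

```python
import os

# Temporary workaround: force OpenBLAS into single-threaded mode. OpenBLAS
# reads this variable when the library is first loaded, so it must be set
# before `import numpy` (or exported in the shell that launches the script).
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # noqa: E402
```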
Can you confirm that `OPENBLAS_NUM_THREADS=1` fixes the problem 100%?
It seems to be working so far, although I haven't had the time to isolate the issue into a minimal reproducible case to rigorously test it.
Hi, I've managed to reproduce this issue as shown in this gist.
The computation takes a little more than 16 GB with threads. Docker thin-provisions memory, while SGEMM will certainly use all of it.
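As a rough illustration with hypothetical shapes (the real ones are in the gist), the double-precision result of a `samples @ samples.T` product alone can exceed 16 GB:

```python
# Hypothetical shapes, chosen only to illustrate the scale of the allocations
# for a double-precision (8-byte) samples @ samples.T product.
n, k = 45_000, 20_000

input_gb = n * k * 8 / 1e9    # samples itself:           ~7.2 GB
output_gb = n * n * 8 / 1e9   # the (n, n) result matrix: ~16.2 GB
print(f"input ~{input_gb:.1f} GB, output ~{output_gb:.1f} GB")
# On top of that, OpenBLAS allocates per-thread GEMM work buffers.
```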
Good point, although I increased the memory to 64 GB to no avail. Another thing I noticed: the segfault actually only happens about 80-90% of the time, and increasing the stack size seems to reduce the frequency somewhat, though not entirely.
Btw, when
I cannot reproduce this with pure 0.3.13 at all, aside from a clear OOM with 16 GB RAM.
But in all my cases, NumPy is already at its latest version (i.e. v1.20.1), installed from scratch using pip.
I cannot reproduce this problem (without docker/JRE in the mix) so far. I do note, however, that the numpy-provided OpenBLAS binary gets built with a smaller GEMM buffer size than the usual default to limit memory requirements on big multi-core hardware, which may cause similar segfaults when the matrix size gets "too" big.
This looks dated.
@brada4 That's SciPy, not NumPy. That said, it did look fishy to me at first, and that's why I enabled faulthandler to capture exactly where the segfault originated; as it turned out, it was the `np.matmul` line.
@martin-frbg Thanks, that's interesting. Let me test the example without the JRE or docker on my side and see what happens. Will update.
Indeed, I can only reproduce this issue when running it inside the docker container provided to me, which has things like the JDK and Hadoop installed by default (and I cannot circumvent that). To add to the trickiness, the stack traceback from faulthandler isn't very helpful either, as it only shows things up to the point where they leave Python. So I don't know if there's more info I can provide at this stage to help with the debugging.
The fault handler might be right that the crash is in the "new" library, but the sgemm_ or cblas_sgemm being called might come from the other library.
Hey, I'm still seeing segfaults when doing `numpy.matmul` with two big matrices (numpy v1.20.1, OpenBLAS v0.3.13.dev). This looks potentially related to #2728? The stack trace points to a line that does something like `np.matmul(samples, samples.T)`. I was running the code in a docker container (enterprise environment), where NumPy was installed using pip.

Here's the spec of the compute cluster, from which 6 CPUs were allocated to the container.
`threadpool_info` via threadpoolctl shows the following, which confirms it was OpenBLAS v0.3.13.dev and that `num_threads` was correctly recognized to be 6. Let me know if you need any other information!
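For reference, the call that produces that output is roughly:

```python
from pprint import pprint
from threadpoolctl import threadpool_info

# Lists every BLAS/OpenMP runtime loaded into the process, including the
# OpenBLAS version string and the number of threads it will use.
pprint(threadpool_info())
```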