Undocumented Feature NUM_PARALLEL #1736
Comments
This relatively recent addition (from PR #1536) is sort-of documented in Makefile.rule (where most of the build-time definitions for the gmake build system live).
@lightsighter I wonder why you would prefer to keep the option "undocumented" and post a wiki page into the bug tracker instead.
I think it would still be good to document it somewhere in the readme, just as is done for the USE_OPENMP and DEBUG options. @brada4 I don't want it to be undocumented; I would prefer it to be explicitly documented. I just didn't know where else to put this so that other users could search for it and find it.
Makefile.rule describes it; could you make a FAQ page with more explanations?
Finally added this to the user documentation in the wiki.
Where in the wiki can I find the change?
The "Build system overview" page for now; the whole thing needs to be restructured, and so far I've basically only paraphrased your original posting. I am currently looking into if/how to get this kludge really working with multiple concurrent instances, given that there is also some state information buried in the queue buffer structure that we don't want to end up in the wrong task.
To be clear, that documentation doesn't actually cover the scenario described in this issue (somebody renamed this issue incorrectly). The issue being discussed here is NOT multiple threads calling into OpenBLAS at the same time. The issue raised here is that OpenBLAS does not currently support using multiple copies of the OpenMP runtime in the same process. Imagine I have two threads, each of which is bound to a separate copy of the OpenMP runtime (and each OpenMP runtime has its own thread pool). If each of these threads calls into OpenBLAS and wants OpenBLAS to dispatch to its associated OpenMP runtime, then that does not work, because global variables inside OpenBLAS's implementation make the (invalid) assumption that there is never more than one OpenMP runtime in the same process.
I'm not sure this issue got renamed; there is another open issue (by you as well, IIRC) that covers the somewhat unusual "multiple separate copies of OpenMP" topic. I have some vague hopes of addressing that scenario with what I'm currently working on as well, but it could all fall apart still.
Ah, maybe I'm getting it confused then. I should have read the rest of the history. Sorry for the confusion.
Thanks! Looking forward to it! Good luck!
The other is #2164.
This is more a comment about an undocumented feature in case other users encounter a similar problem. I have an implementation of the OpenMP runtime that supports multiple copies of the OpenMP runtime in the same process. I was having threads bound to different OpenMP runtimes call into OpenBLAS simultaneously, but their executions were being serialized by OpenBLAS which was causing bad performance. The relevant bit of code is here:
https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L321-L336
Effectively, there is a fixed number of buffers available for managing parallel OpenMP calls, and the default is 1. So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them can make progress while the rest spin-wait for the one available buffer. It seems like the right way to fix this is to set NUM_PARALLEL at build time to the upper bound on the number of OpenMP runtimes that you can have in a process:
https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L197-L199
This will then set the max parallel number:
https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L1015
and then that will fill in extra buffers for OpenMP usage:
https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L57-L62
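For anyone who wants a picture of the mechanism, here is a minimal sketch in C of a fixed-size buffer pool guarded by atomic in-use flags. This is not the OpenBLAS source: the names `slot_in_use`, `try_acquire_slot`, `acquire_slot` and `release_slot` are hypothetical, and `MAX_PARALLEL_NUMBER` is assumed to be the compile-time bound that NUM_PARALLEL ends up setting. With the default of 1, a second concurrent caller would spin in `acquire_slot` until the first one releases its slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: stands in for the compile-time bound derived from NUM_PARALLEL. */
#ifndef MAX_PARALLEL_NUMBER
#define MAX_PARALLEL_NUMBER 1
#endif

/* One flag per buffer slot: false = free, true = in use (hypothetical name). */
static atomic_bool slot_in_use[MAX_PARALLEL_NUMBER];

/* Scan the slots once; return the index claimed, or -1 if all are busy. */
static int try_acquire_slot(void) {
    for (int i = 0; i < MAX_PARALLEL_NUMBER; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&slot_in_use[i], &expected, true))
            return i;
    }
    return -1;
}

/* Spin-wait until some slot becomes free, then claim it. */
static int acquire_slot(void) {
    int slot;
    while ((slot = try_acquire_slot()) < 0) {
        /* busy-wait: every other caller is stuck here when all slots are taken */
    }
    return slot;
}

/* Hand the slot back so a waiting caller can make progress. */
static void release_slot(int slot) {
    assert(slot >= 0 && slot < MAX_PARALLEL_NUMBER);
    atomic_store(&slot_in_use[slot], false);
}
```

Compiling this sketch with, say, `-DMAX_PARALLEL_NUMBER=4` would let up to four callers hold a slot at once, which is the effect described above for raising NUM_PARALLEL.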
As far as I can tell this isn't documented anywhere. Maybe I just missed it. Please feel free to point me at the proper documentation if I did overlook it.