Undocumented Feature NUM_PARALLEL #1736

Closed
lightsighter opened this issue Aug 13, 2018 · 12 comments

Comments

@lightsighter

This is more of a comment about an undocumented feature, in case other users encounter a similar problem. I have an implementation of the OpenMP runtime that supports multiple copies of the runtime in the same process. I had threads bound to different OpenMP runtimes calling into OpenBLAS simultaneously, but OpenBLAS was serializing their execution, which caused bad performance. The relevant bit of code is here:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L321-L336

Effectively, there is a fixed number of buffers available for managing parallel OpenMP calls, and the default is 1. So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them can make progress while the rest spin-wait for the one available buffer. It seems like the right way to fix this is to set NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that you can have in a process.
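
For readers who want to see the serialization effect in isolation, here is a minimal sketch of a fixed pool of buffer slots guarded by "in use" flags. It is not the OpenBLAS source: `SLOT_COUNT`, `slot_in_use`, `try_acquire_slot`, `acquire_slot`, and `release_slot` are all hypothetical names chosen for illustration, and the use of C11 atomics is an assumption about how such a pool could be written.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the build-time slot count; 1 mirrors the
 * default described in this issue. */
#ifndef SLOT_COUNT
#define SLOT_COUNT 1
#endif

/* One "in use" flag per buffer slot. Static storage starts out cleared,
 * so every slot is initially free. */
static atomic_bool slot_in_use[SLOT_COUNT];

/* Make a single pass over the pool and try to claim a free slot.
 * Returns the slot index, or -1 if every slot is currently taken. */
static int try_acquire_slot(void) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&slot_in_use[i], &expected, true))
            return i;
    }
    return -1;
}

/* Spin until some slot becomes free. With SLOT_COUNT == 1, every
 * concurrent caller except one ends up busy-waiting in this loop. */
static int acquire_slot(void) {
    int slot;
    while ((slot = try_acquire_slot()) < 0) {
        /* busy-wait */
    }
    return slot;
}

/* Hand the slot back so that a waiting caller can claim it. */
static void release_slot(int slot) {
    atomic_store(&slot_in_use[slot], false);
}
```

In this sketch, a second caller that arrives while the only slot is held gets -1 from `try_acquire_slot` and would spin in `acquire_slot` until `release_slot` is called. Raising the slot count to the number of concurrent callers removes that wait.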

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L197-L199

This will then set the max parallel number:

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L1015

and then that will fill in extra buffers for OpenMP usage:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L57-L62
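
As a rough illustration of that chain, the sketch below shows how a compile-time constant can size the number of buffer sets. It assumes the build passes the value as a preprocessor definition named `MAX_PARALLEL_NUMBER`; `SKETCH_MAX_THREADS`, `thread_buffers`, and `buffer_set_count` are hypothetical names, and the array layout is an assumption rather than a copy of the linked code.

```c
#include <stddef.h>

/* Assumed to arrive from the build as -DMAX_PARALLEL_NUMBER=<NUM_PARALLEL>.
 * Fall back to 1, the default mentioned in this issue. */
#ifndef MAX_PARALLEL_NUMBER
#define MAX_PARALLEL_NUMBER 1
#endif

/* Hypothetical per-call thread cap, used only to give the array a
 * second dimension in this sketch. */
#define SKETCH_MAX_THREADS 8

/* One row of per-thread buffer pointers for each call that may be in
 * flight at the same time. */
static void *thread_buffers[MAX_PARALLEL_NUMBER][SKETCH_MAX_THREADS];

/* Number of independent buffer sets compiled into the binary. */
static size_t buffer_set_count(void) {
    return sizeof thread_buffers / sizeof thread_buffers[0];
}
```

Because the count is fixed at compile time in this sketch, changing it means rebuilding the library with a different value. It cannot be adjusted at run time.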

As far as I can tell this isn't documented anywhere. Maybe I just missed it. Please feel free to point me at the proper documentation if I did overlook it.

@martin-frbg
Collaborator

This relatively recent addition (from PR #1536) is sort-of documented in Makefile.rule (where most of the build-time definitions for the gmake build system live).

@brada4
Contributor

brada4 commented Aug 14, 2018

@lightsighter I wonder why you prefer to keep the option "undocumented", in your opinion, and post a wiki page in a bug tracker.

@lightsighter
Author

I think it would still be good to document it somewhere in the readme, just as is done with the USE_OPENMP and DEBUG options.

@brada4 I don't want it to be undocumented; I would prefer it to be explicitly documented. I just didn't know where else to put this so that other users could search for this and find it.

@brada4
Contributor

brada4 commented Aug 15, 2018

Makefile.rule describes it. Could you make a FAQ page with more explanations?

@martin-frbg
Collaborator

Finally added this to the user documentation in the wiki

@lightsighter
Author

Where in the wiki can I find the change?

@brada4

This comment was marked as off-topic.

@martin-frbg
Collaborator

It's on the "Build system overview" page for now. The whole thing needs to be restructured, and I've basically only paraphrased your original posting. I'm currently looking into if/how to get this kludge really working with multiple concurrent instances, given that there's some state information buried in the queue buffer structure as well that we don't want to end up in the wrong task.

@lightsighter
Author

To be clear, that documentation doesn't actually cover the scenario described in this issue (somebody renamed this issue incorrectly). The issue being discussed here is NOT multiple threads calling into OpenBLAS at the same time. The issue raised here is that OpenBLAS does not currently support using multiple copies of the OpenMP runtime in the same process. Imagine I have two threads, each of which is bound to a separate copy of the OpenMP runtime (and each OpenMP runtime has its own thread pool). If each of these threads calls into OpenBLAS and wants OpenBLAS to dispatch to its associated OpenMP runtime, that does not work, because global variables inside OpenBLAS's implementation make the (invalid) assumption that there is never more than one OpenMP runtime in the same process.

@martin-frbg
Collaborator

I'm not sure this issue got renamed; there is another open issue (by you as well, IIRC) that covers the somewhat unusual "multiple separate copies of OpenMP" topic. I have some vague hopes of addressing that scenario with what I'm currently working on as well, but it could all fall apart still.

@lightsighter
Author

Ah, maybe I'm getting it confused then. I should have read the rest of the history. Sorry for the confusion.

"I have some vague hopes of addressing that scenario with what I'm currently working on as well, but it could all fall apart still."

Thanks! Looking forward to it! Good luck!

@martin-frbg
Collaborator

The other issue is #2164.
