Undocumented Feature NUM_PARALLEL #1736

Closed
@lightsighter

This is more of a comment about an undocumented feature, in case other users encounter a similar problem. I have an implementation of the OpenMP runtime that supports multiple copies of the runtime in the same process. I had threads bound to different OpenMP runtimes calling into OpenBLAS simultaneously, but OpenBLAS was serializing their execution, which caused bad performance. The relevant bit of code is here:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L321-L336

Effectively, there is a fixed number of buffers available for managing parallel OpenMP calls, and the default is 1. So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them can make progress while the rest spin-wait for the one available buffer. It seems like the right way to fix this is to set NUM_PARALLEL at build time to the upper bound on the number of OpenMP runtimes that you can have in a process:

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L197-L199

This will then set the max parallel number:

https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L1015

and then that will fill in extra buffers for OpenMP usage:

https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server_omp.c#L57-L62
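
For anyone who wants to see the effect without reading the linked source, here is a minimal sketch of the pattern as I understand it. This is not the OpenBLAS code: the names `slot_in_use`, `try_acquire_slot`, `acquire_slot`, and `release_slot` are made up for illustration, and I am assuming the build turns NUM_PARALLEL into a compile-time constant that sizes the slot array (called `MAX_PARALLEL_NUMBER` here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed stand-in for the compile-time constant derived from NUM_PARALLEL.
 * A value of 1 mirrors the default described in this issue. */
#ifndef MAX_PARALLEL_NUMBER
#define MAX_PARALLEL_NUMBER 1
#endif

/* One "in use" flag per buffer slot (hypothetical name).
 * Static storage is zero-initialised, so every slot starts out free. */
static atomic_bool slot_in_use[MAX_PARALLEL_NUMBER];

/* Try once to claim a free slot. Returns its index, or -1 if all are taken. */
static int try_acquire_slot(void) {
    for (int i = 0; i < MAX_PARALLEL_NUMBER; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&slot_in_use[i], &expected, true))
            return i;
    }
    return -1;
}

/* Spin until some slot becomes free. With a single slot, every concurrent
 * caller except one is stuck in this loop, which is the serialization. */
static int acquire_slot(void) {
    int slot;
    while ((slot = try_acquire_slot()) < 0) {
        /* busy-wait for another caller to release its slot */
    }
    return slot;
}

/* Hand the slot back so a waiting caller can proceed. */
static void release_slot(int slot) {
    assert(slot >= 0 && slot < MAX_PARALLEL_NUMBER);
    atomic_store(&slot_in_use[slot], false);
}
```

Compiling this sketch with a larger `-DMAX_PARALLEL_NUMBER=...` gives that many callers a slot at once, which is the behavior I am relying on when raising NUM_PARALLEL.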

As far as I can tell this isn't documented anywhere. Maybe I just missed it. Please feel free to point me at the proper documentation if I did overlook it.
