Description
I have a project in which many parts use OpenMP or std::thread for parallelization, and it works fine on a 12-core Linux server.
I now use OpenBLAS in one of the project's modules. The default make builds a single-threaded version for me (that is, running make without USE_OPENMP=1). When this library is linked into the project, the whole system behaves as if it were single-threaded, even when the module that calls OpenBLAS functions is never actually invoked at runtime.
It seems the system can only see one CPU core, even though the number of threads created at runtime is far more than one, so the speed drops dramatically.
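One way to check the "only one core is visible" impression is to compare the number of CPUs the process is allowed to run on with the number of CPUs online. The following is a minimal diagnostic sketch, assuming Linux with glibc and g++ (which defines _GNU_SOURCE by default); the helper names allowed_cpu_count, online_cpu_count and report_cpus are hypothetical and not part of OpenBLAS or of my project.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT (Linux/glibc)
#include <unistd.h>  // sysconf
#include <cstdio>    // std::printf

// Hypothetical helper: number of CPUs the calling thread may run on,
// taken from its affinity mask. Returns -1 if the query fails.
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    return CPU_COUNT(&mask);
}

// Hypothetical helper: number of CPUs currently online on the machine.
long online_cpu_count() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Hypothetical helper: print both counts with a label so that runs
// linked against different library builds can be compared.
void report_cpus(const char* label) {
    std::printf("%s: allowed=%d online=%ld\n",
                label, allowed_cpu_count(), online_cpu_count());
}
```

Calling report_cpus("startup") as the first statement of main, in binaries linked against each of the two OpenBLAS builds, should show whether the allowed count drops to 1 while the online count stays at 12. If it does, that would suggest the slowdown comes from the process's CPU affinity mask rather than from the number of threads being created; this is only a hypothesis to test, not a confirmed cause.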
I fixed this by rebuilding the OpenBLAS library with make USE_OPENMP=1, but I think this flag should only affect the behavior of OpenBLAS, not the system that calls it.
My questions are:
- Why does USE_OPENMP=0 affect the whole system that links against the OpenBLAS library? How does that happen?
- How can I make the other parts of my system run in parallel even if I link a single-threaded version of OpenBLAS?