
USE_OPENMP=0 makes the whole system runs on only one cpu core. #1591

Closed
liuyuuan opened this issue Jun 4, 2018 · 5 comments

@liuyuuan

liuyuuan commented Jun 4, 2018

I have a project in which many parts use OpenMP or std::thread for parallelization, and it works fine on a 12-core Linux server.

Now I use OpenBLAS in one of the project's modules. The default make built me a single-threaded version (by running make without USE_OPENMP=1). When it is linked into the project, the whole system runs as if single-threaded, even if the module that calls OpenBLAS functions is never actually called at runtime.

It seems the system can see only one CPU core, even though the number of threads created while running is far more than 1, so the speed drops dramatically.

I fixed this by rebuilding the OpenBLAS library with make USE_OPENMP=1, but I think this flag should only affect the behavior of OpenBLAS, not the system that calls it.
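A quick way to confirm the "only one core" symptom is to look at the CPU affinity mask of every thread in the running process. The sketch below assumes Linux: it lists thread IDs from /proc/<pid>/task and queries each one with os.sched_getaffinity, which on Linux accepts a thread ID as well as a PID. The function names are illustrative only, not part of OpenBLAS.

```python
import os
import sys
from typing import Dict, Set


def thread_affinities(pid: int) -> Dict[int, Set[int]]:
    """Return {thread_id: allowed CPUs} for every thread of `pid` (Linux only)."""
    affinities: Dict[int, Set[int]] = {}
    for name in os.listdir(f"/proc/{pid}/task"):
        tid = int(name)
        try:
            # On Linux, sched_getaffinity() accepts a thread ID as well as a PID.
            affinities[tid] = os.sched_getaffinity(tid)
        except ProcessLookupError:
            # The thread exited between listdir() and the affinity query.
            continue
    return affinities


def report(pid: int) -> None:
    affinities = thread_affinities(pid)
    print(f"{len(affinities)} threads; {os.cpu_count()} CPUs on this machine")
    for tid, cpus in sorted(affinities.items()):
        print(f"  thread {tid}: CPUs {sorted(cpus)}")
    # Union of all thread masks: if it is a single CPU, the whole process is
    # effectively confined to one core, no matter how many threads it has.
    allowed = set().union(*affinities.values())
    if len(allowed) == 1:
        print(f"WARNING: all threads are restricted to CPU {allowed.pop()}")


if __name__ == "__main__":
    # Usage: python3 affinity_report.py <pid>  (defaults to this script's own PID)
    report(int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid())
```

If every thread of the program reports the same single CPU, the threads exist but are all bound to that core, which matches the behavior described above.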

My question is:

  1. Why does USE_OPENMP=0 affect the whole system that links with the OpenBLAS library? How does that happen?
  2. How can I make other parts of my system run in parallel even if I link a single-threaded version of OpenBLAS?

liuyuuan changed the title from "USE_OPENMP=0 makes the whole system single threaded." to "USE_OPENMP=0 makes the whole system runs on only one cpu core." on Jun 4, 2018
@brada4
Contributor

brada4 commented Jun 4, 2018

When you type "make", it uses pthreads on Linux.
Can you capture the build output (script ; make ; ^D)?

  1. Do you have some library linked in that manipulates core affinity, such as the CUDA driver or cgroups-based balancers?
    Also, OpenBLAS should be built with NO_AFFINITY=1 when using OpenMP, as the latter has its own affinity settings.
  2. You mentioned OMP and std::thread; you can also run nproc independent processes, etc. See above for the almost mandatory NO_AFFINITY build setting.
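For question 2, besides rebuilding with NO_AFFINITY=1, a stopgap is to widen the affinity of the already-pinned threads again. This is a minimal sketch assuming Linux, assuming the pinning happens once at library initialization, and assuming you have permission to change the target's affinity (same user or CAP_SYS_NICE); widen_affinity is a hypothetical helper name. Inside a C/C++ program the equivalent would be a call to sched_setaffinity() after the library has been loaded.

```python
import os
import sys
from typing import Iterable, Optional


def widen_affinity(pid: int, cpus: Optional[Iterable[int]] = None) -> int:
    """Set the CPU affinity of every existing thread of `pid` to `cpus`.

    Defaults to the CPUs this script itself may use. Returns the number of
    threads updated. Linux only.
    """
    mask = set(cpus) if cpus is not None else os.sched_getaffinity(0)
    updated = 0
    for name in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(name), mask)
            updated += 1
        except ProcessLookupError:
            continue  # the thread exited in the meantime
    return updated


if __name__ == "__main__":
    # Usage: python3 widen_affinity.py <pid>
    if len(sys.argv) > 1:
        print(f"updated {widen_affinity(int(sys.argv[1]))} threads")
```

Threads created later inherit their creator's mask, so this only helps if nothing re-pins them afterwards; rebuilding OpenBLAS with NO_AFFINITY=1 remains the proper fix.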

@brada4
Contributor

brada4 commented Jun 5, 2018

Actually, this was already spotted before: USE_XYZ=0 has a different effect than not defining it at all.
#1422 (comment)
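The difference between USE_XYZ=0 and leaving the variable unset comes down to how the flag is tested. If the build passes it to the compiler as -DUSE_OPENMP=0, a C check of the #ifdef kind is true as soon as the macro is defined, whatever its value, whereas an #if check looks at the value. The toy Python model below only mimics those two preprocessor rules for plain integer values; it does not parse real C or OpenBLAS sources, and the function names are hypothetical.

```python
from typing import Dict


def ifdef(defines: Dict[str, str], name: str) -> bool:
    """Model of `#ifdef NAME`: true whenever NAME is defined, whatever its value."""
    return name in defines


def if_value(defines: Dict[str, str], name: str) -> bool:
    """Simplified model of `#if NAME`: an undefined NAME counts as 0,
    otherwise its integer value is tested."""
    return int(defines.get(name, "0")) != 0


if __name__ == "__main__":
    cases = {
        "USE_OPENMP unset": {},
        "-DUSE_OPENMP=0": {"USE_OPENMP": "0"},
        "-DUSE_OPENMP=1": {"USE_OPENMP": "1"},
    }
    for label, defines in cases.items():
        print(f"{label:18} #ifdef -> {ifdef(defines, 'USE_OPENMP')}, "
              f"#if -> {if_value(defines, 'USE_OPENMP')}")
```

The "=0" row is the interesting one: #ifdef reports it as enabled while #if reports it as disabled.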

@martin-frbg
Collaborator

martin-frbg commented Jun 7, 2018

The behaviour of USE_OPENMP=0 (rather than leaving that variable unset) may indeed need checking (Makefile.system does force it to "0" when NUM_THREADS is set to 1, but several parts of the code only check if USE_OPENMP is defined at all).
Still, a default make should build a multithreaded version (with the number of threads depending on the number of CPU cores detected on the build host), provided that libpthread is available.
Which version of OpenBLAS are you using, by the way? For several months since last summer, the default setting of NO_AFFINITY=1 in Makefile.rule had been inadvertently commented out, which could cause the OpenBLAS initialization to bind everything spawned after it to the same CPU. This unfortunately affected the 0.2.20 release and subsequent develop snapshots until March 19 of this year. If you build OpenBLAS yourself, it is trivial to change this setting, but updating to 0.3.0 or current develop is still advisable if you are really using an earlier version.
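The "bind everything spawned after it" effect follows from the Linux rule that a new thread inherits the CPU affinity mask of the thread that creates it. A small illustration, assuming Linux: pin the calling thread to one CPU, start a thread, and read the new thread's mask; it reports only the pinned CPU. The helper name is hypothetical and this is not OpenBLAS code.

```python
import os
import threading
from typing import Set


def child_affinity_after_pinning(cpu: int) -> Set[int]:
    """Pin the calling thread to `cpu`, start a new thread, and return that
    thread's affinity mask. The caller's original mask is restored afterwards."""
    original = os.sched_getaffinity(0)  # 0 = the calling thread on Linux
    seen: Set[int] = set()

    def worker() -> None:
        seen.update(os.sched_getaffinity(0))

    try:
        os.sched_setaffinity(0, {cpu})
        t = threading.Thread(target=worker)
        t.start()  # the new thread inherits its creator's (pinned) mask
        t.join()
    finally:
        os.sched_setaffinity(0, original)
    return seen


if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))
    print("new thread may run on:", sorted(child_affinity_after_pinning(cpu)))
```

This is why a binding done during library initialization in the main thread can end up confining every OpenMP or std::thread worker created later.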

@liuyuuan
Author

liuyuuan commented Jun 8, 2018

@martin-frbg I checked the version I built: it's v0.2.20, and NO_AFFINITY=1 is indeed commented out in the Makefile.rule file. The behavior of my system is just as you described: it runs all threads on the same CPU. Thank you!

@martin-frbg
Collaborator

Problem solved, or should this issue be kept open?

3 participants