locking performance with OpenBLAS #2247
Comments
Thank you for these suggestions. I certainly do not consider myself an expert in thread-safe programming; most if not all of these locks were added in response to valgrind/helgrind warnings, with the intention of keeping code changes minimal.
The second part looks more transactional, in that all read-only checks are done in a locked context, followed by the read-write update. Currently it looks like [].used can be changed by a racing thread between the two checks. The initialisation could also race between two threads if the static library did not initialise it and two threads called in at the same time. Could you make a PR for the second part?
What version are you working on?
Indeed, the code section that your item (2) apparently refers to has been commented out since 0.3.4 (brada4's #1814; the "WHEREAMI" code on which it depended was deemed obsolete).
The idea is sort of good: the code peeks at the variable without locks, then locks if it has a chance and updates. It is just that the lock syscall is more expensive (and gets worse with the Spectre etc. fixes) than plainly iterating over the whole structure in a loop under one lock.
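The peek-then-lock pattern described here can be sketched roughly as follows. This is an illustration only, not OpenBLAS code: `slot_used`, `slots_lock`, `NUM_SLOTS` and `try_claim_slot()` are hypothetical names, a pthread mutex stands in for the library's own lock macros, and C11 atomics are assumed so that the unlocked read is well defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_SLOTS 4  /* hypothetical table size */

/* Hypothetical stand-in for the per-buffer "used" flags. */
static atomic_int slot_used[NUM_SLOTS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if slot i was claimed by this call, 0 otherwise. */
int try_claim_slot(int i)
{
    /* Unlocked peek: cheap, but the answer may be stale right away. */
    if (atomic_load_explicit(&slot_used[i], memory_order_relaxed))
        return 0;

    pthread_mutex_lock(&slots_lock);

    /* Re-check under the lock: another thread may have claimed the
     * slot between the peek and the lock acquisition. */
    int claimed = 0;
    if (!atomic_load_explicit(&slot_used[i], memory_order_relaxed)) {
        atomic_store_explicit(&slot_used[i], 1, memory_order_relaxed);
        claimed = 1;
    }

    pthread_mutex_unlock(&slots_lock);
    return claimed;
}
```

Without the second check inside the locked region, two threads could both see the slot as free and both claim it, which is the kind of race between the two checks mentioned earlier in the thread.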
Sorry about that - my colleague was working with 0.3.3. I appreciate the feedback.
Greetings OpenBLAS developers:
One of my colleagues has been working on an application that uses OpenBLAS, and he noticed some performance issues involving locking on OpenBLAS. He made some changes to OpenBLAS that seemed to improve performance. He passed these along to me, and I wanted to mention them here for your review. I would like to get your thoughts on these changes - particularly if you think they are 100% safe. We have not encountered any issues with these changes so far, but our testing has not been exhaustive.
Here is what my colleague had to say:
In the blas_memory_alloc() routine, change this:
To this:
And then also move the corresponding UNLOCK_COMMAND(&alloc_lock) to inside the code block at the end:
In the blas_memory_alloc() routine, change this:
To this:
In essence, he has observed that fine-grained locking and unlocking of individual elements in the memory[] array while scanning them is actually worse than just locking and unlocking the entire array once.
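To make the comparison concrete, here is a minimal sketch of the two scanning strategies. It is not the actual blas_memory_alloc() code: `slot_used`, `table_lock`, `NUM_SLOTS` and the function names are hypothetical, and a plain pthread mutex is assumed in place of LOCK_COMMAND/UNLOCK_COMMAND.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_SLOTS 8  /* hypothetical; not the library's real buffer count */

/* Hypothetical stand-in for the "used" flags of the memory[] array. */
static int slot_used[NUM_SLOTS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Strategy A: lock and unlock once per element inspected.
 * A scan that skips k busy slots costs k + 1 lock/unlock pairs. */
int claim_slot_per_element(void)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        pthread_mutex_lock(&table_lock);
        if (!slot_used[i]) {
            slot_used[i] = 1;
            pthread_mutex_unlock(&table_lock);
            return i;
        }
        pthread_mutex_unlock(&table_lock);
    }
    return -1;  /* no free slot */
}

/* Strategy B: lock once, scan the whole array, unlock once.
 * The cost is one lock/unlock pair regardless of how many slots are busy. */
int claim_slot_single_lock(void)
{
    int found = -1;

    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;  /* check and update happen in one critical section */
            found = i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}

/* Mark a slot as free again. */
void release_slot(int i)
{
    pthread_mutex_lock(&table_lock);
    slot_used[i] = 0;
    pthread_mutex_unlock(&table_lock);
}
```

Both variants return the index of the first free slot, so they are interchangeable for callers. The trade-off is that strategy B holds the lock for the whole scan, which is presumably acceptable when the array is short and each check is a single comparison.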
Let us know what you think about these changes. Thanks in advance!