Multithread complex dot product #2221
Comments
The dot functions are a bit special as they are not multithreaded at the interface level like most everything else - if I remember correctly, the opinion of the earlier developers was that these were bound by the system I/O bandwidth limit already. A select few machines do have multithreaded ddot kernels - for x86_64, I stole the idea and implementation from the arm64 ThunderX kernels in #1491 to satisfy another Julia request. Not sure if zdot(c) would be just as easy...
At least on my laptop that is not the case - I see a definite speedup in ddot.
PR #2222 is open now, but it may need some tuning - for now it is just a copypasta of the ARM server CPU code, including its n=10000 threshold.
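To make the idea of kernel-level threading with a size threshold concrete, here is a minimal sketch of a conjugated complex dot product that splits the vectors into contiguous chunks, one per thread, and sums the partial results. It is not the code from #2222 or from the ThunderX kernels: the names `zdotc_threaded`, `zdotc_serial`, `DOT_THREAD_THRESHOLD` and `DOT_MAX_THREADS` are made up for illustration, plain pthreads stand in for OpenBLAS's own threading layer, and only unit stride is handled. The only detail taken from the thread is the n=10000 cutoff.

```c
#include <assert.h>
#include <complex.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cutoff, mirroring the n=10000 threshold mentioned above:
   at or below this length the work stays on the calling thread. */
#define DOT_THREAD_THRESHOLD 10000
#define DOT_MAX_THREADS 64

typedef struct {
    const double complex *x;
    const double complex *y;
    size_t n;
    double complex partial;   /* result for this chunk */
} dot_chunk;

/* Serial conjugated dot product: sum over i of conj(x[i]) * y[i]. */
static double complex zdotc_serial(size_t n, const double complex *x,
                                   const double complex *y)
{
    double complex acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += conj(x[i]) * y[i];
    return acc;
}

static void *dot_worker(void *arg)
{
    dot_chunk *c = arg;
    c->partial = zdotc_serial(c->n, c->x, c->y);
    return NULL;
}

/* Threaded variant (illustrative): contiguous chunks, unit stride only. */
double complex zdotc_threaded(size_t n, const double complex *x,
                              const double complex *y, int nthreads)
{
    assert(nthreads >= 1);
    if (nthreads > DOT_MAX_THREADS)
        nthreads = DOT_MAX_THREADS;
    if (nthreads == 1 || n <= DOT_THREAD_THRESHOLD)
        return zdotc_serial(n, x, y);

    pthread_t tid[DOT_MAX_THREADS];
    int spawned[DOT_MAX_THREADS] = {0};
    dot_chunk chunk[DOT_MAX_THREADS];
    size_t count = (size_t)nthreads;
    size_t base = n / count, rem = n % count, off = 0;

    for (size_t t = 0; t < count; t++) {
        /* The first `rem` chunks take one extra element. */
        size_t len = base + (t < rem ? 1 : 0);
        chunk[t] = (dot_chunk){ x + off, y + off, len, 0.0 };
        off += len;
        if (t + 1 < count &&
            pthread_create(&tid[t], NULL, dot_worker, &chunk[t]) == 0)
            spawned[t] = 1;
        else
            dot_worker(&chunk[t]);  /* last chunk, or thread creation failed */
    }

    double complex sum = 0.0;
    for (size_t t = 0; t < count; t++) {
        if (spawned[t])
            pthread_join(tid[t], NULL);
        sum += chunk[t].partial;
    }
    return sum;
}
```

Calling `zdotc_threaded(n, x, y, 4)` on vectors longer than the cutoff uses three worker threads plus the caller; shorter inputs take the serial path. Compile with `-pthread`. Creating threads on every call is far more expensive than handing work to an existing pool, so the right cutoff for a sketch like this would differ from the one a real kernel needs.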
Can you give some numbers, and the CPU trade name? How big is the input? Is it cache-line aligned (e.g. does it start at a fresh malloc)? Does it help to double the processing block in the source file?
I'm confused by most of your post, but here are some numbers. This is with Julia 1.1, on an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz:
using BenchmarkTools, LinearAlgebra  # LinearAlgebra provides dot and BLAS
for N in (5_000, 20_000, 100_000)    # vector lengths
    for i in (1, 2, 4)               # BLAS thread counts
        BLAS.set_num_threads(i)
        aa = randn(N)
        bb = randn(N)
        @btime dot($aa, $bb)
    end
end
Thanks, so it is a pretty standard desktop CPU. I was a bit afraid it was one of those high-end ones with NUMA in the cartridge and many memory controllers.
* Add multithreading support copied from the ThunderX2T99 kernel. For #2221
Multithreading (at the kernel level, as already done on ARM server CPUs) was added in #2222
On my machine (linux x86_64), zdotc is not multithreaded. ddot is, though.