Multithread complex dot product #2221


Closed
antoine-levitt opened this issue Aug 14, 2019 · 7 comments

@antoine-levitt
On my machine (linux x86_64), zdotc is not multithreaded. ddot is, though.

@martin-frbg
Collaborator

The dot functions are a bit special in that they are not multithreaded at the interface level like almost everything else; if I remember correctly, the opinion of the earlier developers was that these were bound by the system I/O bandwidth limit already. A select few machines do have multithreaded ddot kernels: for x86_64, I stole the idea and implementation from the arm64 ThunderX kernels in #1491 to satisfy another Julia request. Not sure if zdot(c) would be just as easy...

@antoine-levitt
Author

if I remember correctly, the opinion of the earlier developers was that these were bound by the system I/O bandwidth limit already

At least on my laptop that is not the case - I see a definite speedup in ddot.

@martin-frbg
Collaborator

PR #2222 now, but it may need some tuning; for now it is just a straight copy of the ARM server CPU code, including its n=10000 threshold.
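For readers curious what "kernel-level" multithreading of a complex dot product amounts to, here is a minimal illustrative sketch in Julia (hypothetical function names; the actual PR #2222 kernel is C, this only mirrors the idea): below a length threshold run the serial loop, above it give each thread its own chunk and sum the partial results.

```julia
# Serial reference: what zdotc computes, sum(conj(x[i]) * y[i]).
function zdotc_serial(x::Vector{ComplexF64}, y::Vector{ComplexF64})
    acc = zero(ComplexF64)
    @inbounds for i in eachindex(x, y)
        acc += conj(x[i]) * y[i]
    end
    return acc
end

# Threaded sketch: short vectors stay serial (the PR reuses the ARM
# kernel's n=10000 threshold); long ones are split into per-thread
# chunks whose partial sums are reduced at the end.
function zdotc_threaded(x, y; threshold=10_000)
    n = length(x)
    n < threshold && return zdotc_serial(x, y)
    nt = Threads.nthreads()
    partials = zeros(ComplexF64, nt)
    Threads.@threads for t in 1:nt
        lo = div((t - 1) * n, nt) + 1
        hi = div(t * n, nt)
        acc = zero(ComplexF64)
        @inbounds for i in lo:hi
            acc += conj(x[i]) * y[i]
        end
        partials[t] = acc
    end
    return sum(partials)
end
```

The two versions agree up to floating-point reassociation, since the chunked reduction sums the elements in a different order.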

@brada4
Contributor

brada4 commented Aug 15, 2019

Can you give some numbers, like the CPU model name? How big is the input? Is it cache-line aligned (e.g. starting at a fresh malloc)? Does it help to double the processing block size in the source file?
One cache line, 4 values, should get fetched from memory to the CPU, then 8 FLOPS done on them (which fits in 1-2 FMA instructions), and then hopefully the next batch is prefetched right away. It is certainly not FPU speed at fault, but something else micro-architectural along the way.
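To make the bandwidth argument concrete, a back-of-the-envelope calculation for zdotc (my numbers, assuming 64-byte cache lines and 16-byte double-precision complex values, so 4 values per line as above):

```julia
# Memory traffic per element pair: one ComplexF64 from each input vector.
bytes_per_pair = 2 * sizeof(ComplexF64)      # 2 * 16 = 32 bytes

# conj(x)*y is 4 multiplies + 2 adds, plus 2 adds to accumulate into
# the real/imaginary running sums: 8 FLOPS per pair.
flops_per_pair = 8

# Arithmetic intensity is very low, so long vectors that spill out of
# cache are limited by memory bandwidth rather than FPU throughput.
intensity = flops_per_pair / bytes_per_pair  # 0.25 FLOP/byte
println("arithmetic intensity: $intensity FLOP/byte")
```

At 0.25 FLOP/byte even a single modern core can saturate its share of memory bandwidth, which is the original rationale for leaving the dot functions single-threaded; whether extra threads still help then depends on how much bandwidth one core can actually draw.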

@antoine-levitt
Author

I'm confused by most of your post, but here are some numbers. This is with Julia 1.1, on an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz:

julia> using LinearAlgebra, BenchmarkTools   # BLAS and dot live in LinearAlgebra

julia> BLAS.openblas_get_config()
"USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=16"

julia> for N in (5_000, 20_000, 100_000)
           for i in (1, 2, 4)
               BLAS.set_num_threads(i)
               aa = randn(N)
               bb = randn(N)
               @btime dot($aa, $bb)
           end
       end
  815.453 ns (0 allocations: 0 bytes)   # N=5_000,   1 thread
  816.198 ns (0 allocations: 0 bytes)   # N=5_000,   2 threads
  810.475 ns (0 allocations: 0 bytes)   # N=5_000,   4 threads
  4.070 μs (0 allocations: 0 bytes)     # N=20_000,  1 thread
  2.408 μs (0 allocations: 0 bytes)     # N=20_000,  2 threads
  2.349 μs (0 allocations: 0 bytes)     # N=20_000,  4 threads
  24.012 μs (0 allocations: 0 bytes)    # N=100_000, 1 thread
  13.054 μs (0 allocations: 0 bytes)    # N=100_000, 2 threads
  8.458 μs (0 allocations: 0 bytes)     # N=100_000, 4 threads
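Reading the speedups off those timings (a rough calculation; the flat result at N=5_000 is consistent with a minimum-length threshold below which extra threads are not used):

```julia
# 1-thread time divided by 4-thread time, from the @btime results above.
speedup_5k   = 815.453 / 810.475   # ~1.01x: no threading for short vectors
speedup_20k  = 4.070 / 2.349       # ~1.73x
speedup_100k = 24.012 / 8.458      # ~2.84x
println((speedup_5k, speedup_20k, speedup_100k))
```

So on this laptop, ddot scales noticeably with threads once the vectors are long enough, which is the counterexample to the "already bandwidth-bound" assumption.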

@brada4
Contributor

brada4 commented Aug 15, 2019

Thanks, so it is a pretty standard desktop CPU. I was a bit afraid of one of those high-end parts with NUMA inside the package and many memory controllers.

martin-frbg added a commit that referenced this issue Aug 15, 2019
* Add multithreading support

copied from the ThunderX2T99 kernel. For #2221
@martin-frbg
Collaborator

Multithreading (at the kernel level, as already done on the ARM server CPUs) was added in #2222.
