eigvals performs faster for Matrix{ComplexF64} than Matrix{Float64} on Windows #960
Comments
Could you please try to time just …

Here are the results on the same Windows machine as above: …

So the problem is in the reduction to symmetric tridiagonal form. Could you please try again with …
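As a sketch of what timing that reduction step directly can look like (this is what the benchmarks later in the thread do via LAPACK.hetrd!; the matrices F and C below are illustrative stand-ins, assuming BenchmarkTools is installed):

using LinearAlgebra, BenchmarkTools

n = 50
F = Matrix(Symmetric(randn(n, n)))   # illustrative real symmetric matrix
C = ComplexF64.(F)                   # the same data stored as ComplexF64

# hetrd!/sytrd! perform only the reduction to symmetric tridiagonal form,
# i.e. the stage suspected of being slow in the real case.
@btime LAPACK.hetrd!('U', copy($F));
@btime LAPACK.hetrd!('U', copy($C));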
Setting …

This looks like an issue with threading in OpenBLAS on Windows, so it would be great if you could file the issue at https://github.com/xianyi/OpenBLAS. Usually, it's necessary to create a reproducer in Fortran or C before they can make progress on the issue.
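One way to probe the threading hypothesis from the Julia side is to pin OpenBLAS to a single thread and re-time; a minimal sketch (F is again an illustrative test matrix, and BenchmarkTools is assumed):

using LinearAlgebra, BenchmarkTools

F = Matrix(Symmetric(randn(50, 50)))

BLAS.get_num_threads()    # threads OpenBLAS is currently using
@btime eigvals($F);       # timing with the default thread count

BLAS.set_num_threads(1)   # force single-threaded OpenBLAS
@btime eigvals($F);       # if this is much faster, threading overhead is the likely culprit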
OK, let me see if I can reproduce this issue using the C interface of LAPACK bundled with OpenBLAS.
Seems to be fixed going from 1.9.4 to v1.10-rc2.

julia> @btime eigvals($F);
  79.600 μs (11 allocations: 38.70 KiB)    # 1.10-rc2
  968.000 μs (11 allocations: 38.70 KiB)   # 1.9.4

julia> @btime eigvals($C);
  189.800 μs (15 allocations: 119.70 KiB)  # 1.10-rc2
  171.700 μs (15 allocations: 119.70 KiB)  # 1.9.4

julia> @btime LAPACK.hetrd!('U', copy($F));
  34.000 μs (7 allocations: 33.59 KiB)     # 1.10-rc2
  932.900 μs (7 allocations: 33.59 KiB)    # 1.9.4

julia> @btime LAPACK.hetrd!('U', copy($C));
  127.300 μs (7 allocations: 66.05 KiB)    # 1.10-rc2
  113.600 μs (7 allocations: 66.05 KiB)    # 1.9.4

versioninfo:

julia> versioninfo()
Julia Version 1.10.0-rc2
Commit dbb9c46795 (2023-12-03 15:25 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 11 on 8 virtual cores
Environment:
  JULIA_CONDAPKG_BACKEND = Null
  JULIA_NUM_THREADS = auto
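Since the timings differ between 1.9.4 and 1.10-rc2, it can also help to record which BLAS/LAPACK build and thread count each session actually uses; a sketch with the stock LinearAlgebra API:

using LinearAlgebra

BLAS.get_config()        # lists the BLAS/LAPACK libraries loaded via libblastrampoline
BLAS.get_num_threads()   # BLAS thread count for this session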
Indeed, I can replicate @wheeheee's results (…):

julia> @btime eigvals($F);
  59.300 μs (11 allocations: 38.70 KiB)    # 1.10-rc2
  688.900 μs (11 allocations: 38.70 KiB)   # 1.9.4

julia> @btime eigvals($C);
  102.900 μs (15 allocations: 119.70 KiB)  # 1.10-rc2
  146.900 μs (15 allocations: 119.70 KiB)  # 1.9.4

julia> @btime LAPACK.hetrd!('U', copy($F));
  24.100 μs (7 allocations: 33.59 KiB)     # 1.10-rc2
  533.800 μs (7 allocations: 33.59 KiB)    # 1.9.4

julia> @btime LAPACK.hetrd!('U', copy($C));
  57.400 μs (7 allocations: 66.05 KiB)     # 1.10-rc2
  74.600 μs (7 allocations: 66.05 KiB)     # 1.9.4

In my case, there is an apparent performance improvement for … Also checked for …:

julia> @btime eigvals($F);
  2.700 ms (11 allocations: 498.77 KiB)    # 1.10-rc2
  7.615 ms (11 allocations: 498.77 KiB)    # 1.9.4

julia> @btime eigvals($C);
  7.704 ms (15 allocations: 1.80 MiB)      # 1.10-rc2
  7.803 ms (15 allocations: 1.80 MiB)      # 1.9.4

versioninfo:

julia> versioninfo()
Julia Version 1.10.0-rc2
Commit dbb9c46795 (2023-12-03 15:25 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 12th Gen Intel(R) Core(TM) i7-1260P
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 23 on 16 virtual cores
I have just checked that the released Julia 1.10.0 yields the same results as 1.10-rc2 (see above), so this is fixed and the issue can be closed.
(Cross-posting from Discourse)

I've noticed that when diagonalising real symmetric matrices using the default OpenBLAS, eigvals may perform faster if the input matrix is complex, i.e. Matrix{ComplexF64} rather than Matrix{Float64}. Here is my test code:
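(The snippet itself is not reproduced above; the following is a minimal sketch of the comparison being described, assuming BenchmarkTools, with a random symmetric test matrix and illustrative names n, F, C.)

using LinearAlgebra, BenchmarkTools

n = 50
F = Matrix(Symmetric(randn(n, n)))   # real symmetric input, Matrix{Float64}
C = ComplexF64.(F)                   # the same matrix as Matrix{ComplexF64}

@btime eigvals($F);   # real symmetric case
@btime eigvals($C);   # complex Hermitian case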
For n = 50, the complex matrix is diagonalised ~5 times faster than the real one. For n = 230, both calculations take the same amount of time, and for larger matrices the complex calculation becomes slower than the real one, as expected.

I could reproduce these results on four different machines running Windows 10, while on macOS 10.14.6 the issue is not present (the real calculation performs faster than the complex one, as expected). The outputs of versioninfo() are available in a gist. The issue seems not to appear on Linux either, see Discourse.
When I switch to MKL.jl, the issue does not appear.
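For reference, switching the backend only requires loading MKL.jl before running the benchmarks (assuming the package is installed); a sketch:

using MKL                  # swaps the default OpenBLAS backend for MKL via libblastrampoline
using LinearAlgebra, BenchmarkTools

BLAS.get_config()          # should now list MKL as the active BLAS/LAPACK

F = Matrix(Symmetric(randn(50, 50)))
@btime eigvals($F);        # with MKL the real case is the faster one, as reported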