Description
Julia v1.5 enables inline allocation of structs that contain pointers (JuliaLang/julia#34126), which should make UnsafeArrays unnecessary in most cases. New benchmarks, using the following test case:
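```julia
using Base.Threads, LinearAlgebra
using UnsafeArrays
using BenchmarkTools

# Compute the 2-norm of each column of A in parallel, writing the results into dest.
function colnorms!(dest::AbstractVector, A::AbstractMatrix)
    @threads for i in axes(A, 2)
        dest[i] = norm(view(A, :, i))  # creates one SubArray wrapper per column
    end
    dest
end

A = rand(50, 10^5);
dest = similar(A, size(A, 2));
colnorms!(dest, A)  # warm-up call, so compilation is not benchmarked
```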
With Julia v1.4:
```julia
julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial:
  memory estimate:  4.62 MiB
  allocs estimate:  100323
  --------------
  minimum time:     256.291 μs (0.00% GC)
  median time:      623.428 μs (0.00% GC)
  mean time:        10.020 ms (93.82% GC)
  maximum time:     3.567 s (99.97% GC)
  --------------
  samples:          758
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial:
  memory estimate:  45.63 KiB
  allocs estimate:  324
  --------------
  minimum time:     227.121 μs (0.00% GC)
  median time:      249.831 μs (0.00% GC)
  mean time:        262.351 μs (1.26% GC)
  maximum time:     4.043 ms (85.49% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
With Julia v1.5-beta1:
```julia
julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial:
  memory estimate:  46.61 KiB
  allocs estimate:  321
  --------------
  minimum time:     135.311 μs (0.00% GC)
  median time:      156.681 μs (0.00% GC)
  mean time:        166.511 μs (2.80% GC)
  maximum time:     5.915 ms (89.80% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial:
  memory estimate:  46.66 KiB
  allocs estimate:  322
  --------------
  minimum time:     126.701 μs (0.00% GC)
  median time:      140.041 μs (0.00% GC)
  mean time:        150.547 μs (2.48% GC)
  maximum time:     5.952 ms (90.35% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
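With v1.5 there is very little difference with and without `@uviews`: both runs perform about 320 allocations, and the mean times are 166.5 μs vs. 150.5 μs. This contrasts with v1.4, where the plain version performs over 100,000 allocations (roughly one per column) and its mean time of 10.02 ms, dominated by GC, is far above the 262.4 μs measured with `@uviews`. Overall, v1.5 is also considerably faster in general.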
Test system: AMD EPYC 7702P 64-core CPU.
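A minimal sketch of how one might check the allocation behavior described above on a single thread, assuming Julia v1.5 or later and only the standard library `LinearAlgebra`. The names `sum_colnorms` and `measure_allocs` are hypothetical helpers, not part of UnsafeArrays. The loop creates one `view` per column, and `@allocated` reports how many heap bytes a call used. With inline allocation of the `SubArray` wrappers, the reported count would be expected to be zero or close to it, while on v1.4 one would expect roughly one small allocation per column.

```julia
using LinearAlgebra

# Hypothetical helper: serial sum of column 2-norms using views.
# Each view(A, :, i) is a SubArray holding a reference to A; since Julia v1.5
# such structs can be allocated inline, so this loop need not touch the heap.
function sum_colnorms(A::AbstractMatrix)
    s = zero(float(eltype(A)))
    for i in axes(A, 2)
        s += norm(view(A, :, i))
    end
    return s
end

# Hypothetical helper: measure heap bytes for one call from inside a function,
# so the call is statically dispatched and its Float64 result is not boxed.
measure_allocs(A) = @allocated sum_colnorms(A)

A = rand(50, 1_000)
sum_colnorms(A)     # warm up: compile before measuring
measure_allocs(A)   # warm up the measuring function as well
println("bytes allocated: ", measure_allocs(A))
```

The loop is deliberately serial: `@threads` adds a small, fixed number of allocations for task scheduling (likely the roughly 320 seen in all the low-allocation runs above), which would mask the per-column count.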