Benchmarks with Julia v1.5 #8

@oschulz

Julia v1.5 enables inline allocation of structs with pointers (JuliaLang/julia#34126), which should make UnsafeArrays unnecessary in most cases. New benchmarks, using this test case:

using Base.Threads, LinearAlgebra
using UnsafeArrays
using BenchmarkTools

function colnorms!(dest::AbstractVector, A::AbstractMatrix)
    @threads for i in axes(A, 2)
        dest[i] = norm(view(A, :, i))
    end
    dest
end

A = rand(50, 10^5);
dest = similar(A, size(A, 2));

colnorms!(dest, A)

With Julia v1.4:

julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  4.62 MiB
  allocs estimate:  100323
  --------------
  minimum time:     256.291 μs (0.00% GC)
  median time:      623.428 μs (0.00% GC)
  mean time:        10.020 ms (93.82% GC)
  maximum time:     3.567 s (99.97% GC)
  --------------
  samples:          758
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  45.63 KiB
  allocs estimate:  324
  --------------
  minimum time:     227.121 μs (0.00% GC)
  median time:      249.831 μs (0.00% GC)
  mean time:        262.351 μs (1.26% GC)
  maximum time:     4.043 ms (85.49% GC)
  --------------
  samples:          10000
  evals/sample:     1

With Julia v1.5-beta1:

julia> nthreads()
64

julia> @benchmark colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  46.61 KiB
  allocs estimate:  321
  --------------
  minimum time:     135.311 μs (0.00% GC)
  median time:      156.681 μs (0.00% GC)
  mean time:        166.511 μs (2.80% GC)
  maximum time:     5.915 ms (89.80% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark @uviews A colnorms!(dest, A)
BenchmarkTools.Trial: 
  memory estimate:  46.66 KiB
  allocs estimate:  322
  --------------
  minimum time:     126.701 μs (0.00% GC)
  median time:      140.041 μs (0.00% GC)
  mean time:        150.547 μs (2.48% GC)
  maximum time:     5.952 ms (90.35% GC)
  --------------
  samples:          10000
  evals/sample:     1

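On v1.5-beta1 there is very little difference in the mean runtime with and without @uviews (166.5 μs vs. 150.5 μs), in contrast to v1.4, where the difference is large (10.0 ms vs. 262 μs, with 93.82% of the mean time of the plain version spent in GC). There is also a very nice gain in speed in general: with @uviews, the minimum time drops from about 227 μs on v1.4 to about 127 μs on v1.5-beta1.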

Test system: AMD EPYC 7702P 64-core CPU.
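
The allocation estimates above (100323 allocations on v1.4 versus 321 on v1.5-beta1 for the plain version) suggest that the views themselves no longer hit the heap. A minimal sketch for checking this on a given Julia version, without threads or BenchmarkTools, could look like the following. The helper `colnorms_serial!` and the smaller matrix size are assumptions made for this illustration and are not part of the benchmark above.

```julia
using LinearAlgebra

# Hypothetical single-threaded variant of colnorms!, so that task and thread
# overhead does not show up in the allocation count.
function colnorms_serial!(dest::AbstractVector, A::AbstractMatrix)
    for i in axes(A, 2)
        dest[i] = norm(view(A, :, i))
    end
    dest
end

A = rand(50, 1000)
dest = similar(A, size(A, 2))

colnorms_serial!(dest, A)                      # warm-up call, so compilation is not measured
bytes = @allocated colnorms_serial!(dest, A)   # bytes allocated during the second call
println("bytes allocated: ", bytes)
```

If the views are allocated inline, the reported number should be zero or close to it. If each `view` is heap-allocated, as the v1.4 numbers indicate, it should grow with the number of columns.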
