Call-site splatting optimization #13359

@timholy

Splatting has a known performance penalty, but today I looked into it more carefully, and I wonder whether there's an easy fix for part of the problem. For those who don't like @generated functions, this might be a good opportunity to reduce their number, since my main use for them now is avoiding the splatting penalty.

First the demo:

@noinline function bar1(A, xs...)
    A[xs...]
end

@inline function bar2(A, xs...)
    A[xs...]
end

function call_bar1a(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += bar1(A, xs...)
    end
    s
end

function call_bar2a(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += bar2(A, xs...)
    end
    s
end

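# Generated variants: inside the generator, xs holds the argument *types*, so
# length(xs) is known and the call-site splat is unrolled into xs[1], ..., xs[N].
@generated function call_bar1b(A, n, xs...)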
    xargs = [:(xs[$d]) for d = 1:length(xs)]
    quote
        s = zero(eltype(A))
        for i = 1:n
            s += bar1(A, $(xargs...))
        end
        s
    end
end

@generated function call_bar2b(A, n, xs...)
    xargs = [:(xs[$d]) for d = 1:length(xs)]
    quote
        s = zero(eltype(A))
        for i = 1:n
            s += bar2(A, $(xargs...))
        end
        s
    end
end

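# Call each variant once so compilation time is excluded from the timings below
A = rand(3,3,3)
call_bar1a(A, 1, 1, 2, 3)
call_bar2a(A, 1, 1, 2, 3)
call_bar1b(A, 1, 1, 2, 3)
call_bar2b(A, 1, 1, 2, 3)

@time 1   # warm up @time itself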
@time call_bar1a(A, 10^6, 1, 2, 3)
@time call_bar2a(A, 10^6, 1, 2, 3)
@time call_bar1b(A, 10^6, 1, 2, 3)
@time call_bar2b(A, 10^6, 1, 2, 3)

Results:

julia> include("/tmp/test_splat.jl")
  0.000001 seconds (3 allocations: 144 bytes)
  0.760468 seconds (7.00 M allocations: 137.329 MB, 3.72% gc time)
  0.761979 seconds (7.00 M allocations: 137.329 MB, 3.62% gc time)
  0.563945 seconds (5.00 M allocations: 106.812 MB, 4.30% gc time)
  0.003080 seconds (6 allocations: 192 bytes)

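Every variant that splats anywhere on the hot path pays an enormous penalty: 0.56–0.76 seconds and 5–7 million allocations for 10^6 iterations. Only call_bar2b, which unrolls the arguments at the call site and inlines bar2, runs in 0.003 seconds with 6 allocations, presumably because inlining also lets the compiler see through the A[xs...] splat inside bar2. Note that call_bar1b is still slow: its call site no longer splats, but the non-inlined bar1 still does. Since I write a lot of code that has to work in arbitrary dimensions, this is a major reason I tend to write so many @generated functions.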
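To make concrete what the generated versions do, here is a small sketch that builds, for a three-index call, the same expression the generators of call_bar1b/call_bar2b splice into their loop body. The splat is replaced at code-generation time by explicit indexing into the tuple xs, so the emitted call has a fixed number of arguments. `unrolled_call` is a hypothetical helper for illustration, not part of Base.

```julia
# Hypothetical helper: builds the call expression that the generators in
# call_bar1b/call_bar2b splice into their loop body, for a tuple of length n.
function unrolled_call(f::Symbol, n::Integer)
    xargs = [:(xs[$d]) for d = 1:n]   # xs[1], xs[2], ..., xs[n]
    return :($f(A, $(xargs...)))
end

ex = unrolled_call(:bar2, 3)
println(ex)   # prints: bar2(A, xs[1], xs[2], xs[3])
```

Inside a @generated function the count comes from length(xs), which is available because xs is bound to the tuple of argument types rather than the values.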

Now here's the fun part: look at the difference in @code_typed for call_bar2a and call_bar2b (as a screenshot so you can see the colors):
[Screenshot: colored diff of the @code_typed output for call_bar2a vs. call_bar2b]

I think the only difference is top(getfield) vs Base.getfield. (EDIT: I deleted the line number annotations to reduce the size of this diff.)
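To regenerate the comparison as text instead of a screenshot, here is a sketch assuming Julia 1.x. It redefines bar2, call_bar2a, and call_bar2b from the demo and asks Base.code_typed for the inferred IR of both. The printed IR is version dependent and will not literally show top(getfield) vs Base.getfield as in the older screenshot. Interactively, `@code_typed call_bar2a(A, 1, 1, 2, 3)` from the InteractiveUtils stdlib shows the same information.

```julia
# Same definitions as in the demo above
@inline function bar2(A, xs...)
    A[xs...]
end

function call_bar2a(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += bar2(A, xs...)          # splat at the call site
    end
    s
end

@generated function call_bar2b(A, n, xs...)
    xargs = [:(xs[$d]) for d = 1:length(xs)]
    quote
        s = zero(eltype(A))
        for i = 1:n
            s += bar2(A, $(xargs...))  # unrolled: bar2(A, xs[1], xs[2], xs[3])
        end
        s
    end
end

A = rand(3, 3, 3)
argtypes = (typeof(A), Int, Int, Int, Int)

# code_typed returns a vector of (CodeInfo => inferred return type) pairs,
# one per matching method; each call here matches exactly one method.
ci_a, rt_a = first(Base.code_typed(call_bar2a, argtypes))
ci_b, rt_b = first(Base.code_typed(call_bar2b, argtypes))

println(ci_a)   # typed IR of the splatting version
println(ci_b)   # typed IR of the unrolled version
```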
