Description
Splatting has a known penalty, but today I looked into it a bit more carefully, and I wonder if there's an easy fix for part of the problem. For those who don't like @generated functions, this might be a good opportunity to reduce their numbers, since I think my major use for them now is to avoid the splatting penalty.
First the demo:
@noinline function bar1(A, xs...)
    A[xs...]
end
@inline function bar2(A, xs...)
    A[xs...]
end
function call_bar1a(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += bar1(A, xs...)
    end
    s
end
function call_bar2a(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += bar2(A, xs...)
    end
    s
end
@generated function call_bar1b(A, n, xs...)
    xargs = [:(xs[$d]) for d = 1:length(xs)]
    quote
        s = zero(eltype(A))
        for i = 1:n
            s += bar1(A, $(xargs...))
        end
        s
    end
end
@generated function call_bar2b(A, n, xs...)
    xargs = [:(xs[$d]) for d = 1:length(xs)]
    quote
        s = zero(eltype(A))
        for i = 1:n
            s += bar2(A, $(xargs...))
        end
        s
    end
end
A = rand(3,3,3)
call_bar1a(A, 1, 1, 2, 3)
call_bar2a(A, 1, 1, 2, 3)
call_bar1b(A, 1, 1, 2, 3)
call_bar2b(A, 1, 1, 2, 3)
@time 1
@time call_bar1a(A, 10^6, 1, 2, 3)
@time call_bar2a(A, 10^6, 1, 2, 3)
@time call_bar1b(A, 10^6, 1, 2, 3)
@time call_bar2b(A, 10^6, 1, 2, 3)
Results:
julia> include("/tmp/test_splat.jl")
0.000001 seconds (3 allocations: 144 bytes)
0.760468 seconds (7.00 M allocations: 137.329 MB, 3.72% gc time)
0.761979 seconds (7.00 M allocations: 137.329 MB, 3.62% gc time)
0.563945 seconds (5.00 M allocations: 106.812 MB, 4.30% gc time)
0.003080 seconds (6 allocations: 192 bytes)
You can see there's an absolutely enormous, deadly penalty for any kind of splatting: only call_bar2b, which pairs the @generated caller with the inlined bar2, avoids the millions of allocations. Since I write a lot of code that has to work in arbitrary dimensions, it's a major contributor to why I tend to write so many @generated functions.
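One way to check whether a given Julia build still pays this penalty is to measure allocations directly rather than reading them off @time. The following is only a minimal sketch: getsplat and sum_splat are hypothetical names that mirror bar2 and call_bar2a from the demo, and the number it prints depends on the Julia version, so no expected value is given.

```julia
# Hypothetical names; this mirrors bar2 / call_bar2a from the demo above.
@inline getsplat(A, xs...) = A[xs...]

function sum_splat(A, n, xs...)
    s = zero(eltype(A))
    for i = 1:n
        s += getsplat(A, xs...)   # splat at the call site, as in call_bar2a
    end
    s
end

A = rand(3, 3, 3)
sum_splat(A, 1, 1, 2, 3)          # warm up so compilation is not measured

# Bytes allocated by one call; a result that grows with n suggests the
# splat allocates on every iteration.
bytes = @allocated sum_splat(A, 10^6, 1, 2, 3)
println("bytes allocated: ", bytes)
```

Running the same measurement with a smaller n makes it easy to see whether the allocation count scales with the loop length.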
Now here's the fun part: look at the difference in @code_typed for call_bar2a and call_bar2b (as a screenshot so you can see the colors):
I think the only difference is top(getfield) vs Base.getfield. (EDIT: I deleted the line number annotations to reduce the size of this diff.)
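For a text version of that comparison, the typed code can also be obtained with the code_typed function and printed. This is a rough sketch rather than the exact comparison in the screenshot: idx_splat and idx_explicit are hypothetical stand-ins, with idx_explicit written out by hand as the three-index expansion that the @generated version would produce. The printed form differs between Julia versions.

```julia
# Hypothetical stand-ins for the splatted and the expanded call.
@inline inner(A, xs...) = A[xs...]

idx_splat(A, xs...) = inner(A, xs...)                     # splats the tuple
idx_explicit(A, xs...) = inner(A, xs[1], xs[2], xs[3])    # hand-expanded for 3 indices

# Concrete argument types for a 3-d Float64 array and three Int indices.
argtypes = (Array{Float64,3}, Int, Int, Int)

typed_splat = code_typed(idx_splat, argtypes)
typed_explicit = code_typed(idx_explicit, argtypes)

# Print both so the two lowered forms can be compared side by side.
println(first(typed_splat))
println(first(typed_explicit))
```

Saving the two printed forms to files and running a diff tool over them gives roughly the view the screenshot was meant to show.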