I've noticed that precompiling Keccak v0.1.0 takes about 10-12 s on Julia 1.10 and 1.11, but on 1.12-rc1 it takes 75 s, and on 1.13.0-DEV.922 it takes an unacceptable 400 s. Zooming in on the problem, I arrived at the following reduced example:
using Chairmarks
@inline function g(data, srcoffset, destoffset)
    ntuple(Val(168)) do i
        j = i-1-destoffset+srcoffset+firstindex(data)
        return !checkindex(Bool, eachindex(data), j) ? 0x00 : data[j] # A
        #return checkindex(Bool, eachindex(data), j) ? data[j] : 0x00 # B
    end
end
function f(st::NTuple{25,UInt64}, k, data)
    n = 0
    if k != 0
        block = reinterpret(NTuple{21,UInt64}, g(data, 0, k))
        st = let st=st, block=block
            ntuple(l -> l <= 21 ? st[l] ⊻ block[l] : st[l], Val(25))
        end
        n += min(length(data), 168-k)
    end
    if n < length(data)
        block = reinterpret(NTuple{21,UInt64}, g(data, n, 0))
        st = let st=st, block=block
            ntuple(l -> l <= 21 ? st[l] ⊻ block[l] : st[l], Val(25))
        end
    end
    return st
end
@show VERSION
@time f(ntuple(_ -> zero(UInt64), Val(25)), 0, (0x01, 0x00)) # compile-time
display(@b ntuple(_ -> zero(UInt64), Val(25)) f(_, 0, (0x01, 0x00))) # run-time
Running with different versions of Julia, I get:
VERSION = v"1.11.6"
0.413270 seconds (556.00 k allocations: 23.097 MiB, 99.98% compilation time)
4.793 ns
VERSION = v"1.12.0-rc1"
5.217990 seconds (652.16 k allocations: 26.707 MiB, 100.00% compilation time)
91.597 ns
VERSION = v"1.13.0-DEV.922"
81.222883 seconds (544.92 k allocations: 23.443 MiB, 100.00% compilation time)
60.671 ns
We notice a severe compile-time regression: 0.4 s -> 5.2 s -> 81 s. Sure, largish tuples like the ones here can be hard on the compiler, but v1.11 shows that it is possible to deal with them. Looking at the run-time, v1.11 even produces better code! That said, I'm less worried about the run-time regression, as this micro-benchmark may not be too relevant in practice. (It looks to me as though v1.11 does more constant propagation, but that's just a guess.)
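To see how first-call latency scales with tuple length, one could lift the length into a type parameter and time the first call per size. This is only a rough sketch: `g_n` is a hypothetical helper that is not part of the reproducer, the sizes are arbitrary, and it exercises variant A of `g` in isolation (without `f` and `reinterpret`), so it may not trigger the regression at all.

```julia
# Hypothetical helper (not in the original report): variant A of `g`,
# with the tuple length N passed as a Val so it can be varied.
@inline function g_n(data, srcoffset, destoffset, ::Val{N}) where {N}
    ntuple(Val(N)) do i
        j = i - 1 - destoffset + srcoffset + firstindex(data)
        return !checkindex(Bool, eachindex(data), j) ? 0x00 : data[j]
    end
end

for N in (8, 16, 32, 64)
    # Each Val{N} is a distinct type, so the first call with a given N
    # includes the compilation of a fresh specialization.
    t = @elapsed g_n((0x01, 0x00), 0, 0, Val(N))
    println("N = ", N, ": first call took ", round(t; digits=3), " s")
end
```

Running this under each Julia version in a fresh session would show whether the growth with `N` differs between versions.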
The good news is that after replacing line A with line B in g, things look much better:
VERSION = v"1.11.6"
0.373411 seconds (530.69 k allocations: 22.351 MiB, 99.98% compilation time)
4.900 ns
VERSION = v"1.12.0-rc1"
1.289995 seconds (622.06 k allocations: 25.875 MiB, 99.99% compilation time)
53.767 ns
VERSION = v"1.13.0-DEV.922"
1.191754 seconds (521.54 k allocations: 22.756 MiB, 99.99% compilation time)
59.750 ns
After making the corresponding change in Keccak, precompilation is back to 10-12 s on 1.12-rc1 and 1.13.0-DEV.922. Therefore, this issue is not as pressing for me as I thought it was at first. But the fact that switching between lines A and B makes a roughly 70-fold difference in compile time on 1.13.0-DEV.922 (81 s vs. 1.2 s) is quite concerning to me.
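For reference, lines A and B differ only in negating the condition and swapping the two branches of the ternary, so they should compute the same values. A minimal sketch to check this follows; `g_a` and `g_b` are names introduced here for illustration, and the tuple length is reduced from 168 to 8 to keep it quick.

```julia
# Variant A: negated bounds check, out-of-range branch first (line A).
function g_a(data, srcoffset, destoffset)
    ntuple(Val(8)) do i
        j = i - 1 - destoffset + srcoffset + firstindex(data)
        return !checkindex(Bool, eachindex(data), j) ? 0x00 : data[j]
    end
end

# Variant B: plain bounds check, in-range branch first (line B).
function g_b(data, srcoffset, destoffset)
    ntuple(Val(8)) do i
        j = i - 1 - destoffset + srcoffset + firstindex(data)
        return checkindex(Bool, eachindex(data), j) ? data[j] : 0x00
    end
end

data = (0x01, 0x02, 0x03)
# Compare both variants over a small grid of offsets.
for src in 0:3, dest in 0:3
    @assert g_a(data, src, dest) == g_b(data, src, dest)
end
println("variants A and B agree on all tested offsets")
```

Since the results are identical, the difference in compile time would have to come from how the compiler handles the two forms, not from what they compute.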