Severe compile-time regression for some tuple-heavy code #59134

@martinholters

I've noticed that precompiling Keccak v0.1.0 takes about 10-12 s on Julia 1.10 and 1.11, but on 1.12-rc1 it takes 75 s and on 1.13.0-DEV.922 it takes an unacceptable 400 s. Zooming in on the problem, I've arrived at the following reduced example:

using Chairmarks

@inline function g(data, srcoffset, destoffset)
    ntuple(Val(168)) do i
        j = i-1-destoffset+srcoffset+firstindex(data)
        return !checkindex(Bool, eachindex(data), j) ? 0x00 : data[j] # A
        #return checkindex(Bool, eachindex(data), j) ? data[j] : 0x00 # B
    end
end

function f(st::NTuple{25,UInt64}, k, data)
    n = 0
    if k != 0
        block = reinterpret(NTuple{21,UInt64}, g(data, 0, k))
        st = let st=st, block=block
            ntuple(l -> l <= 21 ? st[l] ⊻ block[l] : st[l], Val(25))
        end
        n += min(length(data), 168-k)
    end
    if n < length(data)
        block = reinterpret(NTuple{21,UInt64}, g(data, n, 0))
        st = let st=st, block=block
            ntuple(l -> l <= 21 ? st[l] ⊻ block[l] : st[l], Val(25))
        end
    end
    return st
end

@show VERSION
@time f(ntuple(_ -> zero(UInt64), Val(25)), 0, (0x01, 0x00)) # compile-time
display(@b ntuple(_ -> zero(UInt64), Val(25)) f(_, 0, (0x01, 0x00))) # run-time

Running with different versions of Julia, I get:

VERSION = v"1.11.6"
  0.413270 seconds (556.00 k allocations: 23.097 MiB, 99.98% compilation time)
4.793 ns

VERSION = v"1.12.0-rc1"
  5.217990 seconds (652.16 k allocations: 26.707 MiB, 100.00% compilation time)
91.597 ns

VERSION = v"1.13.0-DEV.922"
 81.222883 seconds (544.92 k allocations: 23.443 MiB, 100.00% compilation time)
60.671 ns

We notice a severe compile-time regression: 0.4 s -> 5.2 s -> 81 s. Sure, largish tuples like the ones here can be hard on the compiler, but v1.11 shows that it's possible to deal with them. Looking at the run-time, v1.11 even produces better code (4.8 ns vs. 91.6 ns and 60.7 ns)! That said, I'm less worried about this run-time regression, as this micro-benchmark may not be too relevant in practice. (It looks to me as though v1.11 does more constant-propagation, but that's just a guess.)

The good news is that after replacing line A with line B in g, things look much better:

VERSION = v"1.11.6"
  0.373411 seconds (530.69 k allocations: 22.351 MiB, 99.98% compilation time)
4.900 ns

VERSION = v"1.12.0-rc1"
  1.289995 seconds (622.06 k allocations: 25.875 MiB, 99.99% compilation time)
53.767 ns

VERSION = v"1.13.0-DEV.922"
  1.191754 seconds (521.54 k allocations: 22.756 MiB, 99.99% compilation time)
59.750 ns

Making the corresponding change in Keccak, precompilation is back to 10-12 s on 1.12-rc1 and 1.13.0-DEV.922. Therefore, this issue is not as pressing for me as I thought it was at first. But the fact that switching between lines A and B makes a roughly 68-fold difference in compile-time on 1.13.0-DEV.922 (81.2 s vs. 1.19 s) is quite concerning to me.
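For reference, lines A and B differ only in the negation of the condition and the order of the two branches, so they should compute the same values. A minimal sketch checking this element-wise; the helper names `pick_a` and `pick_b` are hypothetical and not part of the code above, and the tiny tuple keeps compilation cheap:

```julia
# Hypothetical helpers isolating the two variants of the closure body in `g`.
pick_a(data, j) = !checkindex(Bool, eachindex(data), j) ? 0x00 : data[j]  # line A
pick_b(data, j) = checkindex(Bool, eachindex(data), j) ? data[j] : 0x00   # line B

data = (0x01, 0x00)

# Cover in-bounds indices (1, 2) and out-of-bounds indices on both sides.
@assert all(pick_a(data, j) === pick_b(data, j) for j in -2:5)
```

If the assertion holds, any difference between the two variants is down to how the compiler handles them, not to what they compute.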

Labels: latency, performance, regression