
push!/pop! always does a ccall #24909

@chethega

Description

When looking over my code_native output, I recently noticed that every push! costs me a call into the runtime library. This appears excessive, at least for the common case where the push fits into the already-allocated buffer without copying.
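A minimal way to reproduce the observation (a sketch, assuming push! on a Vector lowers to a ccall of jl_array_grow_end, which is what the generated code shows on the setups I have looked at):

# inspect the generated code for a plain push!; the output should contain
# a call into the runtime (e.g. jl_array_grow_end) rather than inlined capacity checks
@code_native push!(Int[], 1)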

So, let's compare push! and fastpush!.

First, let's look at what a vector actually is:

struct array_T               # mimics the in-memory layout of jl_array_t on x86-64
    data::Ptr{Void}          # pointer to the element buffer
    length::UInt64           # number of elements
    flags::UInt16
    elsize::UInt16           # element size in bytes
    offset::UInt32           # number of unused elements at the front
    nrows::UInt64
    ncols_maxsz::UInt64      # for a Vector: the allocated capacity
    stuff1::UInt64
    stuff2::UInt64
    stuff3::UInt64
end
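As a quick sanity check that this layout matches the actual array header on a given machine (a sketch, reusing the unsafe_load pattern from the code below; the field names are those of the array_T struct above):

A = [1, 2, 3]
hdr = unsafe_load(convert(Ptr{array_T}, pointer_from_objref(A)))
hdr.length == length(A)          # should hold if the layout above is right
hdr.ncols_maxsz >= hdr.length    # the capacity is at least the current length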

This gives us fastpush!:

@inline function fastpush!(A::Vector{T}, val::T) where T
    if isbits(T)
        arr = unsafe_load(convert(Ptr{array_T}, pointer_from_objref(A)))
        if arr.length + arr.offset < arr.ncols_maxsz
            # fast path: capacity suffices, so bump the length and nrows fields
            # (the 2nd and 4th UInt64 of the header) and store the value directly
            unsafe_store!(convert(Ptr{UInt64}, pointer_from_objref(A)), arr.length+1, 2)
            unsafe_store!(convert(Ptr{UInt64}, pointer_from_objref(A)), arr.length+1, 4)
            @inbounds A[arr.length+1] = val
        else
            push!(A, val)    # slow path: buffer must grow, defer to the runtime
        end
    else
        push!(A, val)        # non-isbits element types: defer to the runtime
    end
    A
end
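A quick correctness check that the fast path and the builtin agree (a sketch with small throwaway vectors; the first few pushes take the slow path until capacity is available):

B = Int[]; C = Int[]
for i = 1:1000
    push!(B, i)
    fastpush!(C, i)
end
B == C    # both loops should build identical vectors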

And let's run a race:

function pushtest(A,n)
    resize!(A, 0);
    for i = 1:n push!(A,i) end
    nothing
end

function fastpushtest(A,n)
    resize!(A, 0)
    for i = 1:n fastpush!(A,i) end
    nothing
end

function settest(A,n)
    resize!(A, n)
    for i = 1:n  A[i]=i end
    nothing
end

yielding:

Nm = 100_000_000; A = Vector{Int}(Nm); resize!(A, 0);
#after warmup:

@time settest(A,Nm); @time settest(A,Nm);
  0.356040 seconds (4 allocations: 160 bytes)
  0.326640 seconds (4 allocations: 160 bytes)

@time pushtest(A,Nm); @time pushtest(A,Nm);
  0.899436 seconds (4 allocations: 160 bytes)
  0.895555 seconds (4 allocations: 160 bytes)

@time fastpushtest(A,Nm); @time fastpushtest(A,Nm);
  0.577715 seconds (4 allocations: 160 bytes)
  0.606671 seconds (4 allocations: 160 bytes)

Hence, a roughly 30% speedup is possible if we manage to get the fast path inlined.

It might be nice if it were somehow possible to make LLVM aware of Julia's runtime library. For example, the runtime could be compiled to LLVM IR (using clang or dragonegg?), so that calls into it carry optimization-relevant metadata and currently ccall'ed functions could possibly be inlined.

I am not sure whether this performance problem is worth trying to fix (nor how much more expensive the call becomes if it clobbers important registers). It is easy to avoid, though, by implementing the checks by hand in tight loops that push a lot, as sketched below. In a perfect world, the optimizer would reduce an unrolled pushing loop to a single capacity check. In my own applications, I do enough other work between push!/pop! calls that the overhead ends up being benign.
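For completeness, one way to implement the checks by hand without touching the array header: preallocate an upper bound, fill with @inbounds, and trim once at the end (the function below is purely illustrative and not part of the proposal):

function collect_evens(n)
    out = Vector{Int}(n)    # preallocate an upper bound on the number of pushes
    k = 0
    for i = 1:n
        if iseven(i)
            k += 1
            @inbounds out[k] = i
        end
    end
    resize!(out, k)         # one runtime call at the end instead of one per push!
    out
end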

PS. This is related to #24901, which removes the bounds check from push!/pop!.

I also posted this on discourse.julialang.org earlier. The fastpush! is deliberately not a pull request, because it is dirty and architecture-dependent.
