
push!/pop! always does a ccall #24909

@chethega

Description

When looking over my code_native output, I recently noticed that every push! costs me a call into the runtime library. This appears excessive, at least for the common case where the push fits into the already-allocated buffer without copying.
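A minimal way to reproduce the observation (a sketch, assuming push! on a Vector lowers to a ccall of jl_array_grow_end, which is what the generated code shows on the setups I have looked at):

# inspect the generated code for a plain push!; the output should contain
# a call into the runtime (e.g. jl_array_grow_end) rather than inlined capacity checks
@code_native push!(Int[], 1)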

So, let's compare push! and fastpush!.

First, let's look at what a vector actually is:

struct array_T               # mimics the in-memory layout of jl_array_t on x86-64
    data::Ptr{Void}          # pointer to the element buffer
    length::UInt64           # number of elements
    flags::UInt16
    elsize::UInt16           # element size in bytes
    offset::UInt32           # number of unused elements at the front
    nrows::UInt64
    ncols_maxsz::UInt64      # for a Vector: the allocated capacity
    stuff1::UInt64
    stuff2::UInt64
    stuff3::UInt64
end
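As a quick sanity check that this layout matches the actual array header on a given machine (a sketch, reusing the unsafe_load pattern from the code below; the field names are those of the array_T struct above):

A = [1, 2, 3]
hdr = unsafe_load(convert(Ptr{array_T}, pointer_from_objref(A)))
hdr.length == length(A)          # should hold if the layout above is right
hdr.ncols_maxsz >= hdr.length    # the capacity is at least the current length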

This gives us fastpush!:

@inline function fastpush!(A::Vector{T}, val::T) where T
    if isbits(T)
        arr = unsafe_load(convert(Ptr{array_T}, pointer_from_objref(A)))
        if arr.length + arr.offset < arr.ncols_maxsz
            # fast path: capacity suffices, so bump the length and nrows fields
            # (the 2nd and 4th UInt64 of the header) and store the value directly
            unsafe_store!(convert(Ptr{UInt64}, pointer_from_objref(A)), arr.length+1, 2)
            unsafe_store!(convert(Ptr{UInt64}, pointer_from_objref(A)), arr.length+1, 4)
            @inbounds A[arr.length+1] = val
        else
            push!(A, val)    # slow path: buffer must grow, defer to the runtime
        end
    else
        push!(A, val)        # non-isbits element types: defer to the runtime
    end
    A
end
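A quick correctness check that the fast path and the builtin agree (a sketch with small throwaway vectors; the first few pushes take the slow path until capacity is available):

B = Int[]; C = Int[]
for i = 1:1000
    push!(B, i)
    fastpush!(C, i)
end
B == C    # both loops should build identical vectors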

And let's run a race:

function pushtest(A,n)
    resize!(A, 0);
    for i = 1:n push!(A,i) end
    nothing
end

function fastpushtest(A,n)
    resize!(A, 0)
    for i = 1:n fastpush!(A,i) end
    nothing
end

function settest(A,n)
    resize!(A, n)
    for i = 1:n  A[i]=i end
    nothing
end

yielding:

Nm = 100_000_000; A = Vector{Int}(Nm); resize!(A, 0);
#after warmup:

@time settest(A,Nm); @time settest(A,Nm);
  0.356040 seconds (4 allocations: 160 bytes)
  0.326640 seconds (4 allocations: 160 bytes)

@time pushtest(A,Nm); @time pushtest(A,Nm);
  0.899436 seconds (4 allocations: 160 bytes)
  0.895555 seconds (4 allocations: 160 bytes)

@time fastpushtest(A,Nm); @time fastpushtest(A,Nm);
  0.577715 seconds (4 allocations: 160 bytes)
  0.606671 seconds (4 allocations: 160 bytes)

Hence, a roughly 30% speedup is possible if we manage to get the fast path inlined.

It might be nice if it were somehow possible to make LLVM aware of Julia's runtime library. For example, the runtime could be compiled to LLVM IR (using clang or dragonegg?), so that calls into it carry optimization-relevant metadata and currently ccall'ed functions could possibly be inlined.

I am not sure whether this performance problem is worth trying to fix (nor how much more expensive the call becomes if it clobbers important registers). It is easy to avoid, though, by implementing the checks by hand in tight loops that push a lot, as sketched below. In a perfect world, the optimizer would reduce an unrolled pushing loop to a single capacity check. In my own applications, I do enough other work between push!/pop! calls that the overhead ends up being benign.
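For completeness, one way to implement the checks by hand without touching the array header: preallocate an upper bound, fill with @inbounds, and trim once at the end (the function below is purely illustrative and not part of the proposal):

function collect_evens(n)
    out = Vector{Int}(n)    # preallocate an upper bound on the number of pushes
    k = 0
    for i = 1:n
        if iseven(i)
            k += 1
            @inbounds out[k] = i
        end
    end
    resize!(out, k)         # one runtime call at the end instead of one per push!
    out
end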

PS. This is related to #24901, which removes the bounds check from push!/pop!.

I also posted this on discourse.julialang.org earlier. The fastpush! is deliberately not a pull request, because it is dirty and architecture-dependent.
