Unexpected performance difference between two similar loops #37473

@nalimilan

In the following example, count_missing is about 3 times slower than count_nonmissing:

function count_missing(x)
    c = 0
    @inbounds for i in eachindex(x)
        c += ismissing(x[i])
    end
    return c
end

function count_nonmissing(x)
    c = 0
    @inbounds for i in eachindex(x)
        c += !ismissing(x[i])
    end
    return c
end

x = rand([missing, rand(Int, 100)...], 1_000_000);

@assert count_missing(x) == length(x) - count_nonmissing(x)

using BenchmarkTools

julia> @btime count_missing(x);
  252.111 μs (1 allocation: 16 bytes)

julia> @btime count_nonmissing(x);
  85.691 μs (1 allocation: 16 bytes)

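The only difference in the vectorized loop body of the generated LLVM code is that count_missing applies an extra xor to each loaded <4 x i8> vector before the zext: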

  • count_missing:
  %28 = xor <4 x i8> %wide.load, <i8 1, i8 1, i8 1, i8 1>
  %29 = xor <4 x i8> %wide.load20, <i8 1, i8 1, i8 1, i8 1>
  %30 = xor <4 x i8> %wide.load21, <i8 1, i8 1, i8 1, i8 1>
  %31 = xor <4 x i8> %wide.load22, <i8 1, i8 1, i8 1, i8 1>
  %32 = zext <4 x i8> %28 to <4 x i64>
  %33 = zext <4 x i8> %29 to <4 x i64>
  %34 = zext <4 x i8> %30 to <4 x i64>
  %35 = zext <4 x i8> %31 to <4 x i64>
; ┌ @ int.jl:923 within `+' @ int.jl:87
   %36 = add <4 x i64> %vec.phi, %32
   %37 = add <4 x i64> %vec.phi17, %33
   %38 = add <4 x i64> %vec.phi18, %34
   %39 = add <4 x i64> %vec.phi19, %35
   %index.next = add i64 %index, 16
   %40 = icmp eq i64 %index.next, %n.vec
   br i1 %40, label %middle.block, label %vector.body
  • count_nonmissing:
  %28 = zext <4 x i8> %wide.load to <4 x i64>
  %29 = zext <4 x i8> %wide.load20 to <4 x i64>
  %30 = zext <4 x i8> %wide.load21 to <4 x i64>
  %31 = zext <4 x i8> %wide.load22 to <4 x i64>
; ┌ @ int.jl:923 within `+' @ int.jl:87
   %32 = add <4 x i64> %vec.phi, %28
   %33 = add <4 x i64> %vec.phi17, %29
   %34 = add <4 x i64> %vec.phi18, %30
   %35 = add <4 x i64> %vec.phi19, %31
   %index.next = add i64 %index, 16
   %36 = icmp eq i64 %index.next, %n.vec
   br i1 %36, label %middle.block, label %vector.body
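
For anyone wanting to reproduce the comparison above, here is a minimal sketch of how the two listings could be captured and compared. The helpers `llvm_ir` and `count_xor` are hypothetical names introduced only for this illustration, and the small test vector is an assumption. The exact IR, and therefore the number of xor lines, will depend on the Julia version and the CPU.

```julia
using InteractiveUtils  # stdlib; provides code_llvm outside the REPL

function count_missing(x)
    c = 0
    @inbounds for i in eachindex(x)
        c += ismissing(x[i])
    end
    return c
end

function count_nonmissing(x)
    c = 0
    @inbounds for i in eachindex(x)
        c += !ismissing(x[i])
    end
    return c
end

# Hypothetical helper: return the optimized LLVM IR for f(x) as a string,
# without the source-location comments.
llvm_ir(f, x) = sprint(io -> code_llvm(io, f, Tuple{typeof(x)}; debuginfo=:none))

# Hypothetical helper: count the IR lines that contain an xor instruction.
count_xor(ir) = count(line -> occursin(" xor ", line), split(ir, '\n'))

# Small illustrative input; only the element type matters for the generated code.
x = Union{Missing,Int}[missing, 1, 2, missing, 3]

ir_missing = llvm_ir(count_missing, x)
ir_nonmissing = llvm_ir(count_nonmissing, x)

println("xor lines in count_missing:    ", count_xor(ir_missing))
println("xor lines in count_nonmissing: ", count_xor(ir_nonmissing))
```

At the REPL, `@code_llvm debuginfo=:none count_missing(x)` prints the same listing directly.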

Is there anything that can be done to avoid this?

(Spotted in this Stack Overflow thread.)
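
One possible user-level workaround, sketched here under the assumption that the timings above hold on the target machine, is to derive the number of missing values from the faster loop, using the identity already checked by the @assert in the example. The name `count_missing_via_complement` is hypothetical. This does not address the underlying code generation difference, and whether it is actually faster should be measured.

```julia
function count_nonmissing(x)
    c = 0
    @inbounds for i in eachindex(x)
        c += !ismissing(x[i])
    end
    return c
end

# Hypothetical helper, not part of the original report:
# count missing values as the complement of the non-missing count.
count_missing_via_complement(x) = length(x) - count_nonmissing(x)

# Same construction as in the report, with a smaller array for a quick check.
x = rand([missing, rand(Int, 100)...], 1_000)

# Cross-check against Base's generic count.
@assert count_missing_via_complement(x) == count(ismissing, x)
```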

Labels: arrays, missing data, performance