Skip to content

Julep: solving tricky iteration problems #15648

Closed
@timholy

Description

@timholy

The problem (a brief summary)

In a PR for a blog post, I described a number of challenges for several potential new AbstractArray types. Very briefly, the types and the challenges they exhibit are:

  • ReshapedArrays: the only efficient way of indexing some types is by indexing the parent array directly (i.e., with an iterator optimized for the pre-reshaping array). This means that indexing a 2D array as B[i,j], for integer i and j, may not always be efficient.
  • TransposedMatrix (for Taking vector transposes seriously LinearAlgebra.jl#42): iteration is most efficient when it produces a cache-friendly sequence of array accesses. Unfortunately, algorithms that mix column-major and row-major arrays imply that we can't naively allow eachindex(A) to always return the most efficient iterator.
  • OffsetArrays (sometimes called "Fortran arrays"): these violate the assumptions that array indexes run from 1:size(A,d), and more generally question how one should match elements of two different
    arrays.

Possible APIs

In the blog post, only two concrete algorithms were considered: matrix-vector multiplication and copy!. At the risk of being insufficiently general, let's explore possible APIs in terms of these two functions.

EDIT: note that the thoughts on the API are evolving as a consequence of this discussion. To jump ahead to an updated proposal, see #15648 (comment). There may be further updates even later. The material below is left for reference, but some of it is outdated.

The "raw" API

For the moment, let's worry about correctness and efficiency more than elegance. Correctness trumps all other considerations. Efficiency means that we have to be able to iterate over the matrix in
cache-friendly order. That means that the array B has to be "in charge" of the sequence of values and pick its most efficient iterator.

I should emphasize that I haven't taken even the first step towards attempting to implement this API. So this is crystal-ball development, trying to imagine what might work while hoping to shake
out likely problems in advance via community feedback. That caveat understood, here's a possible implementation of matrix-vector multiplication:

RB = eachindex(B)
Rv = eachindex(RB[*,:], v)
Rdest = eachindex(RB[:,*], dest)
for (idest, iB, iv) in zip(Rdest, RB, Rv)
    dest[idest] += B[iB] * v[iv]
end

eachindex(iter, obj) should perform bounds-checking, to ensure that the indexes of iter align with obj (even handling things like OffsetArrays). The notation RB[*,:] means "corresponding to the
second dimension of RB"; interestingly, this expression parses without error, attempting to dispatch to getindex with argument types Tuple{typeof(RB), Function, Colon}. Hopefully there isn't another pending meaning for indexing with a Function.

Because either idest or iv is constant over the "inner" loop (depending on storage order of B), ideally one would like to hoist the corresponding access dest[idest] or v[iv]. I'd argue that's a
job for the compiler, and not something we should have to bake into the API.

Because zip(X, Y) does not allow Y to see the state of X, it seems quite possible that this could not be made efficient in general (consider the case described for ReshapedArrays, where the
efficiently-parametrized iterator for the first dimension depends on the state of the index for the second dimension). If this proves to be the case, we have to abandon zip and write this more like

RB = eachindex(B)
for (iB, idest, iv) in couple(RB, (RB[:,*], dest), (RB[*,:], v))
    dest[idest] += B[iB] * v[iv]
end

where couple is a new exported function.

One interesting thing about this API is that sparse multiplication might be implemented efficiently just by replacing the first line with

RB = eachstored(B)

so that only "stored" elements of B are visited. (@tkelman may have been the first to propose an eachstored iterator.) It should be pointed out that several recent discussions have addressed some of the complications that arise from Inf and NaN, and it remains to be determined whether a version that doesn't ignore these complications can be written with this (or similar) API.

Likewise, copy! might be implemented

Rdest = eachindex(dest)
for (idest, isrc) in couple(Rdest, (Rdest, src))
    dest[idest] = src[isrc]
end

At least for the array types described here, this API seems plausibly capable of generating both correct and (fairly) efficient code. One might ideally want to write generic "tiled" implementations, but that seems beyond the scope of this initial effort.

Making it prettier with "sufficiently smart iteration"

This API is still more complicated than naive iteration. One might possibly be able to hide a lot of this with a magical @iterate macro,

@iterate B dest[i] += B[i,j]*v[j]

for matrix-vector multiplication and

@iterate dest dest[i] = src[i]

for copy!. The idea is that the @iterate macro would expand these expressions into their longer forms above. Note that the first argument to @iterate specifies which object is "in charge" of the
sequence of operations. @iterate might be too limited for many things, but may be worth exploring. The KernelTools repository may have some useful tidbits.

Metadata

Metadata

Assignees

No one assigned

    Labels

    julepJulia Enhancement Proposal

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions