Description
The problem (a brief summary)
In a PR for a blog post, I described a number of challenges for several potential new AbstractArray
types. Very briefly, the types and the challenges they exhibit are:
ReshapedArrays
: the only efficient way of indexing some types is by indexing the parent array directly (i.e., with an iterator optimized for the pre-reshaping array). This means that indexing a 2D array asB[i,j]
, for integeri
andj
, may not always be efficient.TransposedMatrix
(for Taking vector transposes seriously LinearAlgebra.jl#42): iteration is most efficient when it produces a cache-friendly sequence of array accesses. Unfortunately, algorithms that mix column-major and row-major arrays imply that we can't naively alloweachindex(A)
to always return the most efficient iterator.OffsetArrays
(sometimes called "Fortran arrays"): these violate the assumptions that array indexes run from1:size(A,d)
, and more generally question how one should match elements of two different
arrays.
Possible APIs
In the blog post, only two concrete algorithms were considered: matrix-vector multiplication and copy!
. At the risk of being insufficiently general, let's explore possible APIs in terms of these two functions.
EDIT: note that the thoughts on the API are evolving as a consequence of this discussion. To jump ahead to an updated proposal, see #15648 (comment). There may be further updates even later. The material below is left for reference, but some of it is outdated.
The "raw" API
For the moment, let's worry about correctness and efficiency more than elegance. Correctness trumps all other considerations. Efficiency means that we have to be able to iterate over the matrix in
cache-friendly order. That means that the array B
has to be "in charge" of the sequence of values and pick its most efficient iterator.
I should emphasize that I haven't taken even the first step towards attempting to implement this API. So this is crystal-ball development, trying to imagine what might work while hoping to shake
out likely problems in advance via community feedback. That caveat understood, here's a possible implementation of matrix-vector multiplication:
RB = eachindex(B)
Rv = eachindex(RB[*,:], v)
Rdest = eachindex(RB[:,*], dest)
for (idest, iB, iv) in zip(Rdest, RB, Rv)
dest[idest] += B[iB] * v[iv]
end
eachindex(iter, obj)
should perform bounds-checking, to ensure that the indexes of iter
align with obj
(even handling things like OffsetArrays
). The notation RB[*,:]
means "corresponding to the
second dimension of RB
"; interestingly, this expression parses without error, attempting to dispatch to getindex
with argument types Tuple{typeof(RB), Function, Colon}
. Hopefully there isn't another pending meaning for indexing with a Function
.
Because either idest
or iv
is constant over the "inner" loop (depending on storage order of B
), ideally one would like to hoist the corresponding access dest[idest]
or v[iv]
. I'd argue that's a
job for the compiler, and not something we should have to bake into the API.
Because zip(X, Y)
does not allow Y
to see the state of X
, it seems quite possible that this could not be made efficient in general (consider the case described for ReshapedArray
s, where the
efficiently-parametrized iterator for the first dimension depends on the state of the index for the second dimension). If this proves to be the case, we have to abandon zip
and write this more like
RB = eachindex(B)
for (iB, idest, iv) in couple(RB, (RB[:,*], dest), (RB[*,:], v))
dest[idest] += B[iB] * v[iv]
end
where couple
is a new exported function.
One interesting thing about this API is that sparse multiplication might be implemented efficiently just by replacing the first line with
RB = eachstored(B)
so that only "stored" elements of B
are visited. (@tkelman may have been the first to propose an eachstored
iterator.) It should be pointed out that several recent discussions have addressed some of the complications that arise from Inf
and NaN
, and it remains to be determined whether a version that doesn't ignore these complications can be written with this (or similar) API.
Likewise, copy!
might be implemented
Rdest = eachindex(dest)
for (idest, isrc) in couple(Rdest, (Rdest, src))
dest[idest] = src[isrc]
end
At least for the array types described here, this API seems plausibly capable of generating both correct and (fairly) efficient code. One might ideally want to write generic "tiled" implementations, but that seems beyond the scope of this initial effort.
Making it prettier with "sufficiently smart iteration"
This API is still more complicated than naive iteration. One might possibly be able to hide a lot of this with a magical @iterate
macro,
@iterate B dest[i] += B[i,j]*v[j]
for matrix-vector multiplication and
@iterate dest dest[i] = src[i]
for copy!
. The idea is that the @iterate
macro would expand these expressions into their longer forms above. Note that the first argument to @iterate
specifies which object is "in charge" of the
sequence of operations. @iterate
might be too limited for many things, but may be worth exploring. The KernelTools repository may have some useful tidbits.