Description
I've always loved how in Julia (and MATLAB) one can create a new array from an old one using what are now the APL indexing rules. Basically, if you index a collection of values with a collection of indices, you get a new collection of the indexed values. Beautiful, simple. Indexing has also been extended to arrays that don't use 1-based indexing, e.g. by the OffsetArrays.jl package.
I'm not sure if this issue exists elsewhere as its own entity (cleaning up distinctions between arrays and associatives was surely mentioned in #20402, and this Julep seems to be a logical extension of #22907), but here I propose specifically that we extend indexing of and by `Associative` and make related changes so that the semantics are consistent across these two types of container. I prototyped ideas at https://github.com/andyferris/AssociativeArray.jl and basically came up with the ability to (with simple code):
- Index an `Associative{K,V}` with an `Associative{I,K}` to get an `Associative{I,V}`. E.g. `Dict(:a=>1, :b=>2, :c=>3)[Dict("a"=>:a, "c"=>:c)] == Dict("a"=>1, "c"=>3)`.
- Index an `Associative{K,V}` with an `AbstractArray{K,N}` to get an `AbstractArray{V,N}`. E.g. `Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1]`.
- Index an `AbstractArray{T,N}` with an `Associative{K,I}` to get an `Associative{K,T}` (where `I` might be `Int` for linear indexing, or a `CartesianIndex{N}` for Cartesian indexing). E.g. `[11,12,13][Dict(:a=>1, :c=>3)] == Dict(:a=>11, :c=>13)`.
The semantics are consistent across arrays and dictionaries, and provide that for `out = a[b]`:
- The output container `out` shares the indices of `b` (note: these are `CartesianRange` for arrays).
- The values `out[i]` correspond to `a[b[i]]`.
This is fully consistent with both the `Base` arrays and the OffsetArrays.jl package. (We can do something similar for `setindex!`.)
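To make the rule concrete, here is a minimal sketch of the `out = a[b]` semantics written as a standalone function rather than as new `getindex` methods. The name `index_by` is hypothetical (it is not part of `Base` or of the prototype package), and the sketch uses `AbstractDict`, the name `Associative` took in Julia 0.7 and later.

```julia
# Hypothetical helper, for illustration only: index container `a` by container `b`.
# In both methods the result shares the indices of `b`, and out[i] == a[b[i]].

# Indexing by a dictionary: the output is a dictionary with the keys of `b`.
index_by(a, b::AbstractDict) = Dict(i => a[k] for (i, k) in pairs(b))

# Indexing by an array: the output is an array with the shape of `b`.
index_by(a, b::AbstractArray) = map(k -> a[k], b)

d = Dict(:a => 1, :b => 2, :c => 3)

# Dictionary indexed by dictionary
@assert index_by(d, Dict("a" => :a, "c" => :c)) == Dict("a" => 1, "c" => 3)

# Dictionary indexed by array (a matrix of keys gives a matrix of values)
@assert index_by(d, [:c, :a]) == [3, 1]
@assert index_by(d, [:a :b; :c :a]) == [1 2; 3 1]

# Array indexed by dictionary, with linear and with Cartesian indices
@assert index_by([11, 12, 13], Dict(:a => 1, :c => 3)) == Dict(:a => 11, :c => 13)
@assert index_by([11 12; 13 14], Dict(:x => CartesianIndex(2, 1))) == Dict(:x => 13)
```

The two methods dispatch only on the type of the index container `b`, which is what lets one rule cover every combination listed above.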
To make everything consistent, it helps to make the following associated changes:
- Make `Associative`s be containers of values, not of `index=>value` pairs, so that arrays and dictionaries are consistent on this fundamental point. Use the existing `pairs` function when necessary (and ideally make it preserve indexability).
- Make `similar` always return a container with the same indices, even for dictionaries. Ideally, unify `similar` across `Associative`s and `Array`s (for example, a dictionary which is `similar` to a distributed array might also be distributed) via use of the indices.
- Have a new `empty` function that makes empty `Dict`s and `Vector`s to which elements should be added. (Done: "Add `empty` and change `similar(::Associative)`", #24390.)
- Consider whether the collection of things you call `getindex` and `setindex!` with should be called `indices` rather than `keys` (and rename the current `indices(::AbstractArray)` to something else).
- Have `view` work for the various combinations where `getindex` works.
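As a point of reference for the list above, the snippet below shows where arrays and dictionaries already agree and where they differ. It assumes the behaviour of `Base` in Julia 1.x (where `Associative` is called `AbstractDict`), and it only calls existing functions; it does not implement any of the proposed changes.

```julia
d = Dict(:a => 1, :b => 2)
v = [11, 12]

# Iteration differs: an array yields values, a dictionary yields key => value pairs.
@assert first(v) == 11
@assert first(Dict(:a => 1)) == (:a => 1)

# `pairs` and `values` give a uniform view of both kinds of container.
@assert collect(pairs(v)) == [1 => 11, 2 => 12]
@assert sort(collect(values(d))) == [1, 2]

# `keys` returns "the things you index with" for both.
@assert collect(keys(v)) == [1, 2]
@assert Set(keys(d)) == Set([:a, :b])

# `empty` makes an empty container of the same kind, ready to have elements added.
@assert empty(v) == Int[]
@assert empty(d) == Dict{Symbol,Int}()
```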
The demonstration package also prototypes making `AbstractArray{T, N} <: Associative{CartesianIndex{N}, T}`. I don't think this is strictly necessary, but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).
Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows). I found that whenever I created, say, a grouping (using a dictionary of groups), I immediately felt the loss of the ability to do complex indexing and other operations with the result (as well as having to worry whether the output iterates values or key-value pairs, etc.).