Julep: Generalize indexing with and by Associatives #24019

@andyferris

I've always loved how in Julia (and MATLAB) one can create a new array from an old one, using what are now called the APL indexing rules. Basically, if you index a collection of values with a collection of indices, you get a new collection of the indexed values. Beautiful, simple. Indexing has also been extended to arrays that don't use 1-based indices, e.g. by the OffsetArrays.jl package.

I'm not sure if this issue exists elsewhere as its own entity (cleaning up distinctions between arrays and associatives was surely mentioned in #20402 and this Julep seems to be a logical extension of #22907), but here I propose specifically that we extend indexing of and by Associative and make related changes so that the semantics are consistent across these two types of container. I prototyped ideas at https://github.com/andyferris/AssociativeArray.jl and basically came up with the ability to (with simple code):

  • Index an Associative{K,V} with an Associative{I,K} to get an Associative{I,V}. E.g. Dict(:a=>1, :b=>2, :c=>3)[Dict("a"=>:a, "c"=>:c)] == Dict("a"=>1, "c"=>3).
  • Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1].
  • Index an AbstractArray{T,N} with an Associative{K,I} to get an Associative{K,T} (where I might be Int for linear indexing, or a CartesianIndex{N} for Cartesian indexing). E.g. [11,12,13][Dict(:a=>1, :c=>3)] == Dict(:a=>11, :c=>13).
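
The three cases above can be sketched without touching `getindex`, assuming a current Julia release where `Associative` is spelled `AbstractDict`. The helper name `getindices` is hypothetical (it is not part of Base, and not necessarily what the prototype package does); it only illustrates the intended results.

```julia
# Hypothetical helper illustrating the three proposed cases.
# `AbstractDict` is the current name of `Associative`.

# Associative indexed by Associative: the result has the keys of `b`.
getindices(a::AbstractDict, b::AbstractDict) = Dict(i => a[k] for (i, k) in b)

# Associative indexed by array: the result has the shape of `b`.
getindices(a::AbstractDict, b::AbstractArray) = map(k -> a[k], b)

# Array indexed by Associative: the result has the keys of `b`.
getindices(a::AbstractArray, b::AbstractDict) = Dict(k => a[i] for (k, i) in b)

d = Dict(:a=>1, :b=>2, :c=>3)
r1 = getindices(d, Dict("a"=>:a, "c"=>:c))        # Dict("a"=>1, "c"=>3)
r2 = getindices(d, [:c, :a])                      # [3, 1]
r3 = getindices([11, 12, 13], Dict(:a=>1, :c=>3)) # Dict(:a=>11, :c=>13)
```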

The semantics are consistent across arrays and dictionaries, and provide that for out = a[b]:

  • The output container out shares the indices of b (note: these are CartesianRange for arrays)
  • The values out[i] correspond to a[b[i]].
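
As a rough check of these two rules, here is a small sketch assuming a current Julia release, where `keys` works on both arrays and dictionaries. The function name `obeys_indexing_rule` is made up for illustration.

```julia
# Hypothetical checker for the rule `out = a[b]`:
#   1. `out` has the same indices as `b`
#   2. `out[i] == a[b[i]]` for every index `i` of `b`
function obeys_indexing_rule(out, a, b)
    same_indices = Set(keys(out)) == Set(keys(b))
    # Short-circuit so we never look up an index that `out` lacks.
    return same_indices && all(out[i] == a[b[i]] for i in keys(b))
end

v = [11, 12, 13]
d = Dict(:a=>1, :b=>2, :c=>3)
b = Dict("a"=>:a, "c"=>:c)

# Existing array indexing in Base already obeys the rule.
ok_vector = obeys_indexing_rule(v[[3, 1]], v, [3, 1])
ok_matrix = obeys_indexing_rule(v[[3 1; 1 2]], v, [3 1; 1 2])

# The proposed dictionary cases, with the outputs built by hand.
ok_dict_by_array = obeys_indexing_rule([d[k] for k in [:c, :a]], d, [:c, :a])
ok_dict_by_dict = obeys_indexing_rule(Dict(i => d[k] for (i, k) in b), d, b)
```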

This is fully consistent with both Base arrays and the OffsetArrays.jl package. (We can do something similar for setindex!.)
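
One possible reading of the `setindex!` analogue, sketched under the assumption that the values to store share the indices of `b`. The name `setindices!` is hypothetical and the proposal does not spell out these semantics.

```julia
# Hypothetical sketch: for every index `i` of `b`, store `vals[i]` at `a[b[i]]`.
function setindices!(a, vals, b)
    for i in keys(b)
        a[b[i]] = vals[i]
    end
    return a
end

d = Dict(:a=>1, :b=>2, :c=>3)
setindices!(d, Dict("x"=>10, "y"=>30), Dict("x"=>:a, "y"=>:c))
# d is now Dict(:a=>10, :b=>2, :c=>30)

v = [11, 12, 13]
setindices!(v, Dict(:a=>0, :c=>-1), Dict(:a=>1, :c=>3))
# v is now [0, 12, -1]
```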

To make everything consistent, it helps to make the following associated changes:

  • Make Associatives be containers of values, not of index=>value pairs, so that arrays and dictionaries are consistent on this fundamental point. Use the existing pairs function when necessary (and ideally make it preserve indexability).
  • Make similar always return a container with the same indices, even for dictionaries. Ideally, unify similar across Associatives and Arrays (for example a dictionary which is similar to a distributed array might also be distributed) via use of the indices.
  • Have a new empty function that makes empty Dicts and Vectors to which elements should be added. (Done in #24390, "Add empty and change similar(::Associative)".)
  • Consider whether the collection of things you pass to getindex and setindex! should be called indices rather than keys (and rename the current indices(::AbstractArray) to something else).
  • Have view work for the various combinations where getindex works.
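
To make the values-versus-pairs point concrete, here is a short sketch of how things behave in a current Julia release (an assumption; the issue itself predates it). It only uses functions that exist in Base: `eltype`, `values`, `pairs` and `empty`.

```julia
d = Dict(:a=>1, :b=>2)
v = [11, 12]

# Today a Dict iterates key=>value pairs, while an array iterates values.
dict_eltype = eltype(d)    # Pair{Symbol,Int}
array_eltype = eltype(v)   # Int

# `values` and `pairs` give the two views explicitly for either container.
total = sum(values(d))                             # 3
array_as_dict = Dict(i => x for (i, x) in pairs(v)) # Dict(1=>11, 2=>12)

# `empty` makes an empty container of the same kind, ready to be filled.
empty_dict = empty(d)    # an empty Dict{Symbol,Int}
empty_vector = empty(v)  # an empty Vector{Int}
```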

The demonstration package also prototypes making AbstractArray{T, N} <: Associative{CartesianIndex{N}, T} - I don't think this is strictly necessary but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).
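
A small sketch of why an array can be read as a mapping from `CartesianIndex{N}` to values, assuming a current Julia release (where `keys` of a matrix yields Cartesian indices). It does not reproduce the package's subtyping, only the lookup behaviour that motivates it.

```julia
A = [1 2; 3 4]

# The keys of a matrix are Cartesian indices...
first_key = first(keys(A))             # CartesianIndex(1, 1)

# ...and each key looks up its value, just as in a dictionary.
lookup = A[CartesianIndex(2, 1)]       # 3
consistent = all(A[k] == x for (k, x) in pairs(A))

# The same data written out as an explicit dictionary.
as_dict = Dict(k => x for (k, x) in pairs(A))
```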

Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows). I found that whenever I created, say, a grouping (using a dictionary of groups), I immediately felt the loss of the ability to do complex indexing and other operations with the result (as well as having to worry whether the output iterates values or key-value pairs, etc).

Labels: collections (Data structures holding multiple items, e.g. sets), design (Design of APIs or of the language itself), julep (Julia Enhancement Proposal)