src/posts/flexible-indexing/index.md
+20 -18 (20 additions & 18 deletions)
@@ -2,18 +2,18 @@
 title: 'Flexible Indexes: Exciting new ways to slice and dice your data!'
 date: '2025-08-11'
 authors:
-  - name: Benoît Bovy
-    github: benbovy
   - name: Scott Henderson
     github: scottyhq
+  - name: Benoît Bovy
+    github: benbovy
   - name: Deepak Cherian
     github: dcherian
   - name: Justus Magin
     github: keewis
 summary: 'An introduction to customizable coordinate-based data selection and alignment for more efficient handling of both traditional and more exotic data structures'
 ---

-**TL;DR**: over the last few years Xarray has been through a gradual although major refactoring of its internals that makes coordinate-based data selection and alignment customizable. Xarray>=2025.6 now enables more efficient handling of both traditional and more exotic data structures. In this post we highlight a few examples that take advantage of this new superpower! See the [Gallery of Custom Index Examples](https://xarray-indexes.readthedocs.io/) for more!
+**TL;DR**: Over the last few years we've gradually refactored Xarray internals to make coordinate-based data selection and alignment customizable. As a result, Xarray>=2025.6 enables more efficient handling of both traditional and more exotic data structures. In this post we highlight a few examples that take advantage of this new superpower! See the [Gallery of Custom Index Examples](https://xarray-indexes.readthedocs.io/) for more!
@@ -35,19 +35,19 @@ Examples of indexes are all around you and are a fundamental way to organize and
 In the United States, if you want a book about Natural Sciences, you can go to your local library branch and head straight to section 500. Or if you're in the mood for a classic novel go to section 800. Connecting thematic labels with numbers (`{'Natural Sciences': 500, 'Literature': 800}`) is a classic indexing system that's been around for hundreds of years [(Dewey Decimal System, 1876)](https://en.wikipedia.org/wiki/Dewey_Decimal_Classification).
 The need for an index becomes critical as the size of data grows - just imagine the time it would take to find a specific novel amongst a million uncategorized books!

-The same efficiencies arise in computing. Consider a simple 1D dataset consisting of measurements `M=[10.0,20.0,30.0,40.0,50.0,60.0]` at six coordinate positions `X=[1, 2, 4, 8, 16, 32]`. _What was our measurement at `X=8`?_
-To answer this in code, we could either do a brute-force linear search (or binary search if sorted) through the coordinates array, or we could build a more efficient data structure designed for fast searches --- an Index. A common convenient index is a _key:value_ mapping or "hash table" between the coordinate values and their integer positions `i=[0,1,2,3,4,5]`. Finally, we are able to identify the index for our coordinate of interest (`X[3]=8`) and use it to lookup our measurement value `M[3]=40.0`.
+The same efficiencies arise in computing. Consider a simple 1D dataset consisting of measurement values `M=[10.0,20.0,30.0,40.0,50.0,60.0]` at six coordinate positions `X=[1, 2, 4, 8, 16, 32]`. _What was our measurement at `X = 8`?_
+To answer this in code, we could either do a brute-force linear search (or binary search if sorted) through the coordinates array, or we could build a more efficient data structure designed for fast searches: an Index. A common convenient index is a _key:value_ mapping or "hash table" between the coordinate values and their integer positions `i=[0, 1, 2, 3, 4, 5]`. Once we identify the _index_ `i=3` for our coordinate of interest (`X[3] = 8`), we use it to look up our measurement value `M[3] = 40.0`.
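The two lookup strategies described in this paragraph can be sketched in plain Python. This is only an illustration of the idea, not how Xarray or pandas implement their indexes; the names `linear_lookup`, `indexed_lookup`, and `position_of` are hypothetical.

```python
# Coordinate positions and measurement values from the example in the post.
X = [1, 2, 4, 8, 16, 32]
M = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]


def linear_lookup(coord):
    """Brute force: scan every coordinate until one matches (O(n) per lookup)."""
    for i, x in enumerate(X):
        if x == coord:
            return M[i]
    raise KeyError(coord)


# Index: build a coordinate -> integer position mapping once.
# Construction costs O(n) time and memory; each lookup is then one dict access.
position_of = {x: i for i, x in enumerate(X)}


def indexed_lookup(coord):
    """Use the prebuilt hash table to find the position, then read the value."""
    return M[position_of[coord]]


print(indexed_lookup(8))  # 40.0
```

Both functions return the same value; the difference only shows up in cost as the coordinate array grows.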

-> 💡 **Note:** Index structures present a trade-off: they are a little slow to construct but much faster at lookups than brute-force searches.
+> 💡 **Note:** Index structures present a trade-off: they are a little slow to construct and have a memory footprint, but are much faster at lookups than brute-force searches.

 ## pandas.Index

-Xarray's [label-based selection](https://docs.xarray.dev/en/latest/user-guide/indexing.html) allows a more expressive and simple syntax in which you don't have to think about the index (`da.sel(x=8)`). Up until now, Xarray has relied exclusively on [pandas.Index](https://pandas.pydata.org/docs/user_guide/indexing.html), which is still used by default:
+Xarray's [label-based selection](https://docs.xarray.dev/en/latest/user-guide/indexing.html) allows a more expressive and simple syntax in which you don't have to think about the index: `da.sel(x=8)`. To accomplish this, Xarray has historically relied on [pandas.Index](https://pandas.pydata.org/docs/user_guide/indexing.html) behind the scenes, which is still used by default:

 ```python
 x = np.array([1, 2, 4, 8, 16, 32])
-y= np.array([10, 20, 30, 40, 50, 60])
-da = xr.DataArray(y, coords={'x': x})
+m = np.arange(10, 70, 10.0)
+da = xr.DataArray(m, coords={'x': x}, name='M')
 da
 ```

@@ -60,11 +60,11 @@ da.sel(x=8)

 ## Alternatives to pandas.Index

-There are many different indexing schemes and ways to generate an index. pandas.Index's approach is roughly similar to running a loop over all coordinate values and creating an _index:coordinate_ mapping, optionally identifying duplicates and sorting along the way. But, you might recognize that our example coordinates above can in fact be represented by a function `X(i)=2**i` where `i` is the integer position! Given that function we can quickly get measurement values at any coordinate: `Y(X=8)` = `Y[log2(8)]` = `Y[3]=40`. Xarray now has a [CoordinateTransformIndex](https://xarray-indexes.readthedocs.io/blocks/transform.html) to handle this type of on-demand calculation of coordinates!
+There are many different indexing schemes and ways to generate an index. pandas.Index's approach is roughly similar to running a loop over all coordinate values to create an _index:coordinate_ mapping, optionally identifying duplicates and sorting along the way. But you might recognize that our example coordinates above can in fact be represented by a function `X(i) = 2**i` where `i` is the integer position! Given that function we can quickly get measurement values at any coordinate: `M(X=8)` = `M[log2(8)]` = `M[3] = 40`. Xarray now has a [CoordinateTransformIndex](https://xarray-indexes.readthedocs.io/blocks/transform.html) to handle this type of on-demand calculation of coordinates!
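As a rough illustration of the functional representation in this paragraph, the sketch below computes positions from coordinates with `log2` instead of storing a mapping. It is plain Python and does not use the `CoordinateTransformIndex` API; the helper names `coordinate` and `position` are hypothetical.

```python
import math

# Measurement values from the example; the coordinates are never stored.
M = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]


def coordinate(i):
    """Forward transform: integer position -> coordinate value, X(i) = 2**i."""
    return 2 ** i


def position(x):
    """Inverse transform: coordinate value -> integer position, i = log2(x)."""
    if x <= 0:
        raise KeyError(x)
    i = round(math.log2(x))
    # Reject values that are out of range or not an exact power of two.
    if not 0 <= i < len(M) or coordinate(i) != x:
        raise KeyError(x)
    return i


print(M[position(8)])  # 40.0
```

No per-coordinate data structure is built here, so there is nothing to construct up front and nothing to hold in memory beyond the function itself.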

-### xarrayRangeIndex
+### xarray.RangeIndex

-A simple special case of `CoordinateTransformIndex` is a `RangeIndex` where coordinates can be defined by a start, stop, and uniform step size. _`pandas.RangeIndex` only supports integers_, whereas Xarray handles floating-point values. Coordinate look-up is performed on-the-fly rather than loading all values into memory up-front when creating a Dataset, which is critical for the example below that has a coordinate array of 7 terabytes!
+A simple special case of `CoordinateTransformIndex` is a `RangeIndex` where coordinates can be defined by a start, stop, and uniform step size. `pandas.RangeIndex` only supports integers, whereas Xarray handles floating-point values. Coordinate look-up is performed on-the-fly rather than loading all values into memory up-front when creating a Dataset, which is critical for the example below that has a coordinate array of 7 terabytes!
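To make the "on-the-fly" look-up concrete, here is a minimal sketch of the arithmetic a range-style index can use. It is not the `xarray.indexes.RangeIndex` implementation; the functions `range_position` and `range_coordinate` and the example range are hypothetical, and the lookup assumes nearest-neighbor matching.

```python
def range_position(value, start, step, size):
    """Nearest integer position of `value` in a uniform range, computed arithmetically."""
    i = round((value - start) / step)
    if not 0 <= i < size:
        raise KeyError(value)
    return i


def range_coordinate(i, start, step):
    """Coordinate value at integer position `i`, computed on demand."""
    return start + i * step


# A hypothetical range of one billion floating-point coordinates,
# described by three numbers instead of a materialized array.
start, step, size = 0.0, 0.1, 1_000_000_000

i = range_position(123.4, start, step, size)
print(i, range_coordinate(i, start, step))
```

Only `start`, `step`, and `size` are kept in memory, which is why this kind of index scales to coordinate arrays that would be far too large to load.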

 ```python
 from xarray.indexes import RangeIndex
@@ -89,7 +89,7 @@ sliced.x

 In addition to a few new built-in indexes, `xarray.Index` provides an API that allows dealing with coordinate data and metadata in a highly customizable way for the most common Xarray operations such as `sel`, `align`, `concat`, `stack`. This is a powerful extension mechanism that is very important for supporting a multitude of domain-specific data structures. Here are a few examples below.

-### rasterixRasterIndex
+### rasterix.RasterIndex

 Earlier we mentioned that coordinates may have a _functional representation_.
 For 2D raster images, this function often takes the form of an [Affine Transform](https://en.wikipedia.org/wiki/Affine_transformation).
 > "real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc." -- [Xarray Documentation](https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html#what-labels-enable)

 We often think about metadata providing context for _measurement values_ but metadata is also critical for coordinates!
-In particular, to align two different datasets we must ask if the coordinates are in the same coordinate system.
+In particular, to align two different datasets we must ask: are the coordinates in the same coordinate system?

 There are currently over 7000 commonly used [Coordinate Reference Systems (CRS)](https://spatialreference.org/ref/epsg/) for geospatial data in the authoritative EPSG database!
 And of course an infinite number of custom-defined CRSs.
@@ -158,7 +158,7 @@ ds1 + ds2
 MergeError: conflicting values/indexes on objects to be combined for coordinate 'crs'
 ```

-### XVec GeometryIndex
+### xvec.GeometryIndex

 A "vector data cube" is an n-D array that has at least one dimension indexed by an array of vector geometries.
 With the `xvec.GeometryIndex`, Xarray objects gain functionality equivalent to geopandas' GeoDataFrames!
@@ -208,9 +208,11 @@ Be sure to check out the [Gallery of Custom Index Examples](https://xarray-index

 ## What's next?

-While we're extremely excited about what can _already_ be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work.
+While we're extremely excited about what can _already_ be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. In a follow-up blog post, we will also illustrate how Xarray's internals interact with the xarray.Index API and how it can be leveraged in order to customize the behavior of some of the most common Xarray operations like indexing and alignment.
+
+We believe the new flexible indexing machinery will increase usage of Xarray across scientific domains and are actively working on examples that hopefully will appeal to [astronomers](https://xarray-indexes.readthedocs.io/blocks/transform.html#example-astronomy) and [biologists](https://xarray.dev/blog/xarray-biology)!

-Have an idea for your own custom index? Check out [this section of the Xarray documentation](https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html). In a follow-up blog post, we will also illustrate how Xarray's internals interact with the xarray.Index API and how it can be leveraged in order to customize the behavior of some of the most common Xarray operations like indexing and alignment.
+Have an idea for your own custom index? Check out [this section of the Xarray documentation](https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html) and please advertise what you're working on in our [gallery of examples](https://github.com/xarray-contrib/xarray-indexes).