Commit 78f7491

TomNicholas, Illviljan, and max-sixty authored
Docs page on interoperability (#7992)
* add page on internal design
* add xarray-datatree to intersphinx mapping
* typo
* add subheadings to the accessors page
* Revert "add page on internal design"

  This reverts commit 198f67b.
* rename page on variables
* whatsnew
* page on interoperability
* add interoperability page to index
* fix whatsnew
* sel->isel
* add section on lazy indexing
* actually show lazy indexing example
* link to custom indexes page
* fix some formatting
* put encoding last
* attrs and encoding are not ordered dicts

  Co-authored-by: Illviljan <[email protected]>
* reword lack of support for subclassing

  Co-authored-by: Maximilian Roos <[email protected]>
* remove duplicate word
* encourage contributions to supporting subclassing

---------

Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
1 parent cef76ec commit 78f7491

5 files changed, +55 -5 lines changed

doc/internals/how-to-create-custom-index.rst

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 .. currentmodule:: xarray

+.. _internals.custom indexes:
+
 How to create a custom index
 ============================

doc/internals/index.rst

Lines changed: 2 additions & 1 deletion
@@ -19,9 +19,10 @@ The pages in this section are intended for:
    :hidden:

    internal-design
+   interoperability
    duck-arrays-integration
    chunked-arrays
    extending-xarray
-   zarr-encoding-spec
    how-to-add-new-backend
    how-to-create-custom-index
+   zarr-encoding-spec

doc/internals/internal-design.rst

Lines changed: 4 additions & 4 deletions
@@ -59,9 +59,9 @@ which is used as the basic building block behind xarray's
 - ``data``: The N-dimensional array (typically a NumPy or Dask array) storing
   the Variable's data. It must have the same number of dimensions as the length
   of ``dims``.
-- ``attrs``: An ordered dictionary of metadata associated with this array. By
+- ``attrs``: A dictionary of metadata associated with this array. By
   convention, xarray's built-in operations never use this metadata.
-- ``encoding``: Another ordered dictionary used to store information about how
+- ``encoding``: Another dictionary used to store information about how
   this variable's data is represented on disk. See :ref:`io.encoding` for more
   details.

@@ -95,7 +95,7 @@ all of which are implemented by forwarding on to the underlying ``Variable`` obj

 In addition, a :py:class:`~xarray.DataArray` stores additional ``Variable`` objects stored in a dict under the private ``_coords`` attribute,
 each of which is referred to as a "Coordinate Variable". These coordinate variable objects are only allowed to have ``dims`` that are a subset of the data variable's ``dims``,
-and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.sizes` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
+and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.size` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
 The underlying data variable has this exact same size, and the attached coordinate variables have sizes which are some subset of the size of the data variable.
 Another way of saying this is that all coordinate variables must be "alignable" with the data variable.

@@ -124,7 +124,7 @@ The :py:class:`~xarray.Dataset` class is a generalization of the :py:class:`~xar
 Internally all data variables and coordinate variables are stored under a single ``variables`` dict, and coordinates are
 specified by storing their names in a private ``_coord_names`` set.

-The dataset's dimensions are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
+The dataset's ``dims`` are the set of all dims present across any variable, but (similar to in dataarrays) coordinate
 variables cannot have a dimension that is not present on any data variable.

 When a data variable or coordinate variable is accessed, a new ``DataArray`` is again constructed from all compatible
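
As a reading aid for the hunks above (this is not part of the commit), the relationship between ``dims``, per-dimension lengths and the "alignable" rule can be sketched with the standard library alone. ``ToyVariable`` and ``is_alignable`` are hypothetical names invented for this sketch and are not xarray classes or functions. In xarray itself, ``sizes`` is the mapping from dimension names to lengths, while ``size`` is the total number of elements; the sketch mirrors that distinction.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a Variable: named dims plus a shape."""

    dims: tuple
    shape: tuple
    attrs: dict = field(default_factory=dict)      # plain dict of metadata
    encoding: dict = field(default_factory=dict)   # plain dict of on-disk hints

    def __post_init__(self):
        # The data must have exactly one axis per entry in ``dims``.
        if len(self.dims) != len(self.shape):
            raise ValueError("shape must have one entry per dimension name")

    @property
    def sizes(self):
        """Mapping from dimension name to integer length."""
        return dict(zip(self.dims, self.shape))

    @property
    def size(self):
        """Total number of elements (product of all lengths)."""
        return prod(self.shape)


def is_alignable(coord, data):
    """True if every dim of ``coord`` exists on ``data`` with the same length."""
    return all(data.sizes.get(dim) == length for dim, length in coord.sizes.items())


data = ToyVariable(dims=("x", "y"), shape=(2, 3))
x_coord = ToyVariable(dims=("x",), shape=(2,))
z_coord = ToyVariable(dims=("z",), shape=(4,))

print(data.sizes, data.size)
print(is_alignable(x_coord, data), is_alignable(z_coord, data))
```

Under these assumptions a coordinate whose dims are a subset of the data variable's dims (with matching lengths) is accepted, and one that introduces a new dimension is rejected.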

doc/internals/interoperability.rst

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+.. _interoperability:
+
+Interoperability of Xarray
+==========================
+
+Xarray is designed to be extremely interoperable, in many orthogonal ways.
+Making xarray as flexible as possible is the common theme of most of the goals on our :ref:`roadmap`.
+
+This interoperability comes via a set of flexible abstractions into which the user can plug in. The current full list is:
+
+- :ref:`Custom file backends <add_a_backend>` via the :py:class:`~xarray.backends.BackendEntrypoint` system,
+- NumPy-like :ref:`"duck" array wrapping <internals.duckarrays>`, which supports the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_,
+- :ref:`Chunked distributed array computation <internals.chunkedarrays>` via the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` system,
+- Custom :py:class:`~xarray.Index` objects for :ref:`flexible label-based lookups <internals.custom indexes>`,
+- Extending xarray objects with domain-specific methods via :ref:`custom accessors <internals.accessors>`.
+
+.. warning::
+
+    One obvious way in which xarray could be more flexible is that whilst subclassing xarray objects is possible, we
+    currently don't support it in most transformations, instead recommending composition over inheritance. See the
+    :ref:`internal design page <internal design.subclassing>` for the rationale and look at the corresponding `GH issue <https://github.com/pydata/xarray/issues/3980>`_
+    if you're interested in improving support for subclassing!
+
+.. note::
+
+    If you think there is another way in which xarray could become more generically flexible then please
+    tell us your ideas by `raising an issue to request the feature <https://github.com/pydata/xarray/issues/new/choose>`_!
+
+
+Whilst xarray was originally designed specifically to open ``netCDF4`` files as :py:class:`numpy.ndarray` objects labelled by :py:class:`pandas.Index` objects,
+it is entirely possible today to:
+
+- lazily open an xarray object directly from a custom binary file format (e.g. using ``xarray.open_dataset(path, engine='my_custom_format')``),
+- handle the data as any API-compliant numpy-like array type (e.g. sparse or GPU-backed),
+- distribute out-of-core computation across that array type in parallel (e.g. via :ref:`dask`),
+- track the physical units of the data through computations (e.g. via `pint-xarray <https://pint-xarray.readthedocs.io/en/stable/>`_),
+- query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure),
+- attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata),
+- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis).
+
+All of these features can be provided simultaneously, using libraries compatible with the rest of the scientific Python ecosystem.
+In this situation xarray would be essentially a thin wrapper acting as a pure-Python framework, providing a common interface and
+separation of concerns via various domain-agnostic abstractions.
+
+Most of the remaining pages in the documentation of xarray's internals describe these various types of interoperability in more detail.
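
To make the "composition over inheritance" recommendation in the new page more concrete (again, not part of the commit), here is a simplified, standard-library-only sketch of how an accessor-registration decorator could work. xarray's real decorators are ``xarray.register_dataarray_accessor`` and ``xarray.register_dataset_accessor``; everything named below (``register_accessor``, ``ToyArray``, ``GeoAccessor``, the ``"crs"`` attribute key) is hypothetical and is not xarray's implementation.

```python
class _CachedAccessor:
    """Descriptor that builds the accessor on first use and caches it per object."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself: expose the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Store on the instance so later lookups skip this descriptor
        # (it defines no __set__, so the instance dict takes precedence).
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, target_cls):
    """Hypothetical decorator: attach ``accessor_cls`` to ``target_cls`` as ``name``."""

    def decorator(accessor_cls):
        setattr(target_cls, name, _CachedAccessor(name, accessor_cls))
        return accessor_cls

    return decorator


class ToyArray:
    """Hypothetical container standing in for a labelled array object."""

    def __init__(self, values, attrs=None):
        self.values = list(values)
        self.attrs = dict(attrs or {})


@register_accessor("geo", ToyArray)
class GeoAccessor:
    """Domain-specific logic that wraps the object instead of subclassing it."""

    def __init__(self, toy_array):
        self._obj = toy_array

    @property
    def crs(self):
        # Read coordinate reference system metadata from the wrapped object.
        return self._obj.attrs.get("crs", "unknown")


arr = ToyArray([1.0, 2.0], attrs={"crs": "EPSG:4326"})
print(arr.geo.crs)
```

The design point is that ``GeoAccessor`` holds a reference to the wrapped object rather than inheriting from it, so the container class never needs to know about the domain-specific methods attached to it.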

doc/whats-new.rst

Lines changed: 2 additions & 0 deletions
@@ -137,6 +137,8 @@ Bug fixes
 Documentation
 ~~~~~~~~~~~~~

+- Added page on the interoperability of xarray objects.
+  (:pull:`7992`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
 - Added xarray-regrid to the list of xarray related projects (:pull:`8272`).
   By `Bart Schilperoort <https://github.com/BSchilperoort>`_.
