Internal refactor of XArray, with a new CoordXArray subtype #54

shoyer · 2014-03-07T22:42:35Z

This allows us to simplify our internal model for XArray (it is always cached
internally as a base ndarray) and supports some previously tricky aspects
involving pandas.Index objects. Notably:

1. The dtype of arrays stored as pandas.Index objects can now be faithfully
   saved and restored. Doing math with XArray objects always yields objects
   with the right dtype, so `ds['latitude'] + 1` has dtype=float, not
   dtype=object.
2. It's no longer necessary to load index data into memory upon creating a new
   Dataset. Instead, the index data can be loaded on demand.
3. `var.data` is always an ndarray. `var.index` is always a pandas.Index.
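
The dtype and `data`/`index` points above can be illustrated with a small sketch. It is not the actual CoordXArray implementation: the class name `IndexBackedArray` and its attributes are hypothetical, and the index is forced to dtype=object only to mimic an index that has lost its original dtype.

```python
import numpy as np
import pandas as pd


class IndexBackedArray:
    """Hypothetical stand-in for a coordinate array backed by a pandas.Index."""

    def __init__(self, values):
        values = np.asarray(values)
        # Remember the dtype the caller supplied so it can be restored later.
        self._dtype = values.dtype
        # Assumption for illustration: store the values as an object-dtype index.
        self._index = pd.Index(values.astype(object), dtype=object)

    @property
    def index(self):
        # Always a pandas.Index.
        return self._index

    @property
    def data(self):
        # Always a base ndarray, cast back to the remembered dtype.
        return np.asarray(self._index.values, dtype=self._dtype)

    def __add__(self, other):
        # Math goes through .data, so the result keeps the restored dtype.
        return self.data + other


latitude = IndexBackedArray([10.0, 20.0, 30.0])
restored = latitude + 1                  # uses the remembered float dtype
naive = np.asarray(latitude.index) + 1   # math directly on the object index
```

In this sketch `restored` is a float array while `naive` stays dtype=object, which is the kind of difference the first point describes.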

Related issues: #17, #39, #40.

pandas seems to have trouble constructing a MultiIndex when it's given datetime64 arrays that don't have nanosecond (`ns`) precision. The current version of `decode_cf_datetime` returns datetime arrays with the default precision, which is microseconds (`us`). Hence, when coupled with the dtype-restoring wrapper from PR #54, the `to_series()` and `to_dataframe()` methods were broken when using decoded datetimes.
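
As a rough illustration of the precision issue described above, the sketch below casts a microsecond-precision array to `datetime64[ns]` before building a MultiIndex. The sample dates and the level names `time` and `x` are made up for the example, and this is not necessarily how `decode_cf_datetime` itself performs the conversion.

```python
import numpy as np
import pandas as pd

# Microsecond precision, standing in for the decoded datetimes.
dates_us = np.array(
    ["2014-03-07T22:42:35", "2014-03-10T18:04:39"], dtype="datetime64[us]"
)

# Cast explicitly to nanosecond precision, the unit pandas uses natively.
dates_ns = dates_us.astype("datetime64[ns]")

# Build a MultiIndex from the nanosecond-precision dates and a second level.
multi = pd.MultiIndex.from_arrays(
    [dates_ns, np.array([0, 1])], names=["time", "x"]
)
```

The cast changes only the unit: the values in `dates_ns` compare equal to those in `dates_us`.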

shoyer · 2014-03-10T18:04:39Z

I've been using this branch locally and squashing bugs (see the new commits)... I think it's pretty close but I would appreciate some feedback when you get the chance.

shoyer · 2014-03-11T00:00:17Z

Note: it turns out the latest fixes I applied were actually general bugs in XArray, not CoordXArray issues.

akleeman · 2014-03-11T01:01:25Z

src/xray/utils.py

@@ -201,11 +203,12 @@ def encode_cf_datetime(dates, units=None, calendar=None):
            and np.issubdtype(dates.dtype, np.datetime64)):
        # for now, don't bother doing any trickery like decode_cf_datetime to
        # convert dates to numbers faster
-        dates = dates.astype(datetime)
+        # TODO: don't use pandas.DatetimeIndex to do the conversion


Agree, that's a bit awkward ... but it actually fixes a bug that cropped up when using xray in slocum!

#59 (which I just rebased) switches this conversion back to not using pandas.
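
For context on the conversion discussed in this thread, here is a minimal sketch of two ways to turn a `datetime64` array into `datetime.datetime` objects, one through `pandas.DatetimeIndex` and one with NumPy alone. The sample dates are invented, `astype(object)` is used here in place of the diff's `astype(datetime)`, and whether the integer result shown below is the bug mentioned above is an assumption, not something the thread confirms.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = np.array(
    ["2014-03-07T22:42:35", "2014-03-11T01:01:25"], dtype="datetime64[ns]"
)

# Casting nanosecond-precision values straight to objects yields integers
# (nanoseconds since the epoch), not datetime.datetime instances.
direct = dates.astype(object)

# Route 1: go through pandas, as the TODO in the diff describes.
via_pandas = pd.DatetimeIndex(dates).to_pydatetime()

# Route 2: NumPy only. Cast to microseconds first, since that unit maps
# onto datetime.datetime, then cast to objects.
via_numpy = dates.astype("datetime64[us]").astype(object)
```

Both routes produce the same `datetime.datetime` values; the NumPy-only route drops any sub-microsecond part of the timestamps.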

shoyer added 4 commits March 7, 2014 14:37

Fix copy for CoordXArray

d3c466f

Restore handling of dtype=object arrays in encode_cf_variable

2775ace

Test XArray properties

f2c695b

shoyer mentioned this pull request Mar 8, 2014

Allow datetime.timedelta coordinates. #55

Closed

shoyer mentioned this pull request Mar 10, 2014

Ensure decoding as datetime64[ns] #59

Merged

shoyer added 2 commits March 10, 2014 10:42

Fix datetimeindex components (with new test)

137d4d8

Further fixes to datetimeindex components

dcf434f

shoyer added 4 commits March 10, 2014 13:44

Fix coordinate math with DatasetArrays

5aa6dee

Note: this was already a bug, but it's more cleanly fixed with this patch.

Fix indexing_mode for CoordXArray

93dbbb0

Another fixing to indexing_mode for CoordXArray

adc1f5b

More indexing fixes

8ea1370

akleeman reviewed Mar 11, 2014

akleeman added a commit that referenced this pull request Mar 11, 2014

Merge pull request #54 from akleeman/index-types

74d43ff

Internal refactor of XArray, with a new CoordXArray subtype

akleeman merged commit 74d43ff into master Mar 11, 2014

shoyer deleted the index-types branch March 19, 2014 01:13

shoyer mentioned this pull request Mar 24, 2014

OpenDAP loaded Dataset has lon/lats with type 'object'. #39

Closed
