-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Internal refactor of XArray, with a new CoordXArray subtype #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This allows us to simplify our internal model for XArray (it always cached internally as a base ndarray) and supports some previously tricky aspects involving pandas.Index objects. Noteably: 1. The dtype of arrays stored as pandas.Index objects can now be faithfully saved and restored. Doing math with XArray objects always yields objects with the right dtype, so `ds['latitude'] + 1` has dtype=float, not dtype=object. 2. It's no longer necessary to load index data into memory upon creating a new Dataset. Instead, the index data can be loaded on demand. 3. `var.data` is always an ndarray. `var.index` is always a pandas.Index. Related issues: #17, #39, #40.
Pandas seems to have trouble constructing multi-indices when it's given datetime64 arrays which don't have ns precision. The current version of decode_cf_datetime will give datetime arrays with the default precision, which is us. Hence, when coupled with the dtype restoring wrapper from PR #54, the `to_series()` and `to_dataframe()` methods were broken when using decoded datetimes.
I've been using this branch locally and squashing bugs (see the new commits)... I think it's pretty close but I would appreciate some feedback when you get the chance. |
Note: this was already a bug, but it's more cleanly fixed with this patch.
note: it turns out the latest fixes I applied were actually general bugs in XArray, not CoordXArray issues. |
@@ -201,11 +203,12 @@ def encode_cf_datetime(dates, units=None, calendar=None): | |||
and np.issubdtype(dates.dtype, np.datetime64)): | |||
# for now, don't bother doing any trickery like decode_cf_datetime to | |||
# convert dates to numbers faster | |||
dates = dates.astype(datetime) | |||
# TODO: don't use pandas.DatetimeIndex to do the conversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, thats a bit awkward ... but it actually fixes a bug that cropped up when using xray in slocum!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#59 (which I just rebased) switches this conversion back to not using pandas.
Internal refactor of XArray, with a new CoordXArray subtype
Pandas seems to have trouble constructing multi-indices when it's given datetime64 arrays which don't have ns precision. The current version of decode_cf_datetime will give datetime arrays with the default precision, which is us. Hence, when coupled with the dtype restoring wrapper from PR #54, the `to_series()` and `to_dataframe()` methods were broken when using decoded datetimes.
* think I've fixed the bug * used feature from python 3.9 * test but doesn't yet work properly * only check subtree, not down to root * make sure choice whether to check from root is propagated * bump python version in CI * 3.10 instead of 3.1
This allows us to simplify our internal model for XArray (it always cached
internally as a base ndarray) and supports some previously tricky aspects
involving pandas.Index objects. Noteably:
saved and restored. Doing math with XArray objects always yields objects
with the right dtype, so
ds['latitude'] + 1
has dtype=float, notdtype=object.
Dataset. Instead, the index data can be loaded on demand.
var.data
is always an ndarray.var.index
is always a pandas.Index.Related issues: #17, #39, #40.