Skip to content

Internal refactor of XArray, with a new CoordXArray subtype #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 11, 2014

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Mar 7, 2014

This allows us to simplify our internal model for XArray (it always cached
internally as a base ndarray) and supports some previously tricky aspects
involving pandas.Index objects. Noteably:

  1. The dtype of arrays stored as pandas.Index objects can now be faithfully
    saved and restored. Doing math with XArray objects always yields objects
    with the right dtype, so ds['latitude'] + 1 has dtype=float, not
    dtype=object.
  2. It's no longer necessary to load index data into memory upon creating a new
    Dataset. Instead, the index data can be loaded on demand.
  3. var.data is always an ndarray. var.index is always a pandas.Index.

Related issues: #17, #39, #40.

shoyer added 4 commits March 7, 2014 14:37
This allows us to simplify our internal model for XArray (it always cached
internally as a base ndarray) and supports some previously tricky aspects
involving pandas.Index objects. Noteably:
1. The dtype of arrays stored as pandas.Index objects can now be faithfully
   saved and restored. Doing math with XArray objects always yields objects
   with the right dtype, so `ds['latitude'] + 1` has dtype=float, not
   dtype=object.
2. It's no longer necessary to load index data into memory upon creating a new
   Dataset. Instead, the index data can be loaded on demand.
3. `var.data` is always an ndarray. `var.index` is always a pandas.Index.

Related issues: #17, #39, #40.
shoyer added a commit that referenced this pull request Mar 10, 2014
Pandas seems to have trouble constructing multi-indices when it's given
datetime64 arrays which don't have ns precision. The current version of
decode_cf_datetime will give datetime arrays with the default precision, which
is us. Hence, when coupled with the dtype restoring wrapper from PR #54, the
`to_series()` and `to_dataframe()` methods were broken when using decoded
datetimes.
@shoyer
Copy link
Member Author

shoyer commented Mar 10, 2014

I've been using this branch locally and squashing bugs (see the new commits)... I think it's pretty close but I would appreciate some feedback when you get the chance.

@shoyer
Copy link
Member Author

shoyer commented Mar 11, 2014

note: it turns out the latest fixes I applied were actually general bugs in XArray, not CoordXArray issues.

@@ -201,11 +203,12 @@ def encode_cf_datetime(dates, units=None, calendar=None):
and np.issubdtype(dates.dtype, np.datetime64)):
# for now, don't bother doing any trickery like decode_cf_datetime to
# convert dates to numbers faster
dates = dates.astype(datetime)
# TODO: don't use pandas.DatetimeIndex to do the conversion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, thats a bit awkward ... but it actually fixes a bug that cropped up when using xray in slocum!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#59 (which I just rebased) switches this conversion back to not using pandas.

akleeman added a commit that referenced this pull request Mar 11, 2014
Internal refactor of XArray, with a new CoordXArray subtype
@akleeman akleeman merged commit 74d43ff into master Mar 11, 2014
shoyer added a commit that referenced this pull request Mar 11, 2014
Pandas seems to have trouble constructing multi-indices when it's given
datetime64 arrays which don't have ns precision. The current version of
decode_cf_datetime will give datetime arrays with the default precision, which
is us. Hence, when coupled with the dtype restoring wrapper from PR #54, the
`to_series()` and `to_dataframe()` methods were broken when using decoded
datetimes.
@shoyer shoyer deleted the index-types branch March 19, 2014 01:13
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024
* think I've fixed the bug

* used feature from python 3.9

* test but doesn't yet work properly

* only check subtree, not down to root

* make sure choice whether to check from root is propagated

* bump python version in CI

* 3.10 instead of 3.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants