Skip to content

Enable use of cftime.datetime coordinates with differentiate and interp #2434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 28, 2018

Conversation

spencerkclark
Copy link
Member

@spencerkclark spencerkclark commented Sep 23, 2018

  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

As discussed in #2398 (review), this enables the use of differentiate and interp on DataArrays/Datasets with cftime.datetime coordinates.

Copy link
Member

@fujiisoup fujiisoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up pr.
I left a few minor comments, but it looks good for me.

# Scipy casts coordinates to np.float64, which is not accurate
# enough for datetime64 (uses 64bit integer).
# We assume that the most of the bits are used to represent the
# offset (min(x)) and the variation (x - min(x)) can be
# represented by float.
xmin = np.min(x[i])
xmin = x[i].min()
x[i] = to_numeric(x[i], offset=xmin, dtype=np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if x and xnew are different data type?
(E.g. date time vs float)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- it seems the following was indeed allowed before this PR:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: times = pd.date_range('2000', periods=5)

In [5]: da = xr.DataArray(np.arange(5), coords=[times], dims=['time'])

In [6]: da
Out[6]:
<xarray.DataArray (time: 5)>
array([0, 1, 2, 3, 4])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05

In [7]: da.interp(time=0.5)
Out[7]:
<xarray.DataArray ()>
array(5.787037037037037e-15)
Coordinates:
    time     float64 0.5

where 0.5 is implicitly interpreted as nanoseconds since the first date in the original coordinate.

This was surprising to me at first (I'm used to thinking about float-representations of dates in the CF-context, where an array of floats representing time must be accompanied by a units attribute that explicitly defines how those floats are to be interpreted as dates). If this use case is important to preserve I can make sure that it continues to work (and I can add tests to make sure we don't break it later on).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is actually error prone.
I originally tried to design this method to work as similar to sel, so maybe we can learn from that method also for this edge case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to convert between int and float, but we should avoid automatically converting datetime units. That case should raise an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good -- I wasn't sure if I was missing something. I'll make sure this raises an informative TypeError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 5377b16, the result from my example now looks like:

In [6]: da.interp(time=0.5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9dfef9ee2abb> in <module>()
----> 1 da.interp(time=0.5)

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataarray.pyc in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1021         ds = self._to_temp_dataset().interp(
   1022             coords, method=method, kwargs=kwargs, assume_sorted=assume_sorted,
-> 1023             **coords_kwargs)
   1024         return self._from_temp_dataset(ds)
   1025

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2005                     var_indexers = {k: _validate_interp_indexer(
   2006                         maybe_variable(obj, k), v) for k, v
-> 2007                                     in indexers.items() if k in var.dims}
   2008                     variables[name] = missing.interp(
   2009                         var, var_indexers, method, **kwargs)

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in <dictcomp>((k, v))
   2005                     var_indexers = {k: _validate_interp_indexer(
   2006                         maybe_variable(obj, k), v) for k, v
-> 2007                                     in indexers.items() if k in var.dims}
   2008                     variables[name] = missing.interp(
   2009                         var, var_indexers, method, **kwargs)

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in _validate_interp_indexer(x, new_x)
   1995                                'interpolate to must be either datetime '
   1996                                'strings or datetimes. '
-> 1997                                'Instead got\n{}'.format(new_x))
   1998             else:
   1999                 return (x, new_x)

TypeError: When interpolating over a datetime-like coordinate, the coordinates to interpolate to must be either datetime strings or datetimes. Instead got
<xarray.Variable ()>
array(0.5)

@fujiisoup does this look like the right place for this error message to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @spencerkclark. It looks very nice to me.

@@ -594,19 +594,24 @@ def __len__(self):


def to_numeric(array, offset=None, datetime_unit=None, dtype=float):
"""
Make datetime array float
"""Convert an array containing datetime-like data to an array of floats.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this function do nothing to numeric arrays. Maybe the default offset should be zero and we can pass the minimum for the datetime array manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks to your earlier comment I see where you are coming from now (my initial thought was that in all cases we would make sure an array contained dates before calling to_numeric on it). Depending on what we decide to do there I can modify this function as needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just imagined if we use this function for different purpose in the future. I have not a particular idea yet but i thought we may not always want to subtract minimum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I renamed to_numeric to datetime_to_numeric to make it clear that it is intended only to be used with datetimes (I've tried to make sure now that this is always the case). Does that seem reasonable? I think we can always change the defaults to this internal utility function if we find a need later.

Copy link
Member

@fujiisoup fujiisoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me:)

@fujiisoup fujiisoup merged commit c2b09d6 into pydata:master Sep 28, 2018
@spencerkclark spencerkclark deleted the cftime-to-numeric branch September 28, 2018 13:45
@spencerkclark spencerkclark mentioned this pull request Oct 2, 2018
4 tasks
dcherian pushed a commit to dcherian/xarray that referenced this pull request Oct 7, 2018
* master:
  Replace the last of unittest with pytest (pydata#2467)
  Add python_requires to setup.py (pydata#2465)
  Clean up _parse_array_of_cftime_strings (pydata#2464)
  plot.contour: Don't make cmap if colors is a single color. (pydata#2453)
  np.AxisError was added in numpy 1.13 (pydata#2455)
  Add CFTimeIndex.shift (pydata#2431)
  Fix FutureWarning in CFTimeIndex.date_type (pydata#2448)
  fix:2445 (pydata#2446)
  Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434)
  restore ddof support in std (pydata#2447)
  Future warning for default reduction dimension of groupby (pydata#2366)
  Remove incorrect statement about "drop" in the text docs (pydata#2439)
  Use profile mechanism, not no-op mutation (pydata#2442)
  switch travis language to generic (pydata#2432)
dcherian pushed a commit to maahn/xarray that referenced this pull request Oct 10, 2018
* master: (51 commits)
  xarray.backends refactor (pydata#2261)
  Fix indexing error for data loaded with open_rasterio (pydata#2456)
  Properly support user-provided norm. (pydata#2443)
  pep8speaks (pydata#2462)
  isort (pydata#2469)
  tests shoudn't need to pass for a PR (pydata#2471)
  Replace the last of unittest with pytest (pydata#2467)
  Add python_requires to setup.py (pydata#2465)
  Update whats-new.rst (pydata#2466)
  Clean up _parse_array_of_cftime_strings (pydata#2464)
  plot.contour: Don't make cmap if colors is a single color. (pydata#2453)
  np.AxisError was added in numpy 1.13 (pydata#2455)
  Add CFTimeIndex.shift (pydata#2431)
  Fix FutureWarning in CFTimeIndex.date_type (pydata#2448)
  fix:2445 (pydata#2446)
  Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434)
  restore ddof support in std (pydata#2447)
  Future warning for default reduction dimension of groupby (pydata#2366)
  Remove incorrect statement about "drop" in the text docs (pydata#2439)
  Use profile mechanism, not no-op mutation (pydata#2442)
  ...
dcherian pushed a commit to dcherian/xarray that referenced this pull request Oct 10, 2018
* master: (21 commits)
  xarray.backends refactor (pydata#2261)
  Fix indexing error for data loaded with open_rasterio (pydata#2456)
  Properly support user-provided norm. (pydata#2443)
  pep8speaks (pydata#2462)
  isort (pydata#2469)
  tests shoudn't need to pass for a PR (pydata#2471)
  Replace the last of unittest with pytest (pydata#2467)
  Add python_requires to setup.py (pydata#2465)
  Update whats-new.rst (pydata#2466)
  Clean up _parse_array_of_cftime_strings (pydata#2464)
  plot.contour: Don't make cmap if colors is a single color. (pydata#2453)
  np.AxisError was added in numpy 1.13 (pydata#2455)
  Add CFTimeIndex.shift (pydata#2431)
  Fix FutureWarning in CFTimeIndex.date_type (pydata#2448)
  fix:2445 (pydata#2446)
  Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434)
  restore ddof support in std (pydata#2447)
  Future warning for default reduction dimension of groupby (pydata#2366)
  Remove incorrect statement about "drop" in the text docs (pydata#2439)
  Use profile mechanism, not no-op mutation (pydata#2442)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants