-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Enable use of cftime.datetime coordinates with differentiate and interp #2434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the follow-up pr.
I left a few minor comments, but it looks good for me.
xarray/core/missing.py
Outdated
# Scipy casts coordinates to np.float64, which is not accurate | ||
# enough for datetime64 (uses 64bit integer). | ||
# We assume that the most of the bits are used to represent the | ||
# offset (min(x)) and the variation (x - min(x)) can be | ||
# represented by float. | ||
xmin = np.min(x[i]) | ||
xmin = x[i].min() | ||
x[i] = to_numeric(x[i], offset=xmin, dtype=np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if x and xnew are different data type?
(E.g. date time vs float)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question -- it seems the following was indeed allowed before this PR:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xarray as xr
In [4]: times = pd.date_range('2000', periods=5)
In [5]: da = xr.DataArray(np.arange(5), coords=[times], dims=['time'])
In [6]: da
Out[6]:
<xarray.DataArray (time: 5)>
array([0, 1, 2, 3, 4])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-05
In [7]: da.interp(time=0.5)
Out[7]:
<xarray.DataArray ()>
array(5.787037037037037e-15)
Coordinates:
time float64 0.5
where 0.5
is implicitly interpreted as nanoseconds since the first date in the original coordinate.
This was surprising to me at first (I'm used to thinking about float-representations of dates in the CF-context, where an array of floats representing time must be accompanied by a units attribute that explicitly defines how those floats are to be interpreted as dates). If this use case is important to preserve I can make sure that it continues to work (and I can add tests to make sure we don't break it later on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is actually error prone.
I originally tried to design this method to work as similar to sel
, so maybe we can learn from that method also for this edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to convert between int and float, but we should avoid automatically converting datetime units. That case should raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good -- I wasn't sure if I was missing something. I'll make sure this raises an informative TypeError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 5377b16, the result from my example now looks like:
In [6]: da.interp(time=0.5)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-9dfef9ee2abb> in <module>()
----> 1 da.interp(time=0.5)
/Users/spencerclark/xarray-dev/xarray/xarray/core/dataarray.pyc in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
1021 ds = self._to_temp_dataset().interp(
1022 coords, method=method, kwargs=kwargs, assume_sorted=assume_sorted,
-> 1023 **coords_kwargs)
1024 return self._from_temp_dataset(ds)
1025
/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
2005 var_indexers = {k: _validate_interp_indexer(
2006 maybe_variable(obj, k), v) for k, v
-> 2007 in indexers.items() if k in var.dims}
2008 variables[name] = missing.interp(
2009 var, var_indexers, method, **kwargs)
/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in <dictcomp>((k, v))
2005 var_indexers = {k: _validate_interp_indexer(
2006 maybe_variable(obj, k), v) for k, v
-> 2007 in indexers.items() if k in var.dims}
2008 variables[name] = missing.interp(
2009 var, var_indexers, method, **kwargs)
/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in _validate_interp_indexer(x, new_x)
1995 'interpolate to must be either datetime '
1996 'strings or datetimes. '
-> 1997 'Instead got\n{}'.format(new_x))
1998 else:
1999 return (x, new_x)
TypeError: When interpolating over a datetime-like coordinate, the coordinates to interpolate to must be either datetime strings or datetimes. Instead got
<xarray.Variable ()>
array(0.5)
@fujiisoup does this look like the right place for this error message to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @spencerkclark. It looks very nice to me.
@@ -594,19 +594,24 @@ def __len__(self): | |||
|
|||
|
|||
def to_numeric(array, offset=None, datetime_unit=None, dtype=float): | |||
""" | |||
Make datetime array float | |||
"""Convert an array containing datetime-like data to an array of floats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like this function do nothing to numeric arrays. Maybe the default offset should be zero and we can pass the minimum for the datetime array manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks to your earlier comment I see where you are coming from now (my initial thought was that in all cases we would make sure an array contained dates before calling to_numeric
on it). Depending on what we decide to do there I can modify this function as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just imagined if we use this function for different purpose in the future. I have not a particular idea yet but i thought we may not always want to subtract minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I renamed to_numeric
to datetime_to_numeric
to make it clear that it is intended only to be used with datetimes (I've tried to make sure now that this is always the case). Does that seem reasonable? I think we can always change the defaults to this internal utility function if we find a need later.
5469196
to
5377b16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me:)
* master: Replace the last of unittest with pytest (pydata#2467) Add python_requires to setup.py (pydata#2465) Clean up _parse_array_of_cftime_strings (pydata#2464) plot.contour: Don't make cmap if colors is a single color. (pydata#2453) np.AxisError was added in numpy 1.13 (pydata#2455) Add CFTimeIndex.shift (pydata#2431) Fix FutureWarning in CFTimeIndex.date_type (pydata#2448) fix:2445 (pydata#2446) Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434) restore ddof support in std (pydata#2447) Future warning for default reduction dimension of groupby (pydata#2366) Remove incorrect statement about "drop" in the text docs (pydata#2439) Use profile mechanism, not no-op mutation (pydata#2442) switch travis language to generic (pydata#2432)
* master: (51 commits) xarray.backends refactor (pydata#2261) Fix indexing error for data loaded with open_rasterio (pydata#2456) Properly support user-provided norm. (pydata#2443) pep8speaks (pydata#2462) isort (pydata#2469) tests shoudn't need to pass for a PR (pydata#2471) Replace the last of unittest with pytest (pydata#2467) Add python_requires to setup.py (pydata#2465) Update whats-new.rst (pydata#2466) Clean up _parse_array_of_cftime_strings (pydata#2464) plot.contour: Don't make cmap if colors is a single color. (pydata#2453) np.AxisError was added in numpy 1.13 (pydata#2455) Add CFTimeIndex.shift (pydata#2431) Fix FutureWarning in CFTimeIndex.date_type (pydata#2448) fix:2445 (pydata#2446) Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434) restore ddof support in std (pydata#2447) Future warning for default reduction dimension of groupby (pydata#2366) Remove incorrect statement about "drop" in the text docs (pydata#2439) Use profile mechanism, not no-op mutation (pydata#2442) ...
* master: (21 commits) xarray.backends refactor (pydata#2261) Fix indexing error for data loaded with open_rasterio (pydata#2456) Properly support user-provided norm. (pydata#2443) pep8speaks (pydata#2462) isort (pydata#2469) tests shoudn't need to pass for a PR (pydata#2471) Replace the last of unittest with pytest (pydata#2467) Add python_requires to setup.py (pydata#2465) Update whats-new.rst (pydata#2466) Clean up _parse_array_of_cftime_strings (pydata#2464) plot.contour: Don't make cmap if colors is a single color. (pydata#2453) np.AxisError was added in numpy 1.13 (pydata#2455) Add CFTimeIndex.shift (pydata#2431) Fix FutureWarning in CFTimeIndex.date_type (pydata#2448) fix:2445 (pydata#2446) Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434) restore ddof support in std (pydata#2447) Future warning for default reduction dimension of groupby (pydata#2366) Remove incorrect statement about "drop" in the text docs (pydata#2439) Use profile mechanism, not no-op mutation (pydata#2442) ...
whats-new.rst
for all changes andapi.rst
for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)As discussed in #2398 (review), this enables the use of
differentiate
andinterp
on DataArrays/Datasets withcftime.datetime
coordinates.