Enable use of cftime.datetime coordinates with differentiate and interp #2434

Merged
merged 3 commits into from
Sep 28, 2018
3 changes: 3 additions & 0 deletions doc/interpolation.rst
@@ -63,6 +63,9 @@ by specifing the time periods required.

da_dt64.interp(time=pd.date_range('1/1/2000', '1/3/2000', periods=3))

Interpolation of data indexed by a :py:class:`~xarray.CFTimeIndex` is also
allowed. See :ref:`CFTimeIndex` for examples.

.. note::

Currently, our interpolation only works for regular grids.
56 changes: 37 additions & 19 deletions doc/time-series.rst
@@ -70,9 +70,9 @@ You can manual decode arrays in this form by passing a dataset to
One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of ``cftime.datetime`` objects and a ``CFTimeIndex``
can be used for indexing. The ``CFTimeIndex`` enables only a subset of
the indexing functionality of a ``pandas.DatetimeIndex`` and is only enabled
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
can be used for indexing. The :py:class:`~xarray.CFTimeIndex` enables only a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only enabled
when using the standalone version of ``cftime`` (not the version packaged with
earlier versions of ``netCDF4``). See :ref:`CFTimeIndex` for more information.

@@ -219,20 +219,20 @@ Non-standard calendars and dates outside the Timestamp-valid range
------------------------------------------------------------------

Through the standalone ``cftime`` library and a custom subclass of
``pandas.Index``, xarray supports a subset of the indexing functionality enabled
through the standard ``pandas.DatetimeIndex`` for dates from non-standard
calendars or dates using a standard calendar, but outside the
`Timestamp-valid range`_ (approximately between years 1678 and 2262). This
behavior has not yet been turned on by default; to take advantage of this
functionality, you must have the ``enable_cftimeindex`` option set to
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars or dates using a standard calendar, but
outside the `Timestamp-valid range`_ (approximately between years 1678 and
2262). This behavior has not yet been turned on by default; to take advantage
of this functionality, you must have the ``enable_cftimeindex`` option set to
``True`` within your context (see :py:func:`~xarray.set_options` for more
information). It is expected that this will become the default behavior in
xarray version 0.11.

For instance, you can create a DataArray indexed by a time
coordinate with a no-leap calendar within a context manager setting the
``enable_cftimeindex`` option, and the time index will be cast to a
``CFTimeIndex``:
:py:class:`~xarray.CFTimeIndex`:

.. ipython:: python

@@ -247,28 +247,28 @@ coordinate with a no-leap calendar within a context manager setting the

.. note::

With the ``enable_cftimeindex`` option activated, a ``CFTimeIndex``
With the ``enable_cftimeindex`` option activated, a :py:class:`~xarray.CFTimeIndex`
will be used for time indexing if any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range

Otherwise a ``pandas.DatetimeIndex`` will be used. In addition, if any
Otherwise a :py:class:`pandas.DatetimeIndex` will be used. In addition, if any
variable (not just an index variable) is encoded using a non-standard
calendar, its times will be decoded into ``cftime.datetime`` objects,
calendar, its times will be decoded into :py:class:`cftime.datetime` objects,
regardless of whether or not they can be represented using
``np.datetime64[ns]`` objects.

xarray also includes a :py:func:`cftime_range` function, which enables creating a
``CFTimeIndex`` with regularly-spaced dates. For instance, we can create the
same dates and DataArray we created above using:
xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can
create the same dates and DataArray we created above using:

.. ipython:: python

dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

For data indexed by a ``CFTimeIndex`` xarray currently supports:
For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

- `Partial datetime string indexing`_ using strictly `ISO 8601-format`_ partial
datetime strings:
@@ -294,7 +294,25 @@ For data indexed by a ``CFTimeIndex`` xarray currently supports:
.. ipython:: python

da.groupby('time.month').sum()


- Interpolation using :py:class:`cftime.datetime` objects:

.. ipython:: python

da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])

- Interpolation using datetime strings:

.. ipython:: python

da.interp(time=['0001-01-15', '0001-02-15'])

- Differentiation:

.. ipython:: python

da.differentiate('time')

- And serialization:

.. ipython:: python
@@ -305,7 +323,7 @@ For data indexed by a ``CFTimeIndex`` xarray currently supports:
.. note::

Currently resampling along the time dimension for data indexed by a
``CFTimeIndex`` is not supported.
:py:class:`~xarray.CFTimeIndex` is not supported.

.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
.. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -36,6 +36,13 @@ Enhancements
- Added support for Python 3.7. (:issue:`2271`).
  By `Joe Hamman <https://github.com/jhamman>`_.

- Added support for using ``cftime.datetime`` coordinates with
  :py:meth:`~xarray.DataArray.differentiate`,
  :py:meth:`~xarray.Dataset.differentiate`,
  :py:meth:`~xarray.DataArray.interp`, and
  :py:meth:`~xarray.Dataset.interp`.
  By `Spencer Clark <https://github.com/spencerkclark>`_.

Bug fixes
~~~~~~~~~

29 changes: 29 additions & 0 deletions xarray/coding/cftimeindex.py
@@ -314,3 +314,32 @@ def __contains__(self, key):
    def contains(self, key):
        """Needed for .loc based partial-string indexing"""
        return self.__contains__(key)


def _parse_iso8601_without_reso(date_type, datetime_str):
    date, _ = _parse_iso8601_with_reso(date_type, datetime_str)
    return date


def _parse_array_of_cftime_strings(strings, date_type):
    """Create a numpy array from an array of strings.

    For use in generating dates from strings for use with interp.  Assumes the
    array is either 0-dimensional or 1-dimensional.

    Parameters
    ----------
    strings : array of strings
        Strings to convert to dates
    date_type : cftime.datetime type
        Calendar type to use for dates

    Returns
    -------
    np.array
    """
    if strings.ndim == 0:
        return np.array(_parse_iso8601_without_reso(date_type, strings.item()))
    else:
        return np.array([_parse_iso8601_without_reso(date_type, s)
                         for s in strings])
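
As an aside on the hunk above: the scalar versus one-dimensional branching can be sketched with the standard library alone. This is only an illustration, not xarray's code. `parse_iso8601_date` and `parse_strings` are hypothetical names, a bare `str` plays the role of a 0-dimensional array, and `datetime.datetime` stands in for a `cftime` date type.

```python
from datetime import datetime


def parse_iso8601_date(date_type, datetime_str):
    """Parse a complete 'YYYY-MM-DD' string into an instance of date_type.

    Hypothetical stand-in for _parse_iso8601_without_reso; it handles only
    full dates, not partial strings such as '2000-01'.
    """
    year, month, day = (int(part) for part in datetime_str.split('-'))
    return date_type(year, month, day)


def parse_strings(strings, date_type):
    """Mirror the scalar / one-dimensional branching of the helper above."""
    if isinstance(strings, str):  # plays the role of strings.ndim == 0
        return parse_iso8601_date(date_type, strings)
    return [parse_iso8601_date(date_type, s) for s in strings]


# One-dimensional case and scalar case, as in the PR's test.
dates = parse_strings(['2000-01-01', '2000-01-02'], datetime)
single = parse_strings('2000-01-01', datetime)
```

Passing the date type as an argument is what lets the same parsing path serve every calendar: the caller supplies `index.date_type`, so the parsed values match the index they will be compared against.
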
51 changes: 36 additions & 15 deletions xarray/core/dataset.py
@@ -31,9 +31,11 @@
OrderedDict, basestring, dask_array_type, integer_types, iteritems, range)
from .utils import (
Frozen, SortedKeysDict, either_dict_or_kwargs, decode_numpy_dict_values,
ensure_us_time_resolution, hashable, maybe_wrap_array, to_numeric)
ensure_us_time_resolution, hashable, maybe_wrap_array, datetime_to_numeric)
from .variable import IndexVariable, Variable, as_variable, broadcast_variables

from ..coding.cftimeindex import _parse_array_of_cftime_strings

# list of attributes of pd.DatetimeIndex that are ndarrays of time info
_DATETIMEINDEX_COMPONENTS = ['year', 'month', 'day', 'hour', 'minute',
'second', 'microsecond', 'nanosecond', 'date',
@@ -1412,8 +1414,8 @@ def _validate_indexers(self, indexers):
""" Here we make sure
+ indexer has valid keys
+ indexer is in a valid data type
* string indexers are cast to datetime64
if associated index is DatetimeIndex
+ string indexers are cast to the appropriate date type if the
associated index is a DatetimeIndex or CFTimeIndex
"""
from .dataarray import DataArray

@@ -1435,10 +1437,12 @@ def _validate_indexers(self, indexers):
else:
v = np.asarray(v)

if ((v.dtype.kind == 'U' or v.dtype.kind == 'S')
and isinstance(self.coords[k].to_index(),
pd.DatetimeIndex)):
v = v.astype('datetime64[ns]')
if v.dtype.kind == 'U' or v.dtype.kind == 'S':
index = self.indexes[k]
if isinstance(index, pd.DatetimeIndex):
v = v.astype('datetime64[ns]')
elif isinstance(index, xr.CFTimeIndex):
v = _parse_array_of_cftime_strings(v, index.date_type)

if v.ndim == 0:
v = as_variable(v)
@@ -1980,11 +1984,26 @@ def maybe_variable(obj, k):
except KeyError:
return as_variable((k, range(obj.dims[k])))

def _validate_interp_indexer(x, new_x):
# In the case of datetimes, the restrictions placed on indexers
# used with interp are stronger than those which are placed on
# isel, so we need an additional check after _validate_indexers.
if (_contains_datetime_like_objects(x) and
not _contains_datetime_like_objects(new_x)):
raise TypeError('When interpolating over a datetime-like '
'coordinate, the coordinates to '
'interpolate to must be either datetime '
'strings or datetimes. '
'Instead got\n{}'.format(new_x))
else:
return (x, new_x)

variables = OrderedDict()
for name, var in iteritems(obj._variables):
if name not in indexers:
if var.dtype.kind in 'uifc':
var_indexers = {k: (maybe_variable(obj, k), v) for k, v
var_indexers = {k: _validate_interp_indexer(
maybe_variable(obj, k), v) for k, v
in indexers.items() if k in var.dims}
variables[name] = missing.interp(
var, var_indexers, method, **kwargs)
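
As an aside on the hunk above: the extra check for `interp` can be sketched in isolation. This is a simplified illustration under stated assumptions. `contains_datetime_like` is a hypothetical stand-in for `_contains_datetime_like_objects`, plain lists of `datetime.datetime` replace xarray variables, and string targets are assumed to have been converted to dates already (in the PR that happens earlier, in `_validate_indexers`).

```python
from datetime import datetime


def contains_datetime_like(values):
    """Hypothetical stand-in for _contains_datetime_like_objects."""
    return len(values) > 0 and all(isinstance(v, datetime) for v in values)


def validate_interp_indexer(x, new_x):
    """Reject non-datetime targets when the coordinate holds datetimes."""
    if contains_datetime_like(x) and not contains_datetime_like(new_x):
        raise TypeError(
            'When interpolating over a datetime-like coordinate, the '
            'coordinates to interpolate to must be either datetime strings '
            'or datetimes. Instead got\n{}'.format(new_x))
    return x, new_x


time = [datetime(2000, 1, 1), datetime(2000, 1, 3)]

# Datetime targets pass through unchanged.
ok = validate_interp_indexer(time, [datetime(2000, 1, 2)])

# Numeric targets are rejected, since a bare number has no unambiguous
# position on a datetime axis.
try:
    validate_interp_indexer(time, [0.5])
    rejected = False
except TypeError:
    rejected = True
```
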
@@ -3807,19 +3826,21 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
' dimensional'.format(coord, coord_var.ndim))

dim = coord_var.dims[0]
coord_data = coord_var.data
if coord_data.dtype.kind in 'mM':
if datetime_unit is None:
datetime_unit, _ = np.datetime_data(coord_data.dtype)
coord_data = to_numeric(coord_data, datetime_unit=datetime_unit)
if _contains_datetime_like_objects(coord_var):
if coord_var.dtype.kind in 'mM' and datetime_unit is None:
datetime_unit, _ = np.datetime_data(coord_var.dtype)
elif datetime_unit is None:
datetime_unit = 's' # Default to seconds for cftime objects
coord_var = datetime_to_numeric(coord_var, datetime_unit=datetime_unit)

variables = OrderedDict()
for k, v in self.variables.items():
if (k in self.data_vars and dim in v.dims and
k not in self.coords):
v = to_numeric(v, datetime_unit=datetime_unit)
if _contains_datetime_like_objects(v):
v = datetime_to_numeric(v, datetime_unit=datetime_unit)
grad = duck_array_ops.gradient(
v.data, coord_data, edge_order=edge_order,
v.data, coord_var, edge_order=edge_order,
axis=v.get_axis_num(dim))
variables[k] = Variable(v.dims, grad)
else:
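
As an aside on the hunk above: differentiating with respect to a datetime coordinate amounts to converting the coordinate to numbers (seconds by default for cftime objects, per the new code) and then taking finite differences. The sketch below uses only the standard library. `to_seconds` and `gradient` are hypothetical helpers, and the difference scheme is a simple one that is not claimed to match `numpy.gradient` on unevenly spaced coordinates.

```python
from datetime import datetime, timedelta


def to_seconds(times):
    """Seconds elapsed since the earliest time, used as the offset."""
    offset = min(times)
    return [(t - offset) / timedelta(seconds=1) for t in times]


def gradient(values, coords):
    """Finite differences: one-sided at the edges, central in the interior."""
    n = len(values)
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append((values[hi] - values[lo]) / (coords[hi] - coords[lo]))
    return grad


# Four daily samples of a quantity that grows by one unit per second.
times = [datetime(2000, 1, 1) + timedelta(days=d) for d in range(4)]
values = [0.0, 86400.0, 172800.0, 259200.0]

rate = gradient(values, to_seconds(times))  # units per second
```

The choice of unit only rescales the result: the same data differentiated per day instead of per second would give values 86400 times larger.
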
12 changes: 7 additions & 5 deletions xarray/core/missing.py
@@ -9,9 +9,10 @@
import pandas as pd

from . import rolling
from .common import _contains_datetime_like_objects
from .computation import apply_ufunc
from .pycompat import iteritems
from .utils import is_scalar, OrderedSet, to_numeric
from .utils import is_scalar, OrderedSet, datetime_to_numeric
from .variable import Variable, broadcast_variables
from .duck_array_ops import dask_array_type

@@ -407,15 +408,16 @@ def _floatize_x(x, new_x):
x = list(x)
new_x = list(new_x)
for i in range(len(x)):
if x[i].dtype.kind in 'Mm':
if _contains_datetime_like_objects(x[i]):
# Scipy casts coordinates to np.float64, which is not accurate
# enough for datetime64 (uses 64bit integer).
# We assume that most of the bits are used to represent the
# offset (min(x)) and the variation (x - min(x)) can be
# represented by float.
xmin = np.min(x[i])
x[i] = to_numeric(x[i], offset=xmin, dtype=np.float64)
new_x[i] = to_numeric(new_x[i], offset=xmin, dtype=np.float64)
xmin = x[i].min()
x[i] = datetime_to_numeric(x[i], offset=xmin, dtype=np.float64)
new_x[i] = datetime_to_numeric(
new_x[i], offset=xmin, dtype=np.float64)
return x, new_x
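
As an aside on the comment in `_floatize_x`: the precision argument can be checked with plain integers standing in for `datetime64[ns]` values. The timestamp below is an arbitrary nanosecond count of roughly 1.5e18 chosen for illustration.

```python
# Nanoseconds since 1970-01-01 for two instants one nanosecond apart;
# plain integers stand in for datetime64[ns] values.
t0 = 1_538_092_800_000_000_000
t1 = t0 + 1

# Direct conversion: float64 has a 53-bit significand, so neighbouring
# representable values near 1.5e18 are 256 apart and the two instants
# collapse to the same float.
direct_equal = float(t0) == float(t1)

# Subtracting the minimum first keeps the differences small, and small
# integers convert to float64 exactly.
xmin = min(t0, t1)
shifted = [float(t0 - xmin), float(t1 - xmin)]
```

Here `direct_equal` is `True` while `shifted` keeps the two instants distinct, which is why the offset is subtracted before the cast to `np.float64`.
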


17 changes: 11 additions & 6 deletions xarray/core/utils.py
@@ -593,20 +593,25 @@ def __len__(self):
return len(self._data) - num_hidden


def to_numeric(array, offset=None, datetime_unit=None, dtype=float):
"""
Make datetime array float
def datetime_to_numeric(array, offset=None, datetime_unit=None, dtype=float):
"""Convert an array containing datetime-like data to an array of floats.
Member:
It sounds like this function does nothing to numeric arrays. Maybe the default offset should be zero and we can pass the minimum for the datetime array manually.

Member Author:
Ah, thanks to your earlier comment I see where you are coming from now (my initial thought was that in all cases we would make sure an array contained dates before calling to_numeric on it). Depending on what we decide to do there I can modify this function as needed.

Member:
I was just imagining that we might use this function for a different purpose in the future. I do not have a particular idea yet, but I thought we may not always want to subtract the minimum.

Member Author:
For now I renamed to_numeric to datetime_to_numeric to make it clear that it is intended only to be used with datetimes (I've tried to make sure now that this is always the case). Does that seem reasonable? I think we can always change the defaults to this internal utility function if we find a need later.


Parameters
----------
array : array
Input data
offset: Scalar with the same type of array or None
If None, subtract minimum values to reduce round off error
datetime_unit: None or any of {'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms',
'us', 'ns', 'ps', 'fs', 'as'}
dtype: target dtype

Returns
-------
array
"""
if array.dtype.kind not in ['m', 'M']:
return array.astype(dtype)
if offset is None:
offset = np.min(array)
offset = array.min()
array = array - offset

if datetime_unit:
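
As an aside on the docstring above: the idea it describes (subtract an offset, then express the differences in a chosen unit) can be sketched with the standard library. This is not the body of `datetime_to_numeric`, which is truncated in this hunk. `datetimes_to_floats` and the `UNITS` table are hypothetical, and only a few of the listed unit codes are covered.

```python
from datetime import datetime, timedelta

# Hypothetical unit table covering a few of the unit codes listed above.
UNITS = {
    'D': timedelta(days=1),
    'h': timedelta(hours=1),
    'm': timedelta(minutes=1),
    's': timedelta(seconds=1),
}


def datetimes_to_floats(times, offset=None, datetime_unit='s'):
    """Convert datetimes to floats counted in datetime_unit from offset."""
    if offset is None:
        # Default to the minimum, as the docstring describes, so that the
        # resulting numbers stay small.
        offset = min(times)
    return [(t - offset) / UNITS[datetime_unit] for t in times]


times = [datetime(2000, 1, 1), datetime(2000, 1, 1, 12), datetime(2000, 1, 2)]

in_days = datetimes_to_floats(times, datetime_unit='D')
in_hours = datetimes_to_floats(
    times, offset=datetime(1999, 12, 31), datetime_unit='h')
```

An explicit `offset` matters when two arrays must share one origin, as in `_floatize_x`, where both the original and the new coordinates are measured from the same `xmin`.
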
23 changes: 21 additions & 2 deletions xarray/tests/test_cftimeindex.py
@@ -9,10 +9,11 @@
from datetime import timedelta
from xarray.coding.cftimeindex import (
parse_iso8601, CFTimeIndex, assert_all_valid_date_type,
_parsed_string_to_bounds, _parse_iso8601_with_reso)
_parsed_string_to_bounds, _parse_iso8601_with_reso,
_parse_array_of_cftime_strings)
from xarray.tests import assert_array_equal, assert_identical

from . import has_cftime, has_cftime_or_netCDF4
from . import has_cftime, has_cftime_or_netCDF4, requires_cftime
from .test_coding_times import _all_cftime_date_types


@@ -616,3 +617,21 @@ def test_concat_cftimeindex(date_type, enable_cftimeindex):
def test_empty_cftimeindex():
    index = CFTimeIndex([])
    assert index.date_type is None


@requires_cftime
def test_parse_array_of_cftime_strings():
    from cftime import DatetimeNoLeap

    strings = np.array(['2000-01-01', '2000-01-02'])
    expected = np.array([DatetimeNoLeap(2000, 1, 1),
                         DatetimeNoLeap(2000, 1, 2)])

    result = _parse_array_of_cftime_strings(strings, DatetimeNoLeap)
    np.testing.assert_array_equal(result, expected)

    # Test scalar array case
    strings = np.array('2000-01-01')
    expected = np.array(DatetimeNoLeap(2000, 1, 1))
    result = _parse_array_of_cftime_strings(strings, DatetimeNoLeap)
    np.testing.assert_array_equal(result, expected)