Commit 656f8bd

spencerkclark authored and shoyer committed
Switch enable_cftimeindex to True by default (#2516)
* Switch enable_cftimeindex to True by default
* Add a friendlier error message when plotting cftime objects
* Mention that the non-standard calendars are used in climate science
* Add GH issue references to docs
* Deprecate enable_cftimeindex option
* Add CFTimeIndex.to_datetimeindex method
* Add friendlier error message for resample
* lint
* Address review comments
* Take into account microsecond attribute in cftime_to_nptime
* Add test for decoding dates with microsecond-resolution units

  This would have failed before including the microsecond attribute of each date in cftime_to_nptime in eaa4a44.
* Fix typo in time-series.rst
* Formatting
* Fix test_decode_cf_datetime_non_iso_strings
* Prevent warning emitted from set_options.__exit__
1 parent 17815b4 commit 656f8bd

15 files changed, +447 -387 lines changed

doc/api-hidden.rst

Lines changed: 1 addition & 0 deletions
@@ -152,3 +152,4 @@
    plot.FacetGrid.map
 
    CFTimeIndex.shift
+   CFTimeIndex.to_datetimeindex

doc/time-series.rst

Lines changed: 63 additions & 39 deletions
@@ -71,10 +71,11 @@ One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
 native representation of dates to those that fall between the years 1678 and
 2262. When a netCDF file contains dates outside of these bounds, dates will be
 returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
-can be used for indexing. The :py:class:`~xarray.CFTimeIndex` enables only a subset of
-the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only enabled
-when using the standalone version of ``cftime`` (not the version packaged with
-earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more information.
+will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
+the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
+fully compatible with the standalone version of ``cftime`` (not the version
+packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
+information.
 
 Datetime indexing
 -----------------
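
For context on the 1678 to 2262 limit quoted in the hunk above: ``datetime64[ns]`` stores a signed 64-bit count of nanoseconds relative to 1970-01-01, and that range is where the bounds come from. The following is a minimal standard-library sketch of the arithmetic, not xarray or pandas code; it works at microsecond precision, so the endpoints are truncated slightly.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1  # largest nanosecond offset a signed 64-bit int can hold

# datetime.timedelta only has microsecond resolution, so drop the
# sub-microsecond digits before building the span.
span = timedelta(microseconds=INT64_MAX_NS // 1000)

upper = EPOCH + span  # latest representable instant (truncated)
lower = EPOCH - span  # earliest representable instant (truncated)

print(lower.year, upper.year)
```

The earliest instant falls partway through 1677 and the latest partway through 2262, which is consistent with the documentation describing the usable range as roughly 1678 to 2262.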
@@ -221,18 +222,28 @@ Non-standard calendars and dates outside the Timestamp-valid range
 Through the standalone ``cftime`` library and a custom subclass of
 :py:class:`pandas.Index`, xarray supports a subset of the indexing
 functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
-dates from non-standard calendars or dates using a standard calendar, but
-outside the `Timestamp-valid range`_ (approximately between years 1678 and
-2262). This behavior has not yet been turned on by default; to take advantage
-of this functionality, you must have the ``enable_cftimeindex`` option set to
-``True`` within your context (see :py:func:`~xarray.set_options` for more
-information). It is expected that this will become the default behavior in
-xarray version 0.11.
-
-For instance, you can create a DataArray indexed by a time
-coordinate with a no-leap calendar within a context manager setting the
-``enable_cftimeindex`` option, and the time index will be cast to a
-:py:class:`~xarray.CFTimeIndex`:
+dates from non-standard calendars commonly used in climate science or dates
+using a standard calendar, but outside the `Timestamp-valid range`_
+(approximately between years 1678 and 2262).
+
+.. note::
+
+   As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects
+   will be used to represent times (either in indexes, as a
+   :py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if
+   any of the following are true:
+
+   - The dates are from a non-standard calendar
+   - Any dates are outside the Timestamp-valid range.
+
+   Otherwise pandas-compatible dates from a standard calendar will be
+   represented with the ``np.datetime64[ns]`` data type, enabling the use of a
+   :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
+   and their full set of associated features.
+
+For example, you can create a DataArray indexed by a time
+coordinate with dates from a no-leap calendar and a
+:py:class:`~xarray.CFTimeIndex` will automatically be used:
 
 .. ipython:: python
 
@@ -241,27 +252,11 @@ coordinate with a no-leap calendar within a context manager setting the
 
     dates = [DatetimeNoLeap(year, month, 1) for year, month in
              product(range(1, 3), range(1, 13))]
-    with xr.set_options(enable_cftimeindex=True):
-        da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'],
-                          name='foo')
+    da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
 
-.. note::
-
-   With the ``enable_cftimeindex`` option activated, a :py:class:`~xarray.CFTimeIndex`
-   will be used for time indexing if any of the following are true:
-
-   - The dates are from a non-standard calendar
-   - Any dates are outside the Timestamp-valid range
-
-   Otherwise a :py:class:`pandas.DatetimeIndex` will be used. In addition, if any
-   variable (not just an index variable) is encoded using a non-standard
-   calendar, its times will be decoded into :py:class:`cftime.datetime` objects,
-   regardless of whether or not they can be represented using
-   ``np.datetime64[ns]`` objects.
-
 xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
-creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can
-create the same dates and DataArray we created above using:
+creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
+instance, we can create the same dates and DataArray we created above using:
 
 .. ipython:: python
 
@@ -317,13 +312,42 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
 
 .. ipython:: python
 
-    da.to_netcdf('example.nc')
-    xr.open_dataset('example.nc')
+    da.to_netcdf('example-no-leap.nc')
+    xr.open_dataset('example-no-leap.nc')
 
 .. note::
 
-   Currently resampling along the time dimension for data indexed by a
-   :py:class:`~xarray.CFTimeIndex` is not supported.
+   While much of the time series functionality that is possible for standard
+   dates has been implemented for dates from non-standard calendars, there are
+   still some remaining important features that have yet to be implemented,
+   for example:
+
+   - Resampling along the time dimension for data indexed by a
+     :py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
+   - Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
+     (:issue:`2164`).
+
+   For some use-cases it may still be useful to convert from
+   a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
+   despite the difference in calendar types (e.g. to allow the use of some
+   forms of resample with non-standard calendars). The recommended way of
+   doing this is to use the built-in
+   :py:meth:`~xarray.CFTimeIndex.to_datetimeindex` method:
+
+   .. ipython:: python
+
+       modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')
+       da = xr.DataArray(range(24), [('time', modern_times)])
+       da
+       datetimeindex = da.indexes['time'].to_datetimeindex()
+       da['time'] = datetimeindex
+       da.resample(time='Y').mean('time')
+
+   However in this case one should use caution to only perform operations which
+   do not depend on differences between dates (e.g. differentiation,
+   interpolation, or upsampling with resample), as these could introduce subtle
+   and silent errors due to the difference in calendar types between the dates
+   encoded in your data and the dates stored in memory.
 
 .. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
 .. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
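
The caution in the new note about operations that depend on differences between dates can be made concrete with a small example. The sketch below uses only the standard library; ``noleap_ordinal`` is a hypothetical helper written for illustration (it is not part of xarray or cftime) that counts days in a calendar where every year has 365 days.

```python
from datetime import date

# Cumulative days before each month in a 365-day (no-leap) year.
_DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def noleap_ordinal(year, month, day):
    """Day count in a calendar with no leap years (hypothetical helper)."""
    return year * 365 + _DAYS_BEFORE_MONTH[month - 1] + day


# The same two date labels, interpreted in two calendars.
noleap_gap = noleap_ordinal(2000, 3, 1) - noleap_ordinal(2000, 2, 28)
gregorian_gap = (date(2000, 3, 1) - date(2000, 2, 28)).days

print(noleap_gap, gregorian_gap)
```

In the no-leap calendar the two labels are one day apart, but once relabelled as standard-calendar dates they are two days apart because 29 February 2000 now exists. Anything computed from such intervals (derivatives, interpolation weights) would silently change after conversion.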

doc/whats-new.rst

Lines changed: 16 additions & 0 deletions
@@ -33,6 +33,22 @@ v0.11.0 (unreleased)
 Breaking changes
 ~~~~~~~~~~~~~~~~
 
+- For non-standard calendars commonly used in climate science, xarray will now
+  always use :py:class:`cftime.datetime` objects, rather than by default try to
+  coerce them to ``np.datetime64[ns]`` objects. A
+  :py:class:`~xarray.CFTimeIndex` will be used for indexing along time
+  coordinates in these cases. A new method,
+  :py:meth:`~xarray.CFTimeIndex.to_datetimeindex`, has been added
+  to aid in converting from a :py:class:`~xarray.CFTimeIndex` to a
+  :py:class:`pandas.DatetimeIndex` for the remaining use-cases where
+  using a :py:class:`~xarray.CFTimeIndex` is still a limitation (e.g. for
+  resample or plotting). Setting the ``enable_cftimeindex`` option is now a
+  no-op and emits a ``FutureWarning``.
+- ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
+  Call :py:meth:`Dataset.transpose` directly instead.
+- Iterating over a ``Dataset`` now includes only data variables, not coordinates.
+  Similarly, calling ``len`` and ``bool`` on a ``Dataset`` now
+  includes only data variables
 - Finished deprecation cycles:
   - ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
     Call :py:meth:`Dataset.transpose` directly instead.
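
The changelog entry says that setting ``enable_cftimeindex`` is now a no-op that emits a ``FutureWarning``, and the commit message mentions preventing a warning from ``set_options.__exit__``. One way such a deprecation could be structured is sketched below. ``SetOptionsSketch`` and this ``OPTIONS`` dict are hypothetical stand-ins, not xarray's actual implementation (``xarray/core/options.py`` is not shown in this excerpt).

```python
import warnings

# Hypothetical stand-in for a global options dict.
OPTIONS = {'display_width': 80, 'enable_cftimeindex': True}


class SetOptionsSketch(object):
    """Context manager that warns when a deprecated, unused option is set."""

    def __init__(self, **kwargs):
        self.old = {}
        for key, value in kwargs.items():
            if key not in OPTIONS:
                raise ValueError('unknown option: {!r}'.format(key))
            if key == 'enable_cftimeindex':
                warnings.warn(
                    'The enable_cftimeindex option is now a no-op.',
                    FutureWarning, stacklevel=2)
            self.old[key] = OPTIONS[key]
            OPTIONS[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the dict directly instead of re-running __init__,
        # so leaving the block does not emit a second warning.
        OPTIONS.update(self.old)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    with SetOptionsSketch(enable_cftimeindex=False):
        pass

warning_categories = [w.category for w in caught]
print(warning_categories)
```

Restoring the previous values with a plain dict update, rather than by constructing a second options object, is what keeps the exit path silent in this sketch.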

xarray/coding/cftimeindex.py

Lines changed: 53 additions & 0 deletions
@@ -42,6 +42,7 @@
 from __future__ import absolute_import
 
 import re
+import warnings
 from datetime import timedelta
 
 import numpy as np
@@ -50,6 +51,8 @@
 from xarray.core import pycompat
 from xarray.core.utils import is_scalar
 
+from .times import cftime_to_nptime, infer_calendar_name, _STANDARD_CALENDARS
+
 
 def named(name, pattern):
     return '(?P<' + name + '>' + pattern + ')'
@@ -381,6 +384,56 @@ def _add_delta(self, deltas):
         # pandas. No longer used as of pandas 0.23.
         return self + deltas
 
+    def to_datetimeindex(self, unsafe=False):
+        """If possible, convert this index to a pandas.DatetimeIndex.
+
+        Parameters
+        ----------
+        unsafe : bool
+            Flag to turn off warning when converting from a CFTimeIndex with
+            a non-standard calendar to a DatetimeIndex (default ``False``).
+
+        Returns
+        -------
+        pandas.DatetimeIndex
+
+        Raises
+        ------
+        ValueError
+            If the CFTimeIndex contains dates that are not possible in the
+            standard calendar or outside the pandas.Timestamp-valid range.
+
+        Warns
+        -----
+        RuntimeWarning
+            If converting from a non-standard calendar to a DatetimeIndex.
+
+        Warnings
+        --------
+        Note that for non-standard calendars, this will change the calendar
+        type of the index. In that case the result of this method should be
+        used with caution.
+
+        Examples
+        --------
+        >>> import xarray as xr
+        >>> times = xr.cftime_range('2000', periods=2, calendar='gregorian')
+        >>> times
+        CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00], dtype='object')
+        >>> times.to_datetimeindex()
+        DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
+        """  # noqa: E501
+        nptimes = cftime_to_nptime(self)
+        calendar = infer_calendar_name(self)
+        if calendar not in _STANDARD_CALENDARS and not unsafe:
+            warnings.warn(
+                'Converting a CFTimeIndex with dates from a non-standard '
+                'calendar, {!r}, to a pandas.DatetimeIndex, which uses dates '
+                'from the standard calendar. This may lead to subtle errors '
+                'in operations that depend on the length of time between '
+                'dates.'.format(calendar), RuntimeWarning)
+        return pd.DatetimeIndex(nptimes)
+
 
 def _parse_iso8601_without_reso(date_type, datetime_str):
     date, _ = _parse_iso8601_with_reso(date_type, datetime_str)
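
The warning logic in ``to_datetimeindex`` can be exercised in isolation. The sketch below reproduces only the calendar check and the ``unsafe`` opt-out using the standard library. ``warn_if_calendar_changes`` is a hypothetical helper, and the contents of ``STANDARD_CALENDARS`` are an assumption, since the definition of ``_STANDARD_CALENDARS`` in ``xarray/coding/times.py`` is not part of this diff.

```python
import warnings

# Assumed contents; the real _STANDARD_CALENDARS is defined in
# xarray/coding/times.py and is not shown in this diff.
STANDARD_CALENDARS = {'standard', 'gregorian', 'proleptic_gregorian'}


def warn_if_calendar_changes(calendar, unsafe=False):
    """Hypothetical helper mirroring the check in to_datetimeindex.

    Returns True if a RuntimeWarning was issued.
    """
    if calendar not in STANDARD_CALENDARS and not unsafe:
        warnings.warn(
            'Converting dates from a non-standard calendar, {!r}, to the '
            'standard calendar.'.format(calendar),
            RuntimeWarning, stacklevel=2)
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    warn_if_calendar_changes('gregorian')            # standard: silent
    warn_if_calendar_changes('noleap')               # non-standard: warns
    warn_if_calendar_changes('noleap', unsafe=True)  # opted out: silent

categories = [w.category for w in caught]
print(categories)
```

Only the middle call warns. With the real method, a caller who has already accounted for the calendar change would presumably pass ``unsafe=True`` in the same way.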

xarray/coding/times.py

Lines changed: 27 additions & 33 deletions
@@ -12,7 +12,6 @@
 from ..core import indexing
 from ..core.common import contains_cftime_datetimes
 from ..core.formatting import first_n_items, format_timestamp, last_item
-from ..core.options import OPTIONS
 from ..core.pycompat import PY3
 from ..core.variable import Variable
 from .variables import (
@@ -61,8 +60,9 @@ def _require_standalone_cftime():
     try:
         import cftime  # noqa: F401
     except ImportError:
-        raise ImportError('Using a CFTimeIndex requires the standalone '
-                          'version of the cftime library.')
+        raise ImportError('Decoding times with non-standard calendars '
+                          'or outside the pandas.Timestamp-valid range '
+                          'requires the standalone cftime package.')
 
 
 def _netcdf_to_numpy_timeunit(units):
@@ -84,41 +84,32 @@ def _unpack_netcdf_time_units(units):
     return delta_units, ref_date
 
 
-def _decode_datetime_with_cftime(num_dates, units, calendar,
-                                 enable_cftimeindex):
+def _decode_datetime_with_cftime(num_dates, units, calendar):
     cftime = _import_cftime()
-    if enable_cftimeindex:
-        _require_standalone_cftime()
+
+    if cftime.__name__ == 'cftime':
         dates = np.asarray(cftime.num2date(num_dates, units, calendar,
                                            only_use_cftime_datetimes=True))
     else:
+        # Must be using num2date from an old version of netCDF4 which
+        # does not have the only_use_cftime_datetimes option.
         dates = np.asarray(cftime.num2date(num_dates, units, calendar))
 
     if (dates[np.nanargmin(num_dates)].year < 1678 or
             dates[np.nanargmax(num_dates)].year >= 2262):
-        if not enable_cftimeindex or calendar in _STANDARD_CALENDARS:
+        if calendar in _STANDARD_CALENDARS:
             warnings.warn(
                 'Unable to decode time axis into full '
                 'numpy.datetime64 objects, continuing using dummy '
                 'cftime.datetime objects instead, reason: dates out '
                 'of range', SerializationWarning, stacklevel=3)
     else:
-        if enable_cftimeindex:
-            if calendar in _STANDARD_CALENDARS:
-                dates = cftime_to_nptime(dates)
-        else:
-            try:
-                dates = cftime_to_nptime(dates)
-            except ValueError as e:
-                warnings.warn(
-                    'Unable to decode time axis into full '
-                    'numpy.datetime64 objects, continuing using '
-                    'dummy cftime.datetime objects instead, reason:'
-                    '{0}'.format(e), SerializationWarning, stacklevel=3)
+        if calendar in _STANDARD_CALENDARS:
+            dates = cftime_to_nptime(dates)
     return dates
 
 
-def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
+def _decode_cf_datetime_dtype(data, units, calendar):
     # Verify that at least the first and last date can be decoded
     # successfully. Otherwise, tracebacks end up swallowed by
     # Dataset.__repr__ when users try to view their lazily decoded array.
@@ -128,8 +119,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
                                     last_item(values) or [0]])
 
     try:
-        result = decode_cf_datetime(example_value, units, calendar,
-                                    enable_cftimeindex)
+        result = decode_cf_datetime(example_value, units, calendar)
     except Exception:
         calendar_msg = ('the default calendar' if calendar is None
                         else 'calendar %r' % calendar)
@@ -145,8 +135,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
     return dtype
 
 
-def decode_cf_datetime(num_dates, units, calendar=None,
-                       enable_cftimeindex=False):
+def decode_cf_datetime(num_dates, units, calendar=None):
     """Given an array of numeric dates in netCDF format, convert it into a
     numpy array of date time objects.
 
@@ -200,8 +189,7 @@ def decode_cf_datetime(num_dates, units, calendar=None,
 
     except (OutOfBoundsDatetime, OverflowError):
         dates = _decode_datetime_with_cftime(
-            flat_num_dates.astype(np.float), units, calendar,
-            enable_cftimeindex)
+            flat_num_dates.astype(np.float), units, calendar)
 
     return dates.reshape(num_dates.shape)
 
@@ -291,7 +279,16 @@ def cftime_to_nptime(times):
     times = np.asarray(times)
     new = np.empty(times.shape, dtype='M8[ns]')
     for i, t in np.ndenumerate(times):
-        dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
+        try:
+            # Use pandas.Timestamp in place of datetime.datetime, because
+            # NumPy casts it safely to np.datetime64[ns] for dates outside
+            # 1678 to 2262 (this is not currently the case for
+            # datetime.datetime).
+            dt = pd.Timestamp(t.year, t.month, t.day, t.hour, t.minute,
+                              t.second, t.microsecond)
+        except ValueError as e:
+            raise ValueError('Cannot convert date {} to a date in the '
+                             'standard calendar. Reason: {}.'.format(t, e))
         new[i] = np.datetime64(dt)
     return new
 
@@ -404,15 +401,12 @@ def encode(self, variable, name=None):
     def decode(self, variable, name=None):
         dims, data, attrs, encoding = unpack_for_decoding(variable)
 
-        enable_cftimeindex = OPTIONS['enable_cftimeindex']
         if 'units' in attrs and 'since' in attrs['units']:
             units = pop_to(attrs, encoding, 'units')
             calendar = pop_to(attrs, encoding, 'calendar')
-            dtype = _decode_cf_datetime_dtype(
-                data, units, calendar, enable_cftimeindex)
+            dtype = _decode_cf_datetime_dtype(data, units, calendar)
             transform = partial(
-                decode_cf_datetime, units=units, calendar=calendar,
-                enable_cftimeindex=enable_cftimeindex)
+                decode_cf_datetime, units=units, calendar=calendar)
             data = lazy_elemwise_func(data, transform, dtype)
 
         return Variable(dims, data, attrs, encoding)
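
Two behaviours introduced in the ``cftime_to_nptime`` hunk of this file are easy to check by hand: the microsecond field is now carried through, and dates that do not exist in the standard calendar raise a ``ValueError`` with a descriptive message. The sketch below imitates that per-element conversion using only the standard library. It deliberately substitutes ``datetime.datetime`` for ``pandas.Timestamp``, so it does not reproduce the out-of-range casting behaviour the code comment in the diff refers to. ``FakeCFDate`` and ``to_standard_datetime`` are hypothetical names.

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for a cftime.datetime: only the attributes the conversion reads.
FakeCFDate = namedtuple(
    'FakeCFDate', 'year month day hour minute second microsecond')


def to_standard_datetime(t):
    """Convert one date-like object, re-raising with a clearer message."""
    try:
        return datetime(t.year, t.month, t.day, t.hour, t.minute,
                        t.second, t.microsecond)
    except ValueError as e:
        raise ValueError('Cannot convert date {} to a date in the '
                         'standard calendar. Reason: {}.'.format(t, e))


# Microseconds survive the conversion.
ok = to_standard_datetime(FakeCFDate(2000, 1, 1, 0, 0, 0, 500000))

# 30 February is valid in a 360-day calendar but not in the standard one.
try:
    to_standard_datetime(FakeCFDate(2000, 2, 30, 0, 0, 0, 0))
    failed = False
    message = ''
except ValueError as err:
    failed = True
    message = str(err)

print(ok.microsecond, failed)
```

The previous implementation passed only year through second to the constructor, so any sub-second component was dropped during conversion.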
