ERR: raise in tz incompat with DatetimeIndex ops #8306

Comments
cc @immerrr
I think that the point in raising in such situations is to prevent guessing in the face of ambiguity. There's an ambiguity when you do pretty much any binary operation between a naive and an aware timestamp, as the former can correspond to about 40 different wall-clock timestamps and it's unclear which one exactly, so this should be reported right away, just as in vanilla Python.

I have doubts about "subtraction of datetime-like" though, as there's no ambiguity in subtracting two aware timestamps from different timezones. There may also be very annoying issues because, AFAIR, a DST transition involves changing timezones, and subtracting a properly localized summer timestamp from a winter one would then raise.

"intersection / union / difference" is ambiguous w.r.t. what the tz of the result should be, so it seems OK to raise. As for "reindex", when I'm doing
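The two claims above (mixing naive and aware should raise immediately; subtracting two aware timestamps across zones is unambiguous) can be checked against vanilla Python. A minimal sketch, using fixed-offset zones purely for illustration:

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2013, 11, 3, 1, 30)
eastern = timezone(timedelta(hours=-5))   # fixed offsets, illustration only
central = timezone(timedelta(hours=-6))
aware_east = datetime(2013, 11, 3, 1, 30, tzinfo=eastern)
aware_cent = datetime(2013, 11, 3, 0, 30, tzinfo=central)

# Mixing naive and aware is reported right away:
try:
    naive - aware_east
except TypeError as exc:
    print(exc)

# Subtracting two aware datetimes from different zones is unambiguous;
# both denote 06:30 UTC, so the difference is zero:
print(aware_east - aware_cent)  # 0:00:00
```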
The reason to raise on subtraction is that it's almost always unintended behavior: the user should have them in the same timezone. Yes, you can auto-convert, but I just think that's asking for trouble.

Reindexing is the same thing. Yes, it can work (and maybe even is non-ambiguous, forgetting about transition times for a second). The problem is that it is almost never what you actually want and is likely an error.

cc @rockg
I agree that always forcing the user to convert themselves for subtraction makes the most sense; trying to guess what the user intends is a frustrating endeavor for both the programmer and the user. For reindex, if the indices are the same size then I think it's fine, as it is essentially a replacement. But if you are using a method to fill in missing values, then it gets hairy. Maybe we need to look at a couple of examples to get on the same page.
My opinion made a complete about-face while I was writing this comment. I now think the coercion is fine:

```python
In [1]: import pandas as pd

In [2]: pd.Series(1, index=[1, 2, 3])
Out[2]:
1    1
2    1
3    1
dtype: int64

In [3]: _.index
Out[3]: Int64Index([1, 2, 3], dtype='int64')

In [4]: _2.reindex([1., 2., 3.])
Out[4]:
1    1
2    1
3    1
dtype: int64

In [5]: _.index
Out[5]: Float64Index([1.0, 2.0, 3.0], dtype='float64')
```

It's like any other coercion of one value to an equal value of a different type. Naive
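For comparison, here is the tz-aware analogue of that coercion as current pandas ended up behaving (a sketch of behavior that postdates this thread, not the behavior under discussion): reindexing with the same instants expressed in another zone aligns by instant, and the result carries the new index's zone, just as the Int64Index above came back as a Float64Index.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('2013-11-01', periods=3, tz='US/Eastern'))

# Same instants, different tz metadata:
utc_idx = s.index.tz_convert('UTC')

# Alignment is by instant, so no values are lost; the index is now UTC.
print(s.reindex(utc_idx))
```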
I am thinking we should simply raise if it's not a completely non-ambiguous conversion, e.g.

Or are you OK with reindex on an Index returning a tuple of new_index, indexer?

[9] 'works' but is a bit odd (this is on master; I guess the suggestion is to make [9] work like [10], but have tz='US/Eastern' in the result).
Speaking of metadata and DST transitions: what happens now if there's a date range that starts in summer time and ends in winter time?
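For reference, a quick sketch of what current pandas does for a range spanning the fall-back transition (assuming a tz database is available): the range is well defined, and the UTC offset simply changes partway through.

```python
import pandas as pd

# Hourly range across the 2013-11-03 US DST transition (fall back at 2:00).
rng = pd.date_range('2013-11-03', periods=4, freq='h', tz='America/Chicago')
for ts in rng:
    print(ts)
# The offset flips from -05:00 (CDT) to -06:00 (CST) mid-range;
# the 01:00 wall-clock hour appears twice.
```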
Right, I was even paying attention to the

I think since a

```python
In [1]: import pandas as pd

In [2]: idx = pd.date_range('20131101', tz='America/Chicago', periods=7)

In [3]: newidx = pd.date_range('20131103', tz='UTC', periods=36, freq='H')

In [4]: pd.Series(range(7), index=idx).reindex(newidx, method='ffill')
Out[4]:
2013-11-03 00:00:00+00:00    1
2013-11-03 01:00:00+00:00    1
2013-11-03 02:00:00+00:00    1
2013-11-03 03:00:00+00:00    1
2013-11-03 04:00:00+00:00    1
2013-11-03 05:00:00+00:00    2  # midnight before DST ends, offset = -0500
2013-11-03 06:00:00+00:00    2
2013-11-03 07:00:00+00:00    2
2013-11-03 08:00:00+00:00    2
2013-11-03 09:00:00+00:00    2
2013-11-03 10:00:00+00:00    2
2013-11-03 11:00:00+00:00    2
2013-11-03 12:00:00+00:00    2
2013-11-03 13:00:00+00:00    2
2013-11-03 14:00:00+00:00    2
2013-11-03 15:00:00+00:00    2
2013-11-03 16:00:00+00:00    2
2013-11-03 17:00:00+00:00    2
2013-11-03 18:00:00+00:00    2
2013-11-03 19:00:00+00:00    2
2013-11-03 20:00:00+00:00    2
2013-11-03 21:00:00+00:00    2
2013-11-03 22:00:00+00:00    2
2013-11-03 23:00:00+00:00    2
2013-11-04 00:00:00+00:00    2
2013-11-04 01:00:00+00:00    2
2013-11-04 02:00:00+00:00    2
2013-11-04 03:00:00+00:00    2
2013-11-04 04:00:00+00:00    2
2013-11-04 05:00:00+00:00    2
2013-11-04 06:00:00+00:00    3  # midnight after DST ends, offset = -0600
2013-11-04 07:00:00+00:00    3
2013-11-04 08:00:00+00:00    3
2013-11-04 09:00:00+00:00    3
2013-11-04 10:00:00+00:00    3
2013-11-04 11:00:00+00:00    3
Freq: H, dtype: int64
```

and disapprove of this current behavior:

```python
In [24]: pd.Series(range(7), index=idx).reindex(newidx.tz_convert(None), method='ffill')
Out[24]:  # Should raise
2013-11-03 00:00:00    1
2013-11-03 01:00:00    1
2013-11-03 02:00:00    1
2013-11-03 03:00:00    1
2013-11-03 04:00:00    1
2013-11-03 05:00:00    2
2013-11-03 06:00:00    2
2013-11-03 07:00:00    2
2013-11-03 08:00:00    2
2013-11-03 09:00:00    2
2013-11-03 10:00:00    2
2013-11-03 11:00:00    2
2013-11-03 12:00:00    2
2013-11-03 13:00:00    2
2013-11-03 14:00:00    2
2013-11-03 15:00:00    2
2013-11-03 16:00:00    2
2013-11-03 17:00:00    2
2013-11-03 18:00:00    2
2013-11-03 19:00:00    2
2013-11-03 20:00:00    2
2013-11-03 21:00:00    2
2013-11-03 22:00:00    2
2013-11-03 23:00:00    2
2013-11-04 00:00:00    2
2013-11-04 01:00:00    2
2013-11-04 02:00:00    2
2013-11-04 03:00:00    2
2013-11-04 04:00:00    2
2013-11-04 05:00:00    2
2013-11-04 06:00:00    3
2013-11-04 07:00:00    3
2013-11-04 08:00:00    3
2013-11-04 09:00:00    3
2013-11-04 10:00:00    3
2013-11-04 11:00:00    3
Freq: H, dtype: int64
```

Also, if we were to have the values change at midnight UTC, there might be the undesirable possibility of repeated conversions going "around the world" and ending up off by a day.
@ischwabacher interesting, so you are OK with reindexing a tz-aware series with an (explicitly) UTC index, but not with a naive one. Maybe like
My stance on naive times is that they are bad and they should feel bad. They correspond to an oversimplified version of reality, and they will necessarily cause headaches if they are mixed with time-zone-aware objects. These headaches should happen right away, not several months later when a DST transition occurs and everything breaks. The naive

On the other hand, the whole point of time zone awareness is that aware times refer to a specific point in time, and can be compared even if they're not in the same time zone. Taking the union / intersection of collections of times in different time zones is more problematic, but I think the stdlib precedent should be good enough for us:

```python
In [1]: {1} & {1.0}
Out[1]: {1.0}

In [2]: {1.0} & {1}
Out[2]: {1}

In [3]: {1} | {1.0}
Out[3]: {1}

In [4]: {1.0} | {1}
Out[4]: {1.0}
```

EDIT: Erm, numpy sets a conflicting but ultimately reasonable precedent:

```python
In [1]: import numpy as np

In [2]: np.intersect1d(np.arange(3, dtype=int), np.arange(1, 4, dtype=float))
Out[2]: array([ 1.,  2.])

In [3]: np.intersect1d(np.arange(3, dtype=float), np.arange(1, 4, dtype=int))
Out[3]: array([ 1.,  2.])

In [4]: np.union1d(np.arange(3, dtype='i4'), np.arange(1, 4, dtype='i2'))
Out[4]: array([0, 1, 2, 3], dtype=int32)

In [5]: np.union1d(np.arange(3, dtype='i4'), np.arange(1, 4, dtype='i8'))
Out[5]: array([0, 1, 2, 3])
```

Conforming to this would mean converting all the times to UTC; conforming to the stdlib would mean taking either the first or the second time zone as appropriate.
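For what it's worth, the behavior pandas eventually settled on for cross-zone set operations is the numpy-style one: neither operand's zone wins, and the result comes back in UTC. A sketch, assuming current pandas:

```python
import pandas as pd

a = pd.date_range('2013-11-03 00:00', periods=3, freq='h', tz='US/Eastern')
b = pd.date_range('2013-11-03 06:00', periods=3, freq='h', tz='Europe/Berlin')

# Both sides are converted before combining; the union is expressed in UTC.
u = a.union(b)
print(u.tz)
```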
@ischwabacher thank you, I found out something new today with those set examples :) The numpy examples are interesting, too, but all of them exercise lossless casts (well, almost: float64 has only 53 significand bits) which follow a simple set of rules that are surprisingly unambiguous, as you can see for yourself by doing something like:

```python
In [32]: for i, c1 in enumerate(np.typecodes['All']):
   ....:     for c2 in np.typecodes['All'][i+1:]:
   ....:         if c1 != c2 and np.can_cast(c1, c2) and np.can_cast(c2, c1):
   ....:             print '{d1}({c1})<=>{d2}({c2})'.format(c1=c1, d1=np.dtype(c1), c2=c2, d2=np.dtype(c2))
   ....:
int64(l)<=>int64(q)
int64(l)<=>int64(p)
int64(q)<=>int64(p)
uint64(L)<=>uint64(Q)
uint64(L)<=>uint64(P)
uint64(Q)<=>uint64(P)
|V0(V)<=>object(O)
```

In numpy only type aliases are mutually castable, and the rest of the relations are one-sided. This departs somewhat from our case, because tz conversions are lossless in both directions and thus can be seen as ambiguous. Declaring UTC a "superclass" might work, but it would come as a surprise to me at least, like putting first some euros and then some pounds in my pocket and having them come out as dollars instead.
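That one-sidedness is easy to spot-check directly with np.can_cast (Python 3 here, unlike the Py2 session above):

```python
import numpy as np

# Casting between distinct types goes one way only under the default
# 'safe' rule: widening is allowed, narrowing is not.
print(np.can_cast('i4', 'i8'))  # True
print(np.can_cast('i8', 'i4'))  # False
print(np.can_cast('f8', 'i8'))  # False
```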
So did I; I didn't know what was going to come out until I tried them. :)

I like your argument about there not being distinct mutually castable types, but I think it's really important not to have friction when handling time-zone-aware objects. Then again, most of the friction I've been experiencing has been caused by my local

If numpy disallows intercoercible types because it has a clever algorithm for computing the output dtype from the input dtypes that doesn't work in the presence of cycles in the coercion graph, and we want to appropriate that algorithm, then that would be a pretty strong argument against allowing coercion across time zones, except possibly to UTC. But maybe it's not too surprising that when you put lire, francs, and marks together in your pocket, they come out as euros?
xref #6552