-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Period overflows with dates > pd.Timestamp.max 2262-04-11 #28104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
pd.Timestamp is only implemented within the bounds Timestamp('1677-09-21 00:12:43.145225') to Timestamp('2262-04-11 23:47:16.854775807'). Outside of that range we could have start_time and end_time return datetime objects I guess |
This looks related to https://github.com/pandas-dev/pandas/pull/27916/ but outside my comfort zone to try altering to return python datetime objects for start_time and end_time when the Period is outside the range of pd.Timestamp |
I don't think the return type should depend on the value.
Is there any advantage to returning a Timestamp, rather than always return
a datetime? If not, we could deprecate Period.start_time and Period.end_time
in favor of a property that always returns a datetime.
…On Tue, Aug 27, 2019 at 3:04 PM Paul Lilley ***@***.***> wrote:
This looks related to #27916/
<http://url> but outside my comfort zone to try altering to return python
datetime objects for start_time and end_time when the Period is outside the
range of pd.Timestamp
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#28104?email_source=notifications&email_token=AAKAOIQ6OBZ2GYMDDRAFEB3QGWCEFA5CNFSM4IOZMFA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5I6JWY#issuecomment-525460699>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIRN3CNYW5YMXGZR53TQGWCEFANCNFSM4IOZMFAQ>
.
|
I can't really comment on the advantages of returning a Timestamp, but as Period is the recommended means to handle datetimes outside pd.Timestamp (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#representing-out-of-bounds-spans) it makes more sense (at least to me) that Period methods don't raise exceptions with datetimes outside pd.Timestamp. |
That's mostly in the context of an array of values inside a Series / DataFrame. When we're returning scalar, there's no need to return a Period (which has different semantics to a datetime). |
Good point. |
Need a name. Perhaps `Period.start_datetime`?
…On Tue, Aug 27, 2019 at 3:32 PM Paul Lilley ***@***.***> wrote:
Good point.
I agree that it makes sense to deprecate Period.start_time and
Period.end_time
in favor of a property that always returns a datetime
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28104?email_source=notifications&email_token=AAKAOIR3SJQXOX2XL7JW4PDQGWFNHA5CNFSM4IOZMFA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5JAY6I#issuecomment-525470841>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOITUL2A7FTGXDAWLQKLQGWFNHANCNFSM4IOZMFAQ>
.
|
Sounds good, |
this will still run up against the datetime implementation bounds. do we care about those? |
Fair point, though at least the datetime.MINYEAR to datetime.MAXYEAR range covers the typical 'max-date' values (9999-12-31 in DB2 and Teradata, 31DEC9999 in SAS) that I've encountered in datawarehousing tables used to implement type 2 slowly changing dimensions with start_date/end_date. Often the currently valid row is marked by end_date = some (arbitrarily high) 'max-date' to avoid nulls in the column. |
I personally don't care about representing date times that python can't.
…On Tue, Aug 27, 2019 at 11:21 PM Paul Lilley ***@***.***> wrote:
Fair point, though at least the datetime.MINYEAR to datetime.MAXYEAR range
covers the typical 'max-date' values (9999-12-31 in DB2 and Teradata,
31DEC9999 in SAS) that I've encountered in datawarehousing tables used to
implement type 2 slowly changing dimensions with start_date/end_date. Often
the currently valid row is marked by end_date = some (arbitrarily high)
'max-date' to avoid nulls in the column.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#28104?email_source=notifications&email_token=AAKAOITELKKKPYYPRVK4NX3QGX4M3A5CNFSM4IOZMFA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5J25XA#issuecomment-525577948>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIQDVHANQROUJPAUR6LQGX4M3ANCNFSM4IOZMFAQ>
.
|
Hi guys, just to add: |
now that we have non-nano Timestamp support we could handle this |
Code Sample, a copy-pastable example if possible
Output: (with pandas 0.25.0)
2262-04-11 00:00:00
1677-09-21 00:25:26.290448384
with pandas 0.25.0+216.g1aca08aa6
I get an OutOfBoundsDatetime error
Problem description
Overflow means that the Period is incorrect, and also hinders use of Period for date/times outside the limits of pd.Timestamp
Expected Output
Output:
2262-04-11 00:00:00
2262-04-12 00:00:00
Output of
pd.show_versions()
pandas : 0.25.0
numpy : 1.16.3
pytz : 2018.9
dateutil : 2.7.5
pip : 19.1.1
setuptools : 40.8.0
Cython : None
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.3.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.1
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.2
The text was updated successfully, but these errors were encountered: