BUG: Convert non-dates in xls date cells to number #13042

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.18.2.txt
@@ -74,3 +74,4 @@ Performance Improvements
 
 Bug Fixes
 ~~~~~~~~~
+- If there is a large numeric value in excel cell of type 'date', convert it to float instead of raising an error (:issue:`10001`)
16 changes: 12 additions & 4 deletions pandas/io/excel.py
@@ -329,11 +329,15 @@ def _parse_cell(cell_contents, cell_typ):
                appropriate object"""
 
             if cell_typ == XL_CELL_DATE:
+
                 if xlrd_0_9_3:
                     # Use the newer xlrd datetime handling.
-                    cell_contents = xldate.xldate_as_datetime(cell_contents,
-                                                              epoch1904)
-
+                    try:
+                        cell_contents = \
+                            xldate.xldate_as_datetime(cell_contents,
+                                                      epoch1904)
+                    except OverflowError:
+                        return cell_contents
                     # Excel doesn't distinguish between dates and time,
                     # so we treat dates on the epoch as times only.
                     # Also, Excel supports 1900 and 1904 epochs.
@@ -346,7 +350,11 @@ def _parse_cell(cell_contents, cell_typ):
                                              cell_contents.microsecond)
                 else:
                     # Use the xlrd <= 0.9.2 date handling.
-                    dt = xldate.xldate_as_tuple(cell_contents, epoch1904)
+                    try:
+                        dt = xldate.xldate_as_tuple(cell_contents, epoch1904)
+
+                    except xldate.XLDateTooLarge:
+                        return cell_contents
 
                     if dt[0] < MINYEAR:
                         cell_contents = time(*dt[3:])
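
The fallback introduced in this hunk can be illustrated without xlrd. What follows is a minimal sketch, not the pandas implementation: `parse_date_cell` and `EPOCH_1900` are hypothetical names, and the 1899-12-30 epoch is a simplifying assumption that only matches Excel's 1900 date system for serials after February 1900. It relies on the standard library raising `OverflowError` when a day count is too large to represent as a `datetime`.

```python
from datetime import datetime, timedelta

# Assumption: simplified epoch for Excel's 1900 date system
# (valid only for serial numbers after February 1900).
EPOCH_1900 = datetime(1899, 12, 30)


def parse_date_cell(serial):
    """Convert an Excel date serial to a datetime.

    If the serial is too large to represent as a date, return the raw
    number unchanged instead of raising.
    """
    try:
        return EPOCH_1900 + timedelta(days=serial)
    except OverflowError:
        # The cell is formatted as a date but holds a plain large number.
        return serial


ordinary = parse_date_cell(42441.0)  # a serial in March 2016
too_large = parse_date_cell(1e20)    # comes back as the original float
```

Returning the untouched value keeps the column readable at the cost of mixed types: such a column holds both timestamps and floats.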
Binary file added pandas/io/tests/data/testdateoverflow.xls
Binary file not shown.
Binary file added pandas/io/tests/data/testdateoverflow.xlsm
Binary file not shown.
Binary file added pandas/io/tests/data/testdateoverflow.xlsx
Binary file not shown.
10 changes: 10 additions & 0 deletions pandas/io/tests/test_excel.py
@@ -481,6 +481,16 @@ def test_set_column_names_in_parameter(self):
             tm.assert_frame_equal(xlsdf_no_head, refdf)
             tm.assert_frame_equal(xlsdf_with_head, refdf)
 
+    def test_date_conversion_overflow(self):
+        # GH 10001 : pandas.ExcelFile ignore parse_dates=False
+        refdf = pd.DataFrame([[pd.Timestamp('2016-03-12'), 'Marc Johnson'],
+                              [pd.Timestamp('2016-03-16'), 'Jack Black'],
+                              [1e+20, 'Timothy Brown']],
Contributor

I think this would be converted to an integer, right? (Or is it actually a float?)

Contributor Author

@kordek kordek May 2, 2016

In the current implementation it is a float (cell_contents is simply returned). Or would something like this be preferable?

val = int(cell_contents)
if val == cell_contents:
    cell_contents = val
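
The integer-preserving variant suggested in the reply above could be sketched as a small helper. This is an illustration only: `maybe_int` is a hypothetical name that is not part of the patch, and the sketch assumes the cell value is a finite float (`int()` raises on infinities and NaN).

```python
def maybe_int(value):
    """Return value as an int when it has no fractional part, else unchanged.

    Assumption: value is a finite float.
    """
    val = int(value)
    if val == value:
        return val
    return value


whole = maybe_int(1e20)      # integral float becomes a Python int
fractional = maybe_int(2.5)  # non-integral float is returned as-is
```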

+                             columns=['DateColWithBigInt', 'StringCol'])
+
+        act_df = self.get_exceldf('testdateoverflow')
+        tm.assert_frame_equal(refdf, act_df)
+
 
 class XlrdTests(ReadingTestsBase):
     """