Commit a1825b1

BUG: fix read_csv to parse timezone correctly
- use `box=True` for `to_datetime()` when parsing csv datetimes
- convert the parsed datetimeindex into `ndarray` with `object` dtype so that it preserves timezone
1 parent 70e6f7c commit a1825b1

3 files changed, +20 -2 lines changed
doc/source/whatsnew/v0.24.0.txt

+1
@@ -673,6 +673,7 @@ I/O
 
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
+- :func:`read_csv()` will correctly parse timezone-aware datetimes. (:issue:`22256`)
 -
 
 Plotting

pandas/io/parsers.py

+3 -2
@@ -3030,14 +3030,15 @@ def converter(*date_cols):
             strs = _concat_date_cols(date_cols)
 
             try:
-                return tools.to_datetime(
+                converted = tools.to_datetime(
                     ensure_object(strs),
                     utc=None,
-                    box=False,
+                    box=True,
                     dayfirst=dayfirst,
                     errors='ignore',
                     infer_datetime_format=infer_datetime_format
                 )
+                return np.array(converted.tolist())
             except:
                 return tools.to_datetime(
                     parsing.try_parse_dates(strs, dayfirst=dayfirst))
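
The change above keeps the UTC offset by returning an object array of `Timestamp` values instead of raw `datetime64` values. The following is a minimal sketch of that difference, not the parser code itself. It assumes a recent pandas release, where (as far as I know) the `box` argument is no longer accepted and `to_datetime()` always returns the boxed `DatetimeIndex`. The sample strings are taken from the test data in this commit.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Two of the timestamps used in the commit's test data.
strs = ["2018-01-04 09:01:00+09:00", "2018-01-04 09:02:00+09:00"]

# Boxed result: a timezone-aware DatetimeIndex.
converted = pd.to_datetime(strs)

# Raw datetime64 values are stored in UTC and carry no offset,
# which is roughly what the old unboxed return value looked like.
raw = converted.values

# Object array of Timestamp objects, mirroring
# `np.array(converted.tolist())` from the diff; each element keeps its offset.
preserved = np.array(converted.tolist())

print(raw.dtype.kind)            # 'M' means a datetime64 dtype
print(preserved.dtype)           # object
print(preserved[0].utcoffset())  # offset of the first element
```

The object array is slower to work with than a native `datetime64` array, so this reads as a correctness-first choice: the offset survives until later stages of the parser can rebuild a timezone-aware column.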

pandas/tests/io/parser/parse_dates.py

+16
@@ -674,3 +674,19 @@ def test_parse_date_float(self, data, expected, parse_dates):
         # (i.e. float precision should remain unchanged).
         result = self.read_csv(StringIO(data), parse_dates=parse_dates)
         tm.assert_frame_equal(result, expected)
+
+    def test_parse_timezone(self):
+        import pytz
+        data = """dt,val
+                  2018-01-04 09:01:00+09:00,23350
+                  2018-01-04 09:02:00+09:00,23400
+                  2018-01-04 09:03:00+09:00,23400
+                  2018-01-04 09:04:00+09:00,23400
+                  2018-01-04 09:05:00+09:00,23400"""
+        parsed = self.read_csv(StringIO(data), parse_dates=['dt'])
+        dti = pd.DatetimeIndex(start='2018-01-04 09:01:00',
+                               end='2018-01-04 09:05:00', freq='1min',
+                               tz=pytz.FixedOffset(540))
+        expected_data = {'dt': dti, 'val': [23350, 23400, 23400, 23400, 23400]}
+        expected = DataFrame(expected_data)
+        tm.assert_frame_equal(parsed, expected)
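
For readers who want to reproduce the behavior outside the pandas test suite, here is a standalone sketch of the same check. It is an adaptation, not the committed test. It assumes a recent pandas release, so it uses `pd.date_range()` in place of the `pd.DatetimeIndex(start=..., end=...)` constructor form, which I believe has since been removed. It uses the standard-library `timezone` in place of `pytz.FixedOffset(540)`, and it compares instants and offsets rather than calling `tm.assert_frame_equal`.

```python
from datetime import timedelta, timezone
from io import StringIO

import pandas as pd

# Same rows as the committed test, written flush-left.
data = """dt,val
2018-01-04 09:01:00+09:00,23350
2018-01-04 09:02:00+09:00,23400
2018-01-04 09:03:00+09:00,23400
2018-01-04 09:04:00+09:00,23400
2018-01-04 09:05:00+09:00,23400"""

parsed = pd.read_csv(StringIO(data), parse_dates=["dt"])

# Expected values: one timestamp per minute at a fixed +09:00 offset.
expected_dt = pd.date_range(
    start="2018-01-04 09:01:00",
    end="2018-01-04 09:05:00",
    freq="1min",
    tz=timezone(timedelta(hours=9)),
)

# Compare the parsed column with the expected instants and offset.
same_instants = list(parsed["dt"]) == list(expected_dt)
parsed_offset = parsed["dt"].dt.tz.utcoffset(None)

print(same_instants)
print(parsed_offset)
```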
