BUG: fix read_csv to parse timezone correctly

swyoon · swyoon · commit b0430cb18b0c · 2018-08-19T11:07:36.000+09:00
- use box=True for to_datetime(), and adjust downstream processing to
the change.
diff --git a/doc/source/whatsnew/v0.24.0.txt b/doc/source/whatsnew/v0.24.0.txt
@@ -674,6 +674,7 @@ I/O
 
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
+- :func:`read_csv()` will correctly parse timezone-aware datetimes. (:issue:`22256`)
 -
 
 Plotting
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
@@ -1638,15 +1638,16 @@ def _infer_types(self, values, na_values, try_num_bool=True):
             except Exception:
                 result = values
                 if values.dtype == np.object_:
-                    na_count = parsers.sanitize_objects(result, na_values,
-                                                        False)
+                    na_count = parsers.sanitize_objects(np.asarray(result),
+                                                        na_values, False)
         else:
             result = values
             if values.dtype == np.object_:
-                na_count = parsers.sanitize_objects(values, na_values, False)
+                na_count = parsers.sanitize_objects(np.asarray(values),
+                                                    na_values, False)
 
         if result.dtype == np.object_ and try_num_bool:
-            result = libops.maybe_convert_bool(values,
+            result = libops.maybe_convert_bool(np.asarray(values),
                                                true_values=self.true_values,
                                                false_values=self.false_values)
 
@@ -3033,7 +3034,7 @@ def converter(*date_cols):
                 return tools.to_datetime(
                     ensure_object(strs),
                     utc=None,
-                    box=False,
+                    box=True,
                     dayfirst=dayfirst,
                     errors='ignore',
                     infer_datetime_format=infer_datetime_format
diff --git a/pandas/tests/io/parser/parse_dates.py b/pandas/tests/io/parser/parse_dates.py
@@ -674,3 +674,19 @@ def test_parse_date_float(self, data, expected, parse_dates):
         # (i.e. float precision should remain unchanged).
         result = self.read_csv(StringIO(data), parse_dates=parse_dates)
         tm.assert_frame_equal(result, expected)
+
+    def test_parse_timezone(self):
+        import pytz
+        data = """dt,val
+                  2018-01-04 09:01:00+09:00,23350
+                  2018-01-04 09:02:00+09:00,23400
+                  2018-01-04 09:03:00+09:00,23400
+                  2018-01-04 09:04:00+09:00,23400
+                  2018-01-04 09:05:00+09:00,23400"""
+        parsed = self.read_csv(StringIO(data), parse_dates=['dt'])
+        dti = pd.DatetimeIndex(start='2018-01-04 09:01:00',
+                               end='2018-01-04 09:05:00', freq='1min',
+                               tz=pytz.FixedOffset(540))
+        expected_data = {'dt': dti, 'val': [23350, 23400, 23400, 23400, 23400]}
+        expected = DataFrame(expected_data)
+        tm.assert_frame_equal(parsed, expected)

Original file line number	Diff line number	Diff line change
`@@ -674,6 +674,7 @@ I/O`
`674`	`674`
`675`	`675`	- :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
`676`	`676`	- :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
	`677`	+- :func:`read_csv()` will correctly parse timezone-aware datetimes. (:issue:`22256`)
`677`	`678`	`-`
`678`	`679`
`679`	`680`	`Plotting`