diff --git a/doc/source/basics.rst b/doc/source/basics.rst index 14d712c822bdb..e11c612a510db 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -1711,36 +1711,26 @@ then the more *general* one will be used as the result of the operation. object conversion ~~~~~~~~~~~~~~~~~ -.. note:: - - The syntax of :meth:`~DataFrame.convert_objects` changed in 0.17.0. See - :ref:`API changes ` - for more details. - -:meth:`~DataFrame.convert_objects` is a method that converts columns from -the ``object`` dtype to datetimes, timedeltas or floats. For example, to -attempt conversion of object data that are *number like*, e.g. could be a -string that represents a number, pass ``numeric=True``. By default, this will -attempt a soft conversion and so will only succeed if the entire column is -convertible. To force the conversion, add the keyword argument ``coerce=True``. -This will force strings and number-like objects to be numbers if -possible, and other values will be set to ``np.nan``. +:meth:`~DataFrame.convert_objects` is a method that tries to force conversion of columns from the ``object`` dtype to other types. +To force conversion of values that are *number like*, e.g. a string that represents a number, +pass ``convert_numeric=True``. This will force strings and numbers alike to be numbers if possible; otherwise +they will be set to ``np.nan``. .. ipython:: python df3['D'] = '1.' df3['E'] = '1' - df3.convert_objects(numeric=True).dtypes + df3.convert_objects(convert_numeric=True).dtypes # same, but specific dtype conversion df3['D'] = df3['D'].astype('float16') df3['E'] = df3['E'].astype('int32') df3.dtypes -To force conversion to ``datetime64[ns]``, pass ``datetime=True`` and ``coerce=True``. +To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``. This will convert any datetime-like object to dates, forcing other values to ``NaT``. This might be useful if you are reading in data which is mostly dates, -but occasionally contains non-dates that you wish to represent as missing. +but occasionally has non-dates intermixed that you want to represent as missing. .. ipython:: python @@ -1749,15 +1739,10 @@ but occasionally contains non-dates that you wish to represent as missing. 'foo', 1.0, 1, pd.Timestamp('20010104'), '20010105'], dtype='O') s - s.convert_objects(datetime=True, coerce=True) + s.convert_objects(convert_dates='coerce') -Without passing ``coerce=True``, :meth:`~DataFrame.convert_objects` will attempt -*soft* conversion of any *object* dtypes, meaning that if all +In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all the objects in a Series are of the same type, the Series will have that dtype. -Note that setting ``coerce=True`` does not *convert* arbitrary types to either -``datetime64[ns]`` or ``timedelta64[ns]``. For example, a series containing string -dates will not be converted to a series of datetimes. To convert between types, -see :ref:`converting to timestamps `. gotchas ~~~~~~~ diff --git a/doc/source/whatsnew/v0.17.0.txt b/doc/source/whatsnew/v0.17.0.txt index 61c34fc071282..79ca3f369d2ad 100644 --- a/doc/source/whatsnew/v0.17.0.txt +++ b/doc/source/whatsnew/v0.17.0.txt @@ -640,71 +640,6 @@ New Behavior: Timestamp.now() Timestamp.now() + offsets.DateOffset(years=1) -.. _whatsnew_0170.api_breaking.convert_objects: - -Changes to convert_objects -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``DataFrame.convert_objects`` keyword arguments have been shortened. 
(:issue:`10265`) - -===================== ============= -Previous Replacement -===================== ============= -``convert_dates`` ``datetime`` -``convert_numeric`` ``numeric`` -``convert_timedelta`` ``timedelta`` -===================== ============= - -Coercing types with ``DataFrame.convert_objects`` is now implemented using the -keyword argument ``coerce=True``. Previously types were coerced by setting a -keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``. - -.. ipython:: python - - df = pd.DataFrame({'i': ['1','2'], - 'f': ['apple', '4.2'], - 's': ['apple','banana']}) - df - -The old usage of ``DataFrame.convert_objects`` used ``'coerce'`` along with the -type. - -.. code-block:: python - - In [2]: df.convert_objects(convert_numeric='coerce') - -Now the ``coerce`` keyword must be explicitly used. - -.. ipython:: python - - df.convert_objects(numeric=True, coerce=True) - -In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce -numeric types when there were no values convertible to a numeric type. This returns -the original DataFrame with no conversion. - -.. code-block:: python - - In [1]: df = pd.DataFrame({'s': ['a','b']}) - In [2]: df.convert_objects(convert_numeric='coerce') - Out[2]: - s - 0 a - 1 b - -The new behavior will convert all non-number-like strings to ``NaN``, -when ``coerce=True`` is passed explicity. - -.. ipython:: python - - pd.DataFrame({'s': ['a','b']}) - df.convert_objects(numeric=True, coerce=True) - -In earlier versions of pandas, the default behavior was to try and convert -datetimes and timestamps. The new default is for ``DataFrame.convert_objects`` -to do nothing, and so it is necessary to pass at least one conversion target -in the method call. - Changes to Index Comparisons ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -992,6 +927,7 @@ Deprecations - ``Series.is_time_series`` deprecated in favor of ``Series.index.is_all_dates`` (:issue:`11135`) - Legacy offsets (like ``'A@JAN'``) listed in :ref:`here ` are deprecated (note that this has been alias since 0.8.0), (:issue:`10878`) - ``WidePanel`` deprecated in favor of ``Panel``, ``LongPanel`` in favor of ``DataFrame`` (note these have been aliases since < 0.11.0), (:issue:`10892`) +- ``DataFrame.convert_objects`` has been deprecated in favor of the type-specific functions ``pd.to_datetime``, ``pd.to_timedelta`` and ``pd.to_numeric`` (:issue:`11133`). .. 
_whatsnew_0170.prior_deprecations: @@ -1187,3 +1123,5 @@ Bug Fixes - Bug in ``DataFrame`` construction from nested ``dict`` with ``timedelta`` keys (:issue:`11129`) - Bug in ``.fillna`` against may raise ``TypeError`` when data contains datetime dtype (:issue:`7095`, :issue:`11153`) - Bug in ``.groupby`` when number of keys to group by is same as length of index (:issue:`11185`) +- Bug in ``convert_objects`` where converted values might not be returned if all null and ``coerce`` (:issue:`9589`) +- Bug in ``convert_objects`` where ``copy`` keyword was not respected (:issue:`9589`) diff --git a/pandas/__init__.py b/pandas/__init__.py index dbc697410da80..68a90394cacf1 100644 --- a/pandas/__init__.py +++ b/pandas/__init__.py @@ -52,6 +52,7 @@ from pandas.tools.pivot import pivot_table, crosstab from pandas.tools.plotting import scatter_matrix, plot_params from pandas.tools.tile import cut, qcut +from pandas.tools.util import to_numeric from pandas.core.reshape import melt from pandas.util.print_versions import show_versions import pandas.util.testing diff --git a/pandas/core/common.py b/pandas/core/common.py index 77e58b4f56c32..2d403f904a446 100644 --- a/pandas/core/common.py +++ b/pandas/core/common.py @@ -1858,71 +1858,6 @@ def _maybe_box_datetimelike(value): _values_from_object = lib.values_from_object -def _possibly_convert_objects(values, - datetime=True, - numeric=True, - timedelta=True, - coerce=False, - copy=True): - """ if we have an object dtype, try to coerce dates and/or numbers """ - - conversion_count = sum((datetime, numeric, timedelta)) - if conversion_count == 0: - import warnings - warnings.warn('Must explicitly pass type for conversion. Defaulting to ' - 'pre-0.17 behavior where datetime=True, numeric=True, ' - 'timedelta=True and coerce=False', DeprecationWarning) - datetime = numeric = timedelta = True - coerce = False - - if isinstance(values, (list, tuple)): - # List or scalar - values = np.array(values, dtype=np.object_) - elif not hasattr(values, 'dtype'): - values = np.array([values], dtype=np.object_) - elif not is_object_dtype(values.dtype): - # If not object, do not attempt conversion - values = values.copy() if copy else values - return values - - # If 1 flag is coerce, ensure 2 others are False - if coerce: - if conversion_count > 1: - raise ValueError("Only one of 'datetime', 'numeric' or " - "'timedelta' can be True when when coerce=True.") - - # Immediate return if coerce - if datetime: - return pd.to_datetime(values, errors='coerce', box=False) - elif timedelta: - return pd.to_timedelta(values, errors='coerce', box=False) - elif numeric: - return lib.maybe_convert_numeric(values, set(), coerce_numeric=True) - - # Soft conversions - if datetime: - values = lib.maybe_convert_objects(values, - convert_datetime=datetime) - - if timedelta and is_object_dtype(values.dtype): - # Object check to ensure only run if previous did not convert - values = lib.maybe_convert_objects(values, - convert_timedelta=timedelta) - - if numeric and is_object_dtype(values.dtype): - try: - converted = lib.maybe_convert_numeric(values, - set(), - coerce_numeric=True) - # If all NaNs, then do not-alter - values = converted if not isnull(converted).all() else values - values = values.copy() if copy else values - except: - pass - - return values - - def _possibly_castable(arr): # return False to force a non-fastpath diff --git a/pandas/core/convert.py b/pandas/core/convert.py new file mode 100644 index 0000000000000..3745d4f5f6914 --- /dev/null +++ b/pandas/core/convert.py @@ -0,0 +1,132 
@@ +""" +Functions for converting object to other types +""" + +import numpy as np + +import pandas as pd +from pandas.core.common import (_possibly_cast_to_datetime, is_object_dtype, + isnull) +import pandas.lib as lib + +# TODO: Remove in 0.18 or 2017, which ever is sooner +def _possibly_convert_objects(values, convert_dates=True, + convert_numeric=True, + convert_timedeltas=True, + copy=True): + """ if we have an object dtype, try to coerce dates and/or numbers """ + + # if we have passed in a list or scalar + if isinstance(values, (list, tuple)): + values = np.array(values, dtype=np.object_) + if not hasattr(values, 'dtype'): + values = np.array([values], dtype=np.object_) + + # convert dates + if convert_dates and values.dtype == np.object_: + + # we take an aggressive stance and convert to datetime64[ns] + if convert_dates == 'coerce': + new_values = _possibly_cast_to_datetime( + values, 'M8[ns]', errors='coerce') + + # if we are all nans then leave me alone + if not isnull(new_values).all(): + values = new_values + + else: + values = lib.maybe_convert_objects( + values, convert_datetime=convert_dates) + + # convert timedeltas + if convert_timedeltas and values.dtype == np.object_: + + if convert_timedeltas == 'coerce': + from pandas.tseries.timedeltas import to_timedelta + new_values = to_timedelta(values, coerce=True) + + # if we are all nans then leave me alone + if not isnull(new_values).all(): + values = new_values + + else: + values = lib.maybe_convert_objects( + values, convert_timedelta=convert_timedeltas) + + # convert to numeric + if values.dtype == np.object_: + if convert_numeric: + try: + new_values = lib.maybe_convert_numeric( + values, set(), coerce_numeric=True) + + # if we are all nans then leave me alone + if not isnull(new_values).all(): + values = new_values + + except: + pass + else: + # soft-conversion + values = lib.maybe_convert_objects(values) + + values = values.copy() if copy else values + + return values + + +def _soft_convert_objects(values, datetime=True, numeric=True, timedelta=True, + coerce=False, copy=True): + """ if we have an object dtype, try to coerce dates and/or numbers """ + + conversion_count = sum((datetime, numeric, timedelta)) + if conversion_count == 0: + raise ValueError('At least one of datetime, numeric or timedelta must ' + 'be True.') + elif conversion_count > 1 and coerce: + raise ValueError("Only one of 'datetime', 'numeric' or " + "'timedelta' can be True when when coerce=True.") + + + if isinstance(values, (list, tuple)): + # List or scalar + values = np.array(values, dtype=np.object_) + elif not hasattr(values, 'dtype'): + values = np.array([values], dtype=np.object_) + elif not is_object_dtype(values.dtype): + # If not object, do not attempt conversion + values = values.copy() if copy else values + return values + + # If 1 flag is coerce, ensure 2 others are False + if coerce: + # Immediate return if coerce + if datetime: + return pd.to_datetime(values, errors='coerce', box=False) + elif timedelta: + return pd.to_timedelta(values, errors='coerce', box=False) + elif numeric: + return pd.to_numeric(values, errors='coerce') + + # Soft conversions + if datetime: + values = lib.maybe_convert_objects(values, + convert_datetime=datetime) + + if timedelta and is_object_dtype(values.dtype): + # Object check to ensure only run if previous did not convert + values = lib.maybe_convert_objects(values, + convert_timedelta=timedelta) + + if numeric and is_object_dtype(values.dtype): + try: + converted = lib.maybe_convert_numeric(values, + 
set(), + coerce_numeric=True) + # If all NaNs, then do not-alter + values = converted if not isnull(converted).all() else values + values = values.copy() if copy else values + except: + pass + + return values diff --git a/pandas/core/frame.py b/pandas/core/frame.py index 9e1eda4714734..08dfe315c4cb2 100644 --- a/pandas/core/frame.py +++ b/pandas/core/frame.py @@ -3543,9 +3543,8 @@ def combine(self, other, func, fill_value=None, overwrite=True): # convert_objects just in case return self._constructor(result, index=new_index, - columns=new_columns).convert_objects( - datetime=True, - copy=False) + columns=new_columns)._convert(datetime=True, + copy=False) def combine_first(self, other): """ @@ -4026,9 +4025,7 @@ def _apply_standard(self, func, axis, ignore_failures=False, reduce=True): if axis == 1: result = result.T - result = result.convert_objects(datetime=True, - timedelta=True, - copy=False) + result = result._convert(datetime=True, timedelta=True, copy=False) else: @@ -4158,7 +4155,7 @@ def append(self, other, ignore_index=False, verify_integrity=False): other = DataFrame(other.values.reshape((1, len(other))), index=index, columns=combined_columns) - other = other.convert_objects(datetime=True, timedelta=True) + other = other._convert(datetime=True, timedelta=True) if not self.columns.equals(combined_columns): self = self.reindex(columns=combined_columns) diff --git a/pandas/core/generic.py b/pandas/core/generic.py index 6aec297c31d2b..3473dd0f7cd88 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -2534,11 +2534,8 @@ def copy(self, deep=True): data = self._data.copy(deep=deep) return self._constructor(data).__finalize__(self) - @deprecate_kwarg(old_arg_name='convert_dates', new_arg_name='datetime') - @deprecate_kwarg(old_arg_name='convert_numeric', new_arg_name='numeric') - @deprecate_kwarg(old_arg_name='convert_timedeltas', new_arg_name='timedelta') - def convert_objects(self, datetime=False, numeric=False, - timedelta=False, coerce=False, copy=True): + def _convert(self, datetime=False, numeric=False, timedelta=False, + coerce=False, copy=True): """ Attempt to infer better dtype for object columns @@ -2563,31 +2560,48 @@ def convert_objects(self, datetime=False, numeric=False, ------- converted : same as input object """ + return self._constructor( + self._data.convert(datetime=datetime, + numeric=numeric, + timedelta=timedelta, + coerce=coerce, + copy=copy)).__finalize__(self) + + # TODO: Remove in 0.18 or 2017, which ever is sooner + def convert_objects(self, convert_dates=True, convert_numeric=False, + convert_timedeltas=True, copy=True): + """ + Attempt to infer better dtype for object columns + + Parameters + ---------- + convert_dates : boolean, default True + If True, convert to date where possible. If 'coerce', force + conversion, with unconvertible values becoming NaT. + convert_numeric : boolean, default False + If True, attempt to coerce to numbers (including strings), with + unconvertible values becoming NaN. + convert_timedeltas : boolean, default True + If True, convert to timedelta where possible. If 'coerce', force + conversion, with unconvertible values becoming NaT. + copy : boolean, default True + If True, return a copy even if no copy is necessary (e.g. no + conversion was done). Note: This is meant for internal use, and + should not be confused with inplace. 
- # Deprecation code to handle usage change - issue_warning = False - if datetime == 'coerce': - datetime = coerce = True - numeric = timedelta = False - issue_warning = True - elif numeric == 'coerce': - numeric = coerce = True - datetime = timedelta = False - issue_warning = True - elif timedelta == 'coerce': - timedelta = coerce = True - datetime = numeric = False - issue_warning = True - if issue_warning: - warnings.warn("The use of 'coerce' as an input is deprecated. " - "Instead set coerce=True.", - FutureWarning) + Returns + ------- + converted : same as input object + """ + from warnings import warn + warn("convert_objects is deprecated. Use the data-type specific " + "converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.", + FutureWarning, stacklevel=2) return self._constructor( - self._data.convert(datetime=datetime, - numeric=numeric, - timedelta=timedelta, - coerce=coerce, + self._data.convert(convert_dates=convert_dates, + convert_numeric=convert_numeric, + convert_timedeltas=convert_timedeltas, copy=copy)).__finalize__(self) #---------------------------------------------------------------------- diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py index e837445e9e348..40f078a1bbcfe 100644 --- a/pandas/core/groupby.py +++ b/pandas/core/groupby.py @@ -112,7 +112,7 @@ def f(self): except Exception: result = self.aggregate(lambda x: npfunc(x, axis=self.axis)) if _convert: - result = result.convert_objects(datetime=True) + result = result._convert(datetime=True) return result f.__doc__ = "Compute %s of group values" % name @@ -2882,7 +2882,7 @@ def aggregate(self, arg, *args, **kwargs): self._insert_inaxis_grouper_inplace(result) result.index = np.arange(len(result)) - return result.convert_objects(datetime=True) + return result._convert(datetime=True) def _aggregate_multiple_funcs(self, arg): from pandas.tools.merge import concat @@ -3123,14 +3123,14 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False): # as we are stacking can easily have object dtypes here if (self._selected_obj.ndim == 2 and self._selected_obj.dtypes.isin(_DATELIKE_DTYPES).any()): - result = result.convert_objects(numeric=True) + result = result._convert(numeric=True) date_cols = self._selected_obj.select_dtypes( include=list(_DATELIKE_DTYPES)).columns result[date_cols] = (result[date_cols] - .convert_objects(datetime=True, + ._convert(datetime=True, coerce=True)) else: - result = result.convert_objects(datetime=True) + result = result._convert(datetime=True) return self._reindex_output(result) @@ -3138,7 +3138,7 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False): # only coerce dates if we find at least 1 datetime coerce = True if any([ isinstance(v,Timestamp) for v in values ]) else False return (Series(values, index=key_index) - .convert_objects(datetime=True, + ._convert(datetime=True, coerce=coerce)) else: @@ -3243,7 +3243,7 @@ def transform(self, func, *args, **kwargs): results = self._try_cast(results, obj[result.columns]) return (DataFrame(results,columns=result.columns,index=obj.index) - .convert_objects(datetime=True)) + ._convert(datetime=True)) def _define_paths(self, func, *args, **kwargs): if isinstance(func, compat.string_types): @@ -3436,7 +3436,7 @@ def _wrap_aggregated_output(self, output, names=None): if self.axis == 1: result = result.T - return self._reindex_output(result).convert_objects(datetime=True) + return self._reindex_output(result)._convert(datetime=True) def _wrap_agged_blocks(self, items, blocks): if not self.as_index: @@ 
-3454,7 +3454,7 @@ def _wrap_agged_blocks(self, items, blocks): if self.axis == 1: result = result.T - return self._reindex_output(result).convert_objects(datetime=True) + return self._reindex_output(result)._convert(datetime=True) def _reindex_output(self, result): """ diff --git a/pandas/core/internals.py b/pandas/core/internals.py index 97b54d4ef6ebe..4790f3aa3841e 100644 --- a/pandas/core/internals.py +++ b/pandas/core/internals.py @@ -25,6 +25,7 @@ from pandas.core.categorical import Categorical, maybe_to_categorical from pandas.tseries.index import DatetimeIndex import pandas.core.common as com +import pandas.core.convert as convert from pandas.sparse.array import _maybe_to_sparse, SparseArray import pandas.lib as lib import pandas.tslib as tslib @@ -1517,14 +1518,35 @@ def is_bool(self): """ return lib.is_bool_array(self.values.ravel()) - def convert(self, datetime=True, numeric=True, timedelta=True, coerce=False, - copy=True, by_item=True): + # TODO: Refactor when convert_objects is removed since there will be 1 path + def convert(self, *args, **kwargs): """ attempt to coerce any object types to better types return a copy of the block (if copy = True) by definition we ARE an ObjectBlock!!!!! can return multiple blocks! """ + if args: + raise NotImplementedError + by_item = True if 'by_item' not in kwargs else kwargs['by_item'] + + new_inputs = ['coerce','datetime','numeric','timedelta'] + new_style = False + for kw in new_inputs: + new_style |= kw in kwargs + + if new_style: + fn = convert._soft_convert_objects + fn_inputs = new_inputs + else: + fn = convert._possibly_convert_objects + fn_inputs = ['convert_dates','convert_numeric','convert_timedeltas'] + fn_inputs += ['copy'] + + fn_kwargs = {} + for key in fn_inputs: + if key in kwargs: + fn_kwargs[key] = kwargs[key] # attempt to create new type blocks blocks = [] @@ -1533,30 +1555,14 @@ def convert(self, datetime=True, numeric=True, timedelta=True, coerce=False, for i, rl in enumerate(self.mgr_locs): values = self.iget(i) - values = com._possibly_convert_objects( - values.ravel(), - datetime=datetime, - numeric=numeric, - timedelta=timedelta, - coerce=coerce, - copy=copy - ).reshape(values.shape) + values = fn(values.ravel(), **fn_kwargs).reshape(values.shape) values = _block_shape(values, ndim=self.ndim) - newb = self.make_block(values, - placement=[rl]) + newb = make_block(values, ndim=self.ndim, placement=[rl]) blocks.append(newb) else: - - values = com._possibly_convert_objects( - self.values.ravel(), - datetime=datetime, - numeric=numeric, - timedelta=timedelta, - coerce=coerce, - copy=copy - ).reshape(self.values.shape) - blocks.append(self.make_block(values)) + values = fn(self.values.ravel(), **fn_kwargs).reshape(self.values.shape) + blocks.append(make_block(values, ndim=self.ndim, placement=self.mgr_locs)) return blocks @@ -1597,8 +1603,7 @@ def _maybe_downcast(self, blocks, downcast=None): # split and convert the blocks result_blocks = [] for blk in blocks: - result_blocks.extend(blk.convert(datetime=True, - numeric=False)) + result_blocks.extend(blk.convert(datetime=True, numeric=False)) return result_blocks def _can_hold_element(self, element): diff --git a/pandas/io/tests/test_html.py b/pandas/io/tests/test_html.py index 5c8c15c7c2ae0..141533a131e42 100644 --- a/pandas/io/tests/test_html.py +++ b/pandas/io/tests/test_html.py @@ -527,10 +527,10 @@ def try_remove_ws(x): 'Hamilton Bank, NA', 'The Citizens Savings Bank'] dfnew = df.applymap(try_remove_ws).replace(old, new) gtnew = ground_truth.applymap(try_remove_ws) - 
converted = dfnew.convert_objects(datetime=True, numeric=True) + converted = dfnew._convert(datetime=True, numeric=True) date_cols = ['Closing Date','Updated Date'] - converted[date_cols] = converted[date_cols].convert_objects(datetime=True, - coerce=True) + converted[date_cols] = converted[date_cols]._convert(datetime=True, + coerce=True) tm.assert_frame_equal(converted,gtnew) @slow diff --git a/pandas/io/tests/test_pytables.py b/pandas/io/tests/test_pytables.py index adef470965f21..df2a659100305 100644 --- a/pandas/io/tests/test_pytables.py +++ b/pandas/io/tests/test_pytables.py @@ -408,7 +408,7 @@ def test_repr(self): df['datetime1'] = datetime.datetime(2001,1,2,0,0) df['datetime2'] = datetime.datetime(2001,1,3,0,0) df.ix[3:6,['obj1']] = np.nan - df = df.consolidate().convert_objects(datetime=True) + df = df.consolidate()._convert(datetime=True) warnings.filterwarnings('ignore', category=PerformanceWarning) store['df'] = df @@ -736,7 +736,7 @@ def test_put_mixed_type(self): df['datetime1'] = datetime.datetime(2001, 1, 2, 0, 0) df['datetime2'] = datetime.datetime(2001, 1, 3, 0, 0) df.ix[3:6, ['obj1']] = np.nan - df = df.consolidate().convert_objects(datetime=True) + df = df.consolidate()._convert(datetime=True) with ensure_clean_store(self.path) as store: _maybe_remove(store, 'df') @@ -1456,7 +1456,7 @@ def check_col(key,name,size): df_dc.ix[7:9, 'string'] = 'bar' df_dc['string2'] = 'cool' df_dc['datetime'] = Timestamp('20010102') - df_dc = df_dc.convert_objects(datetime=True) + df_dc = df_dc._convert(datetime=True) df_dc.ix[3:5, ['A', 'B', 'datetime']] = np.nan _maybe_remove(store, 'df_dc') @@ -1918,7 +1918,7 @@ def test_table_mixed_dtypes(self): df['datetime1'] = datetime.datetime(2001, 1, 2, 0, 0) df['datetime2'] = datetime.datetime(2001, 1, 3, 0, 0) df.ix[3:6, ['obj1']] = np.nan - df = df.consolidate().convert_objects(datetime=True) + df = df.consolidate()._convert(datetime=True) with ensure_clean_store(self.path) as store: store.append('df1_mixed', df) @@ -1974,7 +1974,7 @@ def test_unimplemented_dtypes_table_columns(self): df['obj1'] = 'foo' df['obj2'] = 'bar' df['datetime1'] = datetime.date(2001, 1, 2) - df = df.consolidate().convert_objects(datetime=True) + df = df.consolidate()._convert(datetime=True) with ensure_clean_store(self.path) as store: # this fails because we have a date in the object block...... diff --git a/pandas/io/tests/test_stata.py b/pandas/io/tests/test_stata.py index 8505150932c90..aff9cd6c558e2 100644 --- a/pandas/io/tests/test_stata.py +++ b/pandas/io/tests/test_stata.py @@ -417,7 +417,7 @@ def test_read_write_reread_dta14(self): expected = self.read_csv(self.csv14) cols = ['byte_', 'int_', 'long_', 'float_', 'double_'] for col in cols: - expected[col] = expected[col].convert_objects(datetime=True, numeric=True) + expected[col] = expected[col]._convert(datetime=True, numeric=True) expected['float_'] = expected['float_'].astype(np.float32) expected['date_td'] = pd.to_datetime(expected['date_td'], errors='coerce') diff --git a/pandas/io/wb.py b/pandas/io/wb.py index 99b14be0b0b6b..e617a01b73bfd 100644 --- a/pandas/io/wb.py +++ b/pandas/io/wb.py @@ -165,7 +165,7 @@ def download(country=['MX', 'CA', 'US'], indicator=['NY.GDP.MKTP.CD', 'NY.GNS.IC out = reduce(lambda x, y: x.merge(y, how='outer'), data) out = out.drop('iso_code', axis=1) out = out.set_index(['country', 'year']) - out = out.convert_objects(datetime=True, numeric=True) + out = out._convert(datetime=True, numeric=True) return out else: msg = "No indicators returned data." 
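The deprecation above steers users toward the type-specific converters rather than ``convert_objects``. A minimal sketch of that migration path, assuming the 0.17-era ``errors='coerce'`` signatures used elsewhere in this patch (the example values are purely illustrative):

.. code-block:: python

   import pandas as pd

   s = pd.Series(['1.0', '2', 'apple'])

   # previously: s.convert_objects(convert_numeric='coerce')
   pd.to_numeric(s, errors='coerce')      # 1.0, 2.0, NaN

   s = pd.Series(['20010101', 'not a date'])

   # previously: s.convert_objects(convert_dates='coerce')
   pd.to_datetime(s, errors='coerce')     # 2001-01-01, NaT

``pd.to_timedelta`` plays the same role for timedelta-like data, mirroring the three coercion branches in ``_soft_convert_objects``.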
diff --git a/pandas/tests/test_common.py b/pandas/tests/test_common.py index b234773359f8c..c488d22da7dfe 100644 --- a/pandas/tests/test_common.py +++ b/pandas/tests/test_common.py @@ -13,6 +13,7 @@ from pandas.compat import range, long, lrange, lmap, u from pandas.core.common import notnull, isnull, array_equivalent import pandas.core.common as com +import pandas.core.convert as convert import pandas.util.testing as tm import pandas.core.config as cf @@ -1051,33 +1052,32 @@ def test_maybe_convert_string_to_array(self): tm.assert_numpy_array_equal(result, np.array(['x', 2], dtype=object)) self.assertTrue(result.dtype == object) - -def test_dict_compat(): - data_datetime64 = {np.datetime64('1990-03-15'): 1, - np.datetime64('2015-03-15'): 2} - data_unchanged = {1: 2, 3: 4, 5: 6} - expected = {Timestamp('1990-3-15'): 1, Timestamp('2015-03-15'): 2} - assert(com._dict_compat(data_datetime64) == expected) - assert(com._dict_compat(expected) == expected) - assert(com._dict_compat(data_unchanged) == data_unchanged) - - def test_possibly_convert_objects_copy(): values = np.array([1, 2]) - out = com._possibly_convert_objects(values, copy=False) + out = convert._possibly_convert_objects(values, copy=False) assert_true(values is out) - out = com._possibly_convert_objects(values, copy=True) + out = convert._possibly_convert_objects(values, copy=True) assert_true(values is not out) values = np.array(['apply','banana']) - out = com._possibly_convert_objects(values, copy=False) + out = convert._possibly_convert_objects(values, copy=False) assert_true(values is out) - out = com._possibly_convert_objects(values, copy=True) + out = convert._possibly_convert_objects(values, copy=True) assert_true(values is not out) - + + +def test_dict_compat(): + data_datetime64 = {np.datetime64('1990-03-15'): 1, + np.datetime64('2015-03-15'): 2} + data_unchanged = {1: 2, 3: 4, 5: 6} + expected = {Timestamp('1990-3-15'): 1, Timestamp('2015-03-15'): 2} + assert(com._dict_compat(data_datetime64) == expected) + assert(com._dict_compat(expected) == expected) + assert(com._dict_compat(data_unchanged) == data_unchanged) + if __name__ == '__main__': nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], diff --git a/pandas/tests/test_frame.py b/pandas/tests/test_frame.py index c963222cf4ad9..5acc858840dfa 100644 --- a/pandas/tests/test_frame.py +++ b/pandas/tests/test_frame.py @@ -6805,7 +6805,7 @@ def make_dtnat_arr(n,nnat=None): with ensure_clean('.csv') as pth: df=DataFrame(dict(a=s1,b=s2)) df.to_csv(pth,chunksize=chunksize) - recons = DataFrame.from_csv(pth).convert_objects(datetime=True, + recons = DataFrame.from_csv(pth)._convert(datetime=True, coerce=True) assert_frame_equal(df, recons,check_names=False,check_less_precise=True) @@ -7516,7 +7516,7 @@ def test_dtypes(self): def test_convert_objects(self): oops = self.mixed_frame.T.T - converted = oops.convert_objects(datetime=True) + converted = oops._convert(datetime=True) assert_frame_equal(converted, self.mixed_frame) self.assertEqual(converted['A'].dtype, np.float64) @@ -7529,8 +7529,7 @@ def test_convert_objects(self): self.mixed_frame['J'] = '1.' 
self.mixed_frame['K'] = '1' self.mixed_frame.ix[0:5,['J','K']] = 'garbled' - converted = self.mixed_frame.convert_objects(datetime=True, - numeric=True) + converted = self.mixed_frame._convert(datetime=True, numeric=True) self.assertEqual(converted['H'].dtype, 'float64') self.assertEqual(converted['I'].dtype, 'int64') self.assertEqual(converted['J'].dtype, 'float64') @@ -7552,14 +7551,14 @@ def test_convert_objects(self): # mixed in a single column df = DataFrame(dict(s = Series([1, 'na', 3 ,4]))) - result = df.convert_objects(datetime=True, numeric=True) + result = df._convert(datetime=True, numeric=True) expected = DataFrame(dict(s = Series([1, np.nan, 3 ,4]))) assert_frame_equal(result, expected) def test_convert_objects_no_conversion(self): mixed1 = DataFrame( {'a': [1, 2, 3], 'b': [4.0, 5, 6], 'c': ['x', 'y', 'z']}) - mixed2 = mixed1.convert_objects(datetime=True) + mixed2 = mixed1._convert(datetime=True) assert_frame_equal(mixed1, mixed2) def test_append_series_dict(self): @@ -11551,7 +11550,7 @@ def test_apply_convert_objects(self): 'F': np.random.randn(11)}) result = data.apply(lambda x: x, axis=1) - assert_frame_equal(result.convert_objects(datetime=True), data) + assert_frame_equal(result._convert(datetime=True), data) def test_apply_attach_name(self): result = self.frame.apply(lambda x: x.name) diff --git a/pandas/tests/test_indexing.py b/pandas/tests/test_indexing.py index 1e707264edebe..35467c6abb9b4 100644 --- a/pandas/tests/test_indexing.py +++ b/pandas/tests/test_indexing.py @@ -3138,8 +3138,7 @@ def test_astype_assignment(self): assert_frame_equal(df,expected) df = df_orig.copy() - df.iloc[:,0:2] = df.iloc[:,0:2].convert_objects(datetime=True, - numeric=True) + df.iloc[:,0:2] = df.iloc[:,0:2]._convert(datetime=True, numeric=True) expected = DataFrame([[1,2,'3','.4',5,6.,'foo']],columns=list('ABCDEFG')) assert_frame_equal(df,expected) diff --git a/pandas/tests/test_panel.py b/pandas/tests/test_panel.py index bd27d11ef14c1..0dad55a9133b6 100644 --- a/pandas/tests/test_panel.py +++ b/pandas/tests/test_panel.py @@ -1145,7 +1145,7 @@ def test_convert_objects(self): # GH 4937 p = Panel(dict(A = dict(a = ['1','1.0']))) expected = Panel(dict(A = dict(a = [1,1.0]))) - result = p.convert_objects(numeric=True, coerce=True) + result = p._convert(numeric=True, coerce=True) assert_panel_equal(result, expected) def test_dtypes(self): diff --git a/pandas/tests/test_series.py b/pandas/tests/test_series.py index a6d7e63656d68..79de22b507e2a 100644 --- a/pandas/tests/test_series.py +++ b/pandas/tests/test_series.py @@ -6449,21 +6449,138 @@ def test_apply_dont_convert_dtype(self): result = s.apply(f, convert_dtype=False) self.assertEqual(result.dtype, object) - # GH 10265 def test_convert_objects(self): + + s = Series([1., 2, 3], index=['a', 'b', 'c']) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates=False, convert_numeric=True) + assert_series_equal(result, s) + + # force numeric conversion + r = s.copy().astype('O') + r['a'] = '1' + with tm.assert_produces_warning(FutureWarning): + result = r.convert_objects(convert_dates=False, convert_numeric=True) + assert_series_equal(result, s) + + r = s.copy().astype('O') + r['a'] = '1.' 
+ with tm.assert_produces_warning(FutureWarning): + result = r.convert_objects(convert_dates=False, convert_numeric=True) + assert_series_equal(result, s) + + r = s.copy().astype('O') + r['a'] = 'garbled' + expected = s.copy() + expected['a'] = np.nan + with tm.assert_produces_warning(FutureWarning): + result = r.convert_objects(convert_dates=False, convert_numeric=True) + assert_series_equal(result, expected) + + # GH 4119, not converting a mixed type (e.g.floats and object) + s = Series([1, 'na', 3, 4]) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_numeric=True) + expected = Series([1, np.nan, 3, 4]) + assert_series_equal(result, expected) + + s = Series([1, '', 3, 4]) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_numeric=True) + expected = Series([1, np.nan, 3, 4]) + assert_series_equal(result, expected) + + # dates + s = Series( + [datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 2, 0, 0), datetime(2001, 1, 3, 0, 0)]) + s2 = Series([datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 2, 0, 0), datetime( + 2001, 1, 3, 0, 0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'], dtype='O') + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates=True, convert_numeric=False) + expected = Series( + [Timestamp('20010101'), Timestamp('20010102'), Timestamp('20010103')], dtype='M8[ns]') + assert_series_equal(result, expected) + + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce', + convert_numeric=False) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce', + convert_numeric=True) + assert_series_equal(result, expected) + + expected = Series( + [Timestamp( + '20010101'), Timestamp('20010102'), Timestamp('20010103'), + lib.NaT, lib.NaT, lib.NaT, Timestamp('20010104'), Timestamp('20010105')], dtype='M8[ns]') + with tm.assert_produces_warning(FutureWarning): + result = s2.convert_objects(convert_dates='coerce', + convert_numeric=False) + assert_series_equal(result, expected) + with tm.assert_produces_warning(FutureWarning): + result = s2.convert_objects(convert_dates='coerce', + convert_numeric=True) + assert_series_equal(result, expected) + + # preserver all-nans (if convert_dates='coerce') + s = Series(['foo', 'bar', 1, 1.0], dtype='O') + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce', + convert_numeric=False) + assert_series_equal(result, s) + + # preserver if non-object + s = Series([1], dtype='float32') + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce', + convert_numeric=False) + assert_series_equal(result, s) + + #r = s.copy() + #r[0] = np.nan + #result = r.convert_objects(convert_dates=True,convert_numeric=False) + #self.assertEqual(result.dtype, 'M8[ns]') + + # dateutil parses some single letters into today's value as a date + for x in 'abcdefghijklmnopqrstuvwxyz': + s = Series([x]) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce') + assert_series_equal(result, s) + s = Series([x.upper()]) + with tm.assert_produces_warning(FutureWarning): + result = s.convert_objects(convert_dates='coerce') + assert_series_equal(result, s) + + def test_convert_objects_preserve_bool(self): + s = Series([1, True, 3, 5], dtype=object) + with tm.assert_produces_warning(FutureWarning): + r = s.convert_objects(convert_numeric=True) + e = Series([1, 1, 3, 
5], dtype='i8') + tm.assert_series_equal(r, e) + + def test_convert_objects_preserve_all_bool(self): + s = Series([False, True, False, False], dtype=object) + with tm.assert_produces_warning(FutureWarning): + r = s.convert_objects(convert_numeric=True) + e = Series([False, True, False, False], dtype=bool) + tm.assert_series_equal(r, e) + + # GH 10265 + def test_convert(self): # Tests: All to nans, coerce, true # Test coercion returns correct type s = Series(['a', 'b', 'c']) - results = s.convert_objects(datetime=True, coerce=True) + results = s._convert(datetime=True, coerce=True) expected = Series([lib.NaT] * 3) assert_series_equal(results, expected) - results = s.convert_objects(numeric=True, coerce=True) + results = s._convert(numeric=True, coerce=True) expected = Series([np.nan] * 3) assert_series_equal(results, expected) expected = Series([lib.NaT] * 3, dtype=np.dtype('m8[ns]')) - results = s.convert_objects(timedelta=True, coerce=True) + results = s._convert(timedelta=True, coerce=True) assert_series_equal(results, expected) dt = datetime(2001, 1, 1, 0, 0) @@ -6471,83 +6588,83 @@ def test_convert_objects(self): # Test coercion with mixed types s = Series(['a', '3.1415', dt, td]) - results = s.convert_objects(datetime=True, coerce=True) + results = s._convert(datetime=True, coerce=True) expected = Series([lib.NaT, lib.NaT, dt, lib.NaT]) assert_series_equal(results, expected) - results = s.convert_objects(numeric=True, coerce=True) + results = s._convert(numeric=True, coerce=True) expected = Series([nan, 3.1415, nan, nan]) assert_series_equal(results, expected) - results = s.convert_objects(timedelta=True, coerce=True) + results = s._convert(timedelta=True, coerce=True) expected = Series([lib.NaT, lib.NaT, lib.NaT, td], dtype=np.dtype('m8[ns]')) assert_series_equal(results, expected) # Test standard conversion returns original - results = s.convert_objects(datetime=True) + results = s._convert(datetime=True) assert_series_equal(results, s) - results = s.convert_objects(numeric=True) + results = s._convert(numeric=True) expected = Series([nan, 3.1415, nan, nan]) assert_series_equal(results, expected) - results = s.convert_objects(timedelta=True) + results = s._convert(timedelta=True) assert_series_equal(results, s) # test pass-through and non-conversion when other types selected s = Series(['1.0','2.0','3.0']) - results = s.convert_objects(datetime=True, numeric=True, timedelta=True) + results = s._convert(datetime=True, numeric=True, timedelta=True) expected = Series([1.0,2.0,3.0]) assert_series_equal(results, expected) - results = s.convert_objects(True,False,True) + results = s._convert(True,False,True) assert_series_equal(results, s) s = Series([datetime(2001, 1, 1, 0, 0),datetime(2001, 1, 1, 0, 0)], dtype='O') - results = s.convert_objects(datetime=True, numeric=True, timedelta=True) + results = s._convert(datetime=True, numeric=True, timedelta=True) expected = Series([datetime(2001, 1, 1, 0, 0),datetime(2001, 1, 1, 0, 0)]) assert_series_equal(results, expected) - results = s.convert_objects(datetime=False,numeric=True,timedelta=True) + results = s._convert(datetime=False,numeric=True,timedelta=True) assert_series_equal(results, s) td = datetime(2001, 1, 1, 0, 0) - datetime(2000, 1, 1, 0, 0) s = Series([td, td], dtype='O') - results = s.convert_objects(datetime=True, numeric=True, timedelta=True) + results = s._convert(datetime=True, numeric=True, timedelta=True) expected = Series([td, td]) assert_series_equal(results, expected) - results = s.convert_objects(True,True,False) + 
results = s._convert(True,True,False) assert_series_equal(results, s) s = Series([1., 2, 3], index=['a', 'b', 'c']) - result = s.convert_objects(numeric=True) + result = s._convert(numeric=True) assert_series_equal(result, s) # force numeric conversion r = s.copy().astype('O') r['a'] = '1' - result = r.convert_objects(numeric=True) + result = r._convert(numeric=True) assert_series_equal(result, s) r = s.copy().astype('O') r['a'] = '1.' - result = r.convert_objects(numeric=True) + result = r._convert(numeric=True) assert_series_equal(result, s) r = s.copy().astype('O') r['a'] = 'garbled' - result = r.convert_objects(numeric=True) + result = r._convert(numeric=True) expected = s.copy() expected['a'] = nan assert_series_equal(result, expected) # GH 4119, not converting a mixed type (e.g.floats and object) s = Series([1, 'na', 3, 4]) - result = s.convert_objects(datetime=True, numeric=True) + result = s._convert(datetime=True, numeric=True) expected = Series([1, nan, 3, 4]) assert_series_equal(result, expected) s = Series([1, '', 3, 4]) - result = s.convert_objects(datetime=True, numeric=True) + result = s._convert(datetime=True, numeric=True) assert_series_equal(result, expected) # dates @@ -6556,95 +6673,64 @@ def test_convert_objects(self): s2 = Series([datetime(2001, 1, 1, 0, 0), datetime(2001, 1, 2, 0, 0), datetime( 2001, 1, 3, 0, 0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'], dtype='O') - result = s.convert_objects(datetime=True) + result = s._convert(datetime=True) expected = Series( [Timestamp('20010101'), Timestamp('20010102'), Timestamp('20010103')], dtype='M8[ns]') assert_series_equal(result, expected) - result = s.convert_objects(datetime=True, coerce=True) + result = s._convert(datetime=True, coerce=True) assert_series_equal(result, expected) expected = Series( [Timestamp( '20010101'), Timestamp('20010102'), Timestamp('20010103'), lib.NaT, lib.NaT, lib.NaT, Timestamp('20010104'), Timestamp('20010105')], dtype='M8[ns]') - result = s2.convert_objects(datetime=True, + result = s2._convert(datetime=True, numeric=False, timedelta=False, coerce=True) assert_series_equal(result, expected) - result = s2.convert_objects(datetime=True, coerce=True) + result = s2._convert(datetime=True, coerce=True) assert_series_equal(result, expected) s = Series(['foo', 'bar', 1, 1.0], dtype='O') - result = s.convert_objects(datetime=True, coerce=True) + result = s._convert(datetime=True, coerce=True) expected = Series([lib.NaT]*4) assert_series_equal(result, expected) # preserver if non-object s = Series([1], dtype='float32') - result = s.convert_objects(datetime=True, coerce=True) + result = s._convert(datetime=True, coerce=True) assert_series_equal(result, s) #r = s.copy() #r[0] = np.nan - #result = r.convert_objects(convert_dates=True,convert_numeric=False) + #result = r._convert(convert_dates=True,convert_numeric=False) #self.assertEqual(result.dtype, 'M8[ns]') # dateutil parses some single letters into today's value as a date expected = Series([lib.NaT]) for x in 'abcdefghijklmnopqrstuvwxyz': s = Series([x]) - result = s.convert_objects(datetime=True, coerce=True) + result = s._convert(datetime=True, coerce=True) assert_series_equal(result, expected) s = Series([x.upper()]) - result = s.convert_objects(datetime=True, coerce=True) + result = s._convert(datetime=True, coerce=True) assert_series_equal(result, expected) - # GH 10601 - # Remove test after deprecation to convert_objects is final - def test_convert_objects_old_style_deprecation(self): - s = Series(['foo', 'bar', 1, 1.0], dtype='O') 
- with warnings.catch_warnings(record=True) as w: - warnings.simplefilter('always', FutureWarning) - new_style = s.convert_objects(datetime=True, coerce=True) - old_style = s.convert_objects(convert_dates='coerce') - self.assertEqual(len(w), 2) - assert_series_equal(new_style, old_style) - - with warnings.catch_warnings(record=True) as w: - warnings.simplefilter('always', FutureWarning) - new_style = s.convert_objects(numeric=True, coerce=True) - old_style = s.convert_objects(convert_numeric='coerce') - self.assertEqual(len(w), 2) - assert_series_equal(new_style, old_style) - - dt = datetime(2001, 1, 1, 0, 0) - td = dt - datetime(2000, 1, 1, 0, 0) - s = Series(['a', '3.1415', dt, td]) - with warnings.catch_warnings(record=True) as w: - warnings.simplefilter('always', FutureWarning) - new_style = s.convert_objects(timedelta=True, coerce=True) - old_style = s.convert_objects(convert_timedeltas='coerce') - self.assertEqual(len(w), 2) - assert_series_equal(new_style, old_style) - - def test_convert_objects_no_arg_warning(self): + def test_convert_no_arg_error(self): s = Series(['1.0','2']) - with warnings.catch_warnings(record=True) as w: - warnings.simplefilter('always', DeprecationWarning) - s.convert_objects() - self.assertEqual(len(w), 1) + self.assertRaises(ValueError, s._convert) - def test_convert_objects_preserve_bool(self): + def test_convert_preserve_bool(self): s = Series([1, True, 3, 5], dtype=object) - r = s.convert_objects(datetime=True, numeric=True) + r = s._convert(datetime=True, numeric=True) e = Series([1, 1, 3, 5], dtype='i8') tm.assert_series_equal(r, e) - def test_convert_objects_preserve_all_bool(self): + def test_convert_preserve_all_bool(self): s = Series([False, True, False, False], dtype=object) - r = s.convert_objects(datetime=True, numeric=True) + r = s._convert(datetime=True, numeric=True) e = Series([False, True, False, False], dtype=bool) tm.assert_series_equal(r, e) diff --git a/pandas/tests/test_util.py b/pandas/tests/test_util.py index fb334cf9912f3..427c96a839c26 100644 --- a/pandas/tests/test_util.py +++ b/pandas/tests/test_util.py @@ -1,13 +1,11 @@ # -*- coding: utf-8 -*- -import warnings - import nose -import sys -import pandas.util from pandas.util.decorators import deprecate_kwarg import pandas.util.testing as tm + + class TestDecorators(tm.TestCase): def setUp(self): @deprecate_kwarg('old', 'new') @@ -75,7 +73,6 @@ def test_rands_array(): assert(arr.shape == (10, 10)) assert(len(arr[1, 1]) == 7) - if __name__ == '__main__': nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], exit=False) diff --git a/pandas/tools/plotting.py b/pandas/tools/plotting.py index 55464a7f1d23e..98d6f5e8eb797 100644 --- a/pandas/tools/plotting.py +++ b/pandas/tools/plotting.py @@ -1079,7 +1079,7 @@ def _compute_plot_data(self): label = 'None' data = data.to_frame(name=label) - numeric_data = data.convert_objects(datetime=True)._get_numeric_data() + numeric_data = data._convert(datetime=True)._get_numeric_data() try: is_empty = numeric_data.empty @@ -1972,8 +1972,7 @@ def __init__(self, data, bins=10, bottom=0, **kwargs): def _args_adjust(self): if com.is_integer(self.bins): # create common bin edge - values = (self.data.convert_objects(datetime=True) - ._get_numeric_data()) + values = (self.data._convert(datetime=True)._get_numeric_data()) values = np.ravel(values) values = values[~com.isnull(values)] diff --git a/pandas/tools/tests/test_util.py b/pandas/tools/tests/test_util.py index 1adf47e946a96..72ce7d8659157 100644 --- a/pandas/tools/tests/test_util.py 
+++ b/pandas/tools/tests/test_util.py @@ -2,14 +2,15 @@ import locale import codecs import nose +from nose.tools import assert_raises, assert_true import numpy as np from numpy.testing import assert_equal +import pandas as pd from pandas import date_range, Index import pandas.util.testing as tm -from pandas.tools.util import cartesian_product - +from pandas.tools.util import cartesian_product, to_numeric CURRENT_LOCALE = locale.getlocale() LOCALE_OVERRIDE = os.environ.get('LOCALE_OVERRIDE', None) @@ -89,6 +90,54 @@ def test_set_locale(self): self.assertEqual(current_locale, CURRENT_LOCALE) +class TestToNumeric(tm.TestCase): + def test_series(self): + s = pd.Series(['1', '-3.14', '7']) + res = to_numeric(s) + expected = pd.Series([1, -3.14, 7]) + tm.assert_series_equal(res, expected) + + s = pd.Series(['1', '-3.14', 7]) + res = to_numeric(s) + tm.assert_series_equal(res, expected) + + def test_error(self): + s = pd.Series([1, -3.14, 'apple']) + assert_raises(ValueError, to_numeric, s, errors='raise') + + res = to_numeric(s, errors='ignore') + expected = pd.Series([1, -3.14, 'apple']) + tm.assert_series_equal(res, expected) + + res = to_numeric(s, errors='coerce') + expected = pd.Series([1, -3.14, np.nan]) + tm.assert_series_equal(res, expected) + + + def test_list(self): + s = ['1', '-3.14', '7'] + res = to_numeric(s) + expected = np.array([1, -3.14, 7]) + tm.assert_numpy_array_equal(res, expected) + + def test_numeric(self): + s = pd.Series([1, -3.14, 7], dtype='O') + res = to_numeric(s) + expected = pd.Series([1, -3.14, 7]) + tm.assert_series_equal(res, expected) + + s = pd.Series([1, -3.14, 7]) + res = to_numeric(s) + tm.assert_series_equal(res, expected) + + def test_all_nan(self): + s = pd.Series(['a','b','c']) + res = to_numeric(s, errors='coerce') + expected = pd.Series([np.nan, np.nan, np.nan]) + tm.assert_series_equal(res, expected) + + if __name__ == '__main__': nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], exit=False) + diff --git a/pandas/tools/util.py b/pandas/tools/util.py index 0bb6b4b7f7892..925c23255b5f5 100644 --- a/pandas/tools/util.py +++ b/pandas/tools/util.py @@ -1,9 +1,9 @@ -import operator -import warnings +import numpy as np +import pandas.lib as lib + +import pandas as pd from pandas.compat import reduce from pandas.core.index import Index -import numpy as np -from pandas import algos from pandas.core import common as com @@ -48,3 +48,57 @@ def compose(*funcs): """Compose 2 or more callables""" assert len(funcs) > 1, 'At least 2 callables must be passed to compose' return reduce(_compose2, funcs) + + +def to_numeric(arg, errors='raise'): + """ + Convert argument to a numeric type. + + Parameters + ---------- + arg : list, tuple or array of objects, or Series + errors : {'ignore', 'raise', 'coerce'}, default 'raise' + - If 'raise', then invalid parsing will raise an exception + - If 'coerce', then invalid parsing will be set as NaN + - If 'ignore', then invalid parsing will return the input + + Returns + ------- + ret : numeric if parsing succeeded. + Return type depends on input. 
Series if Series, otherwise ndarray + + Examples + -------- + Take separate series and convert to numeric, coercing when told to + + >>> import pandas as pd + >>> s = pd.Series(['1.0', '2', -3]) + >>> pd.to_numeric(s) + >>> s = pd.Series(['apple', '1.0', '2', -3]) + >>> pd.to_numeric(s, errors='ignore') + >>> pd.to_numeric(s, errors='coerce') + """ + + index = name = None + if isinstance(arg, pd.Series): + index, name = arg.index, arg.name + elif isinstance(arg, (list, tuple)): + arg = np.array(arg, dtype='O') + + conv = arg + arg = com._ensure_object(arg) + + coerce_numeric = False if errors in ('ignore', 'raise') else True + + try: + conv = lib.maybe_convert_numeric(arg, + set(), + coerce_numeric=coerce_numeric) + except: + if errors == 'raise': + raise + + if index is not None: + return pd.Series(conv, index=index, name=name) + else: + return conv
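For reference, the behaviour pinned down by the new ``TestToNumeric`` cases can be summarised in a short usage sketch; the commented results paraphrase the expected values from ``pandas/tools/tests/test_util.py`` rather than exact reprs:

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series([1, -3.14, 'apple'])

   pd.to_numeric(s, errors='ignore')    # unparseable input -> original Series returned
   pd.to_numeric(s, errors='coerce')    # Series([1.0, -3.14, nan])
   # pd.to_numeric(s, errors='raise') raises ValueError on 'apple'

   pd.to_numeric(['1', '-3.14', '7'])   # ndarray: [1.0, -3.14, 7.0]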