Commit 9f7adc6

Merge remote-tracking branch 'upstream/master' into pr/ShreyDixit-mask_pos_args_deprecation
2 parents 4a3e82b + fd38824 commit 9f7adc6

29 files changed

+372
-67
lines changed

doc/source/user_guide/io.rst

-9
Original file line number | Diff line number | Diff line change
@@ -3684,15 +3684,6 @@ one can pass an :class:`~pandas.io.excel.ExcelWriter`.
36843684
df1.to_excel(writer, sheet_name="Sheet1")
36853685
df2.to_excel(writer, sheet_name="Sheet2")
36863686
3687-
.. note::
3688-
3689-
Wringing a little more performance out of ``read_excel``
3690-
Internally, Excel stores all numeric data as floats. Because this can
3691-
produce unexpected behavior when reading in data, pandas defaults to trying
3692-
to convert integers to floats if it doesn't lose information (``1.0 -->
3693-
1``). You can pass ``convert_float=False`` to disable this behavior, which
3694-
may give a slight performance improvement.
3695-
36963687
.. _io.excel_writing_buffer:
36973688

36983689
Writing Excel files to memory

doc/source/whatsnew/v1.3.0.rst

+8
@@ -676,12 +676,16 @@ Deprecations
676676
- The ``inplace`` parameter of :meth:`Categorical.remove_categories`, :meth:`Categorical.add_categories`, :meth:`Categorical.reorder_categories`, :meth:`Categorical.rename_categories`, :meth:`Categorical.set_categories` is deprecated and will be removed in a future version (:issue:`37643`)
677677
- Deprecated :func:`merge` producing duplicated columns through the ``suffixes`` keyword and already existing columns (:issue:`22818`)
678678
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
679+
- Deprecated the ``convert_float`` optional argument in :func:`read_excel` and :meth:`ExcelFile.parse` (:issue:`41127`)
679680
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
680681
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
681682
- Deprecated passing arguments (apart from ``cond`` and ``other``) as positional in :meth:`DataFrame.mask` (:issue:`41485`)
682683
- Deprecated passing arguments as positional in :meth:`DataFrame.clip` and :meth:`Series.clip` (other than ``"upper"`` and ``"lower"``) (:issue:`41485`)
683684
- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`)
684685
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`)
686+
- Deprecated passing arguments as positional in :meth:`DataFrame.dropna` and :meth:`Series.dropna` (:issue:`41485`)
687+
- Deprecated passing arguments as positional in :meth:`DataFrame.set_index` (other than ``"keys"``) (:issue:`41485`)
688+
- Deprecated passing arguments as positional (except for ``"levels"``) in :meth:`MultiIndex.set_levels` (:issue:`41485`)
685689
- Deprecated passing arguments as positional in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` (:issue:`41485`)
686690
- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``), :meth:`Series.drop_duplicates`, :meth:`Index.drop_duplicates` and :meth:`MultiIndex.drop_duplicates` (:issue:`41485`)
687691
- Deprecated passing arguments (apart from ``value``) as positional in :meth:`DataFrame.fillna` and :meth:`Series.fillna` (:issue:`41485`)
@@ -974,6 +978,7 @@ I/O
974978
- Bug in :func:`read_orc` always raising ``AttributeError`` (:issue:`40918`)
975979
- Bug in :func:`read_csv` and :func:`read_table` silently ignoring ``prefix`` if ``names`` and ``prefix`` are defined, now raising ``ValueError`` (:issue:`39123`)
976980
- Bug in :func:`read_csv` and :func:`read_excel` not respecting dtype for duplicated column name when ``mangle_dupe_cols`` is set to ``True`` (:issue:`35211`)
981+
- Bug in :func:`read_csv` silently ignoring ``sep`` if ``delimiter`` and ``sep`` are defined, now raising ``ValueError`` (:issue:`39823`)
977982
- Bug in :func:`read_csv` and :func:`read_table` misinterpreting arguments when ``sys.setprofile`` had been previously called (:issue:`41069`)
978983
- Bug in the conversion from pyarrow to pandas (e.g. for reading Parquet) with nullable dtypes and a pyarrow array whose data buffer size is not a multiple of dtype size (:issue:`40896`)
979984
- Bug in :func:`read_excel` would raise an error when pandas could not determine the file type, even when user specified the ``engine`` argument (:issue:`41225`)
@@ -1037,6 +1042,8 @@ Groupby/resample/rolling
10371042
- Bug in :meth:`DataFrameGroupBy.__getitem__` with non-unique columns incorrectly returning a malformed :class:`SeriesGroupBy` instead of :class:`DataFrameGroupBy` (:issue:`41427`)
10381043
- Bug in :meth:`DataFrameGroupBy.transform` with non-unique columns incorrectly raising ``AttributeError`` (:issue:`41427`)
10391044
- Bug in :meth:`Resampler.apply` with non-unique columns incorrectly dropping duplicated columns (:issue:`41445`)
1045+
- Bug in :class:`SeriesGroupBy` aggregations incorrectly returning empty :class:`Series` instead of raising ``TypeError`` on aggregations that are invalid for its dtype, e.g. ``.prod`` with ``datetime64[ns]`` dtype (:issue:`41342`)
1046+
- Bug in :meth:`DataFrame.rolling.__iter__` where ``on`` was not assigned to the index of the resulting objects (:issue:`40373`)
10401047
- Bug in :meth:`DataFrameGroupBy.transform` and :meth:`DataFrameGroupBy.agg` with ``engine="numba"`` where ``*args`` were being cached with the user passed function (:issue:`41647`)
10411048

10421049
Reshaping
@@ -1101,6 +1108,7 @@ Other
11011108
- Bug in :func:`pandas.testing.assert_index_equal` with ``exact=True`` not raising when comparing :class:`CategoricalIndex` instances with ``Int64Index`` and ``RangeIndex`` categories (:issue:`41263`)
11021109
- Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`)
11031110
- Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`)
1111+
- Let pandas compile on z/OS when using `xlc <https://www.ibm.com/products/xl-cpp-compiler-zos>`_ (:issue:`35826`)
11041112
- Bug in :meth:`DataFrame.convert_dtypes` incorrectly raised ValueError when called on an empty DataFrame (:issue:`40393`)
11051113
- Bug in :meth:`DataFrame.agg()` not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results (:issue:`33634`)
11061114
- Bug in :meth:`DataFrame.clip` not interpreting missing values as no threshold (:issue:`40420`)

pandas/_libs/src/headers/cmath

+12
@@ -25,6 +25,18 @@ namespace std {
2525
__inline int isnan(double x) { return _isnan(x); }
2626
__inline int notnan(double x) { return x == x; }
2727
}
28+
#elif defined(__MVS__)
29+
#include <cmath>
30+
31+
#define _signbit signbit
32+
#undef signbit
33+
#undef isnan
34+
35+
namespace std {
36+
__inline int notnan(double x) { return x == x; }
37+
__inline int signbit(double num) { return _signbit(num); }
38+
__inline int isnan(double x) { return isnan(x); }
39+
}
2840
#else
2941
#include <cmath>
3042

pandas/core/frame.py

+2
@@ -5330,6 +5330,7 @@ def shift(
53305330
periods=periods, freq=freq, axis=axis, fill_value=fill_value
53315331
)
53325332

5333+
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "keys"])
53335334
def set_index(
53345335
self,
53355336
keys,
@@ -5834,6 +5835,7 @@ def notna(self) -> DataFrame:
58345835
def notnull(self) -> DataFrame:
58355836
return ~self.isna()
58365837

5838+
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
58375839
def dropna(
58385840
self,
58395841
axis: Axis = 0,
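
The two decorators added above mark every argument outside ``allowed_args`` as keyword-only in a future version. The implementation of ``deprecate_nonkeyword_arguments`` is not part of this diff, so the following is only a minimal sketch of how a decorator of that kind could work. The names ``warn_on_positional`` and ``Table`` are hypothetical and the warning text is an assumption.

```python
import functools
import warnings


def warn_on_positional(allowed_args):
    """Hypothetical stand-in for a decorator like deprecate_nonkeyword_arguments."""

    def decorate(func):
        # Every name in allowed_args may still be passed positionally.
        max_positional = len(allowed_args)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > max_positional:
                warnings.warn(
                    f"In a future version, arguments of {func.__name__} other "
                    f"than {allowed_args} will be keyword-only",
                    FutureWarning,
                    stacklevel=2,  # attribute the warning to the caller
                )
            return func(*args, **kwargs)

        return wrapper

    return decorate


class Table:
    """Toy class standing in for DataFrame; not pandas code."""

    @warn_on_positional(allowed_args=["self", "keys"])
    def set_index(self, keys, drop=True):
        return (keys, drop)
```

With this sketch, ``Table().set_index("a", False)`` emits a ``FutureWarning`` while ``Table().set_index("a", drop=False)`` does not.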

pandas/core/groupby/generic.py

+6-3
@@ -323,15 +323,18 @@ def _aggregate_multiple_funcs(self, arg) -> DataFrame:
323323
return output
324324

325325
def _cython_agg_general(
326-
self, how: str, alt=None, numeric_only: bool = True, min_count: int = -1
326+
self, how: str, alt: Callable, numeric_only: bool, min_count: int = -1
327327
):
328328

329329
obj = self._selected_obj
330330
objvals = obj._values
331331
data = obj._mgr
332332

333333
if numeric_only and not is_numeric_dtype(obj.dtype):
334-
raise DataError("No numeric types to aggregate")
334+
# GH#41291 match Series behavior
335+
raise NotImplementedError(
336+
f"{type(self).__name__}.{how} does not implement numeric_only."
337+
)
335338

336339
# This is overkill because it is only called once, but is here to
337340
# mirror the array_func used in DataFrameGroupBy._cython_agg_general
@@ -1056,7 +1059,7 @@ def _iterate_slices(self) -> Iterable[Series]:
10561059
yield values
10571060

10581061
def _cython_agg_general(
1059-
self, how: str, alt=None, numeric_only: bool = True, min_count: int = -1
1062+
self, how: str, alt: Callable, numeric_only: bool, min_count: int = -1
10601063
) -> DataFrame:
10611064
# Note: we never get here with how="ohlc"; that goes through SeriesGroupBy
10621065

pandas/core/groupby/groupby.py

+50-7
@@ -1101,6 +1101,34 @@ def _wrap_transformed_output(self, output: Mapping[base.OutputKey, ArrayLike]):
11011101
def _wrap_applied_output(self, data, keys, values, not_indexed_same: bool = False):
11021102
raise AbstractMethodError(self)
11031103

1104+
def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
1105+
"""
1106+
Determine subclass-specific default value for 'numeric_only'.
1107+
1108+
For SeriesGroupBy we want the default to be False (to match Series behavior).
1109+
For DataFrameGroupBy we want it to be True (for backwards-compat).
1110+
1111+
Parameters
1112+
----------
1113+
numeric_only : bool or lib.no_default
1114+
1115+
Returns
1116+
-------
1117+
bool
1118+
"""
1119+
# GH#41291
1120+
if numeric_only is lib.no_default:
1121+
# i.e. not explicitly passed by user
1122+
if self.obj.ndim == 2:
1123+
# i.e. DataFrameGroupBy
1124+
numeric_only = True
1125+
else:
1126+
numeric_only = False
1127+
1128+
# error: Incompatible return value type (got "Union[bool, NoDefault]",
1129+
# expected "bool")
1130+
return numeric_only # type: ignore[return-value]
1131+
11041132
# -----------------------------------------------------------------
11051133
# numba
11061134

@@ -1308,6 +1336,7 @@ def _agg_general(
13081336
alias: str,
13091337
npfunc: Callable,
13101338
):
1339+
13111340
with group_selection_context(self):
13121341
# try a cython aggregation if we can
13131342
result = None
@@ -1367,7 +1396,7 @@ def _agg_py_fallback(
13671396
return ensure_block_shape(res_values, ndim=ndim)
13681397

13691398
def _cython_agg_general(
1370-
self, how: str, alt=None, numeric_only: bool = True, min_count: int = -1
1399+
self, how: str, alt: Callable, numeric_only: bool, min_count: int = -1
13711400
):
13721401
raise AbstractMethodError(self)
13731402

@@ -1587,7 +1616,7 @@ def count(self):
15871616
@final
15881617
@Substitution(name="groupby")
15891618
@Substitution(see_also=_common_see_also)
1590-
def mean(self, numeric_only: bool = True):
1619+
def mean(self, numeric_only: bool | lib.NoDefault = lib.no_default):
15911620
"""
15921621
Compute mean of groups, excluding missing values.
15931622
@@ -1635,6 +1664,8 @@ def mean(self, numeric_only: bool = True):
16351664
2 4.0
16361665
Name: B, dtype: float64
16371666
"""
1667+
numeric_only = self._resolve_numeric_only(numeric_only)
1668+
16381669
result = self._cython_agg_general(
16391670
"mean",
16401671
alt=lambda x: Series(x).mean(numeric_only=numeric_only),
@@ -1645,7 +1676,7 @@ def mean(self, numeric_only: bool = True):
16451676
@final
16461677
@Substitution(name="groupby")
16471678
@Appender(_common_see_also)
1648-
def median(self, numeric_only=True):
1679+
def median(self, numeric_only: bool | lib.NoDefault = lib.no_default):
16491680
"""
16501681
Compute median of groups, excluding missing values.
16511682
@@ -1662,6 +1693,8 @@ def median(self, numeric_only=True):
16621693
Series or DataFrame
16631694
Median of values within each group.
16641695
"""
1696+
numeric_only = self._resolve_numeric_only(numeric_only)
1697+
16651698
result = self._cython_agg_general(
16661699
"median",
16671700
alt=lambda x: Series(x).median(numeric_only=numeric_only),
@@ -1719,8 +1752,9 @@ def var(self, ddof: int = 1):
17191752
Variance of values within each group.
17201753
"""
17211754
if ddof == 1:
1755+
numeric_only = self._resolve_numeric_only(lib.no_default)
17221756
return self._cython_agg_general(
1723-
"var", alt=lambda x: Series(x).var(ddof=ddof)
1757+
"var", alt=lambda x: Series(x).var(ddof=ddof), numeric_only=numeric_only
17241758
)
17251759
else:
17261760
func = lambda x: x.var(ddof=ddof)
@@ -1785,7 +1819,10 @@ def size(self) -> FrameOrSeriesUnion:
17851819

17861820
@final
17871821
@doc(_groupby_agg_method_template, fname="sum", no=True, mc=0)
1788-
def sum(self, numeric_only: bool = True, min_count: int = 0):
1822+
def sum(
1823+
self, numeric_only: bool | lib.NoDefault = lib.no_default, min_count: int = 0
1824+
):
1825+
numeric_only = self._resolve_numeric_only(numeric_only)
17891826

17901827
# If we are grouping on categoricals we want unobserved categories to
17911828
# return zero, rather than the default of NaN which the reindexing in
@@ -1802,7 +1839,11 @@ def sum(self, numeric_only: bool = True, min_count: int = 0):
18021839

18031840
@final
18041841
@doc(_groupby_agg_method_template, fname="prod", no=True, mc=0)
1805-
def prod(self, numeric_only: bool = True, min_count: int = 0):
1842+
def prod(
1843+
self, numeric_only: bool | lib.NoDefault = lib.no_default, min_count: int = 0
1844+
):
1845+
numeric_only = self._resolve_numeric_only(numeric_only)
1846+
18061847
return self._agg_general(
18071848
numeric_only=numeric_only, min_count=min_count, alias="prod", npfunc=np.prod
18081849
)
@@ -2731,7 +2772,7 @@ def _get_cythonized_result(
27312772
how: str,
27322773
cython_dtype: np.dtype,
27332774
aggregate: bool = False,
2734-
numeric_only: bool = True,
2775+
numeric_only: bool | lib.NoDefault = lib.no_default,
27352776
needs_counts: bool = False,
27362777
needs_values: bool = False,
27372778
needs_2d: bool = False,
@@ -2799,6 +2840,8 @@ def _get_cythonized_result(
27992840
-------
28002841
`Series` or `DataFrame` with filled values
28012842
"""
2843+
numeric_only = self._resolve_numeric_only(numeric_only)
2844+
28022845
if result_is_index and aggregate:
28032846
raise ValueError("'result_is_index' and 'aggregate' cannot both be True!")
28042847
if post_processing and not callable(post_processing):
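
The ``_resolve_numeric_only`` helper introduced in this file relies on a sentinel default so that "argument not passed" can be told apart from an explicit ``True`` or ``False``. A minimal standalone sketch of that pattern follows. It does not use ``pandas._libs.lib``; the ``no_default`` sentinel and the free function ``resolve_numeric_only`` here are illustrative assumptions that take ``ndim`` as a plain argument instead of reading ``self.obj.ndim``.

```python
import enum


class _NoDefault(enum.Enum):
    # Single-member enum used purely as a unique sentinel value.
    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


def resolve_numeric_only(numeric_only, ndim):
    """Pick a default for numeric_only only when the caller did not pass one."""
    if numeric_only is no_default:
        # 2-D (frame-like) objects default to True, 1-D (series-like) to False.
        return ndim == 2
    # An explicit True/False from the caller is returned unchanged.
    return numeric_only
```

Under these assumptions, ``resolve_numeric_only(no_default, ndim=1)`` yields ``False`` and ``resolve_numeric_only(no_default, ndim=2)`` yields ``True``, while an explicitly passed value always wins.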

pandas/core/indexes/multi.py

+2-1
@@ -804,6 +804,7 @@ def _set_levels(
804804

805805
self._reset_cache()
806806

807+
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "levels"])
807808
def set_levels(
808809
self, levels, level=None, inplace=None, verify_integrity: bool = True
809810
):
@@ -895,7 +896,7 @@ def set_levels(
895896
warnings.warn(
896897
"inplace is deprecated and will be removed in a future version.",
897898
FutureWarning,
898-
stacklevel=2,
899+
stacklevel=3,
899900
)
900901
else:
901902
inplace = False
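
The ``stacklevel`` bump from 2 to 3 accompanies the new decorator on ``set_levels``. A plausible reading is that the decorator's wrapper adds one frame between the user's call and the ``warnings.warn`` call. The sketch below illustrates that effect with the standard library only; ``passthrough``, ``plain``, ``wrapped`` and ``reported_line`` are hypothetical names, not pandas code.

```python
import functools
import warnings


def passthrough(func):
    """Hypothetical decorator that only forwards the call (adds one frame)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def plain(stacklevel):
    warnings.warn("inplace is deprecated", FutureWarning, stacklevel=stacklevel)


@passthrough
def wrapped(stacklevel):
    warnings.warn("inplace is deprecated", FutureWarning, stacklevel=stacklevel)


def reported_line(func, stacklevel):
    """Return the line number the warning is attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(stacklevel)  # the "user" call site
    return caught[0].lineno
```

For the undecorated function, ``stacklevel=2`` points at the user call site. Once a wrapper is in between, ``stacklevel=2`` points inside the wrapper and ``stacklevel=3`` is needed to reach the same call site.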

pandas/core/series.py

+1
@@ -5093,6 +5093,7 @@ def notna(self) -> Series:
50935093
def notnull(self) -> Series:
50945094
return super().notnull()
50955095

5096+
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
50965097
def dropna(self, axis=0, inplace=False, how=None):
50975098
"""
50985099
Return a new Series with missing values removed.

pandas/core/window/rolling.py

+1
@@ -291,6 +291,7 @@ def __repr__(self) -> str:
291291

292292
def __iter__(self):
293293
obj = self._create_data(self._selected_obj)
294+
obj = obj.set_axis(self._on)
294295
indexer = self._get_window_indexer()
295296

296297
start, end = indexer.get_window_bounds(

pandas/io/excel/_base.py

+19-14
@@ -2,7 +2,6 @@
22

33
import abc
44
import datetime
5-
import inspect
65
from io import BytesIO
76
import os
87
from textwrap import fill
@@ -33,6 +32,7 @@
3332
deprecate_nonkeyword_arguments,
3433
doc,
3534
)
35+
from pandas.util._exceptions import find_stack_level
3636

3737
from pandas.core.dtypes.common import (
3838
is_bool,
@@ -245,6 +245,10 @@
245245
Convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
246246
data will be read in as floats: Excel stores all numbers as floats
247247
internally.
248+
249+
.. deprecated:: 1.3.0
250+
convert_float will be removed in a future version
251+
248252
mangle_dupe_cols : bool, default True
249253
Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
250254
'X'...'X'. Passing in False will cause data to be overwritten if there
@@ -355,7 +359,7 @@ def read_excel(
355359
thousands=None,
356360
comment=None,
357361
skipfooter=0,
358-
convert_float=True,
362+
convert_float=None,
359363
mangle_dupe_cols=True,
360364
storage_options: StorageOptions = None,
361365
):
@@ -489,11 +493,21 @@ def parse(
489493
thousands=None,
490494
comment=None,
491495
skipfooter=0,
492-
convert_float=True,
496+
convert_float=None,
493497
mangle_dupe_cols=True,
494498
**kwds,
495499
):
496500

501+
if convert_float is None:
502+
convert_float = True
503+
else:
504+
stacklevel = find_stack_level()
505+
warnings.warn(
506+
"convert_float is deprecated and will be removed in a future version",
507+
FutureWarning,
508+
stacklevel=stacklevel,
509+
)
510+
497511
validate_header_arg(header)
498512

499513
ret_dict = False
@@ -1206,16 +1220,7 @@ def __init__(
12061220
f"only the xls format is supported. Install openpyxl instead."
12071221
)
12081222
elif ext and ext != "xls":
1209-
caller = inspect.stack()[1]
1210-
if (
1211-
caller.filename.endswith(
1212-
os.path.join("pandas", "io", "excel", "_base.py")
1213-
)
1214-
and caller.function == "read_excel"
1215-
):
1216-
stacklevel = 4
1217-
else:
1218-
stacklevel = 2
1223+
stacklevel = find_stack_level()
12191224
warnings.warn(
12201225
f"Your version of xlrd is {xlrd_version}. In xlrd >= 2.0, "
12211226
f"only the xls format is supported. Install "
@@ -1251,7 +1256,7 @@ def parse(
12511256
thousands=None,
12521257
comment=None,
12531258
skipfooter=0,
1254-
convert_float=True,
1259+
convert_float=None,
12551260
mangle_dupe_cols=True,
12561261
**kwds,
12571262
):
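
The ``convert_float`` change above switches the default from ``True`` to ``None`` so that any explicitly passed value can trigger the deprecation warning while the old behavior is kept for callers who omit it. A minimal sketch of that pattern, assuming a hypothetical standalone ``parse`` function and a fixed ``stacklevel=2`` in place of ``find_stack_level()``:

```python
import warnings


def parse(convert_float=None):
    """Toy function; only the deprecation handling mirrors the diff."""
    if convert_float is None:
        # Not passed by the caller: keep the historical default silently.
        convert_float = True
    else:
        # Any explicit value, True or False, is reported as deprecated.
        warnings.warn(
            "convert_float is deprecated and will be removed in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return convert_float
```

Note that passing ``convert_float=True`` explicitly also warns in this scheme, even though it matches the default.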

pandas/io/parsers/readers.py

+3
@@ -1255,6 +1255,9 @@ def _refine_defaults_read(
12551255
sep is lib.no_default or sep == delim_default
12561256
)
12571257

1258+
if delimiter and (sep is not lib.no_default):
1259+
raise ValueError("Specified a sep and a delimiter; you can only specify one.")
1260+
12581261
if names is not lib.no_default and prefix is not lib.no_default:
12591262
raise ValueError("Specified named and prefix; you can only specify one.")
12601263
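
The new check in ``_refine_defaults_read`` rejects calls that set both ``sep`` and ``delimiter``. The sketch below isolates that check in a hypothetical ``refine_separator`` function. The ``_NO_DEFAULT`` sentinel and the fallback logic after the check are assumptions for illustration, not the actual pandas implementation.

```python
_NO_DEFAULT = object()  # sentinel meaning "sep was not passed"


def refine_separator(sep=_NO_DEFAULT, delimiter=None, default=","):
    """Return the separator to use, refusing ambiguous input."""
    if delimiter and (sep is not _NO_DEFAULT):
        raise ValueError("Specified a sep and a delimiter; you can only specify one.")
    if delimiter:
        return delimiter
    if sep is _NO_DEFAULT:
        return default
    return sep
```

Under these assumptions, ``refine_separator(sep=";", delimiter="|")`` raises ``ValueError``, while passing only one of the two returns that value.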
