call __finalize__ in more methods #37186

Merged
merged 3 commits into from
Oct 20, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
@@ -524,7 +524,7 @@ Other
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
 - Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
 - Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (:issue:`37037`)
-- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors (:issue:`28283`)
+- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors and :class:`DataFrame.duplicated` and :class:`DataFrame.stack` methods (:issue:`28283`)
 - Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
 - Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)

9 changes: 6 additions & 3 deletions pandas/core/frame.py
@@ -5286,7 +5286,8 @@ def f(vals):
         labels, shape = map(list, zip(*map(f, vals)))

         ids = get_group_index(labels, shape, sort=False, xnull=False)
-        return self._constructor_sliced(duplicated_int64(ids, keep), index=self.index)
+        result = self._constructor_sliced(duplicated_int64(ids, keep), index=self.index)
+        return result.__finalize__(self, method="duplicated")

     # ----------------------------------------------------------------------
     # Sorting
@@ -7096,9 +7097,11 @@ def stack(self, level=-1, dropna=True):
         from pandas.core.reshape.reshape import stack, stack_multiple

         if isinstance(level, (tuple, list)):
-            return stack_multiple(self, level, dropna=dropna)
+            result = stack_multiple(self, level, dropna=dropna)
         else:
-            return stack(self, level, dropna=dropna)
+            result = stack(self, level, dropna=dropna)
+
+        return result.__finalize__(self, method="stack")

Review comment (Contributor):

I think this falls into the name issue I spotted in #37037. Essentially, if df has a column named "name", we try to set result.name = self.name, where self.name resolves to that column (a Series) rather than to a hashable Series name.

In [10]: df = pd.DataFrame({"A": [1, 2], "name": [3, 4]})

In [11]: df.stack().__finalize__(df)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-899f15844169> in <module>
----> 1 df.stack().__finalize__(df)

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/generic.py in __finalize__(self, other, method, **kwargs)
   5157         if isinstance(other, NDFrame):
   5158             for name in self._metadata:
-> 5159                 object.__setattr__(self, name, getattr(other, name, None))
   5160         return self
   5161

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/series.py in name(self, value)
    464     def name(self, value):
    465         if value is not None and not is_hashable(value):
--> 466             raise TypeError("Series.name must be a hashable type")
    467         object.__setattr__(self, "_name", value)
    468

TypeError: Series.name must be a hashable type

I think the solution is to only iterate over the intersection of self._metadata and other._metadata. I'll put up a PR.
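
The failure and the proposed fix can be sketched without pandas. The following is a minimal illustration, not the pandas implementation: ToySeries, ToyFrame, finalize_naive and finalize_intersection are hypothetical names invented here, and the only behavior borrowed from the traceback above is that metadata is copied attribute by attribute and that a series name must be hashable.

```python
class ToySeries:
    """Hypothetical stand-in for a Series: `name` is metadata and must be hashable."""

    _metadata = ["name"]
    __hash__ = None  # make instances unhashable, as a real Series is

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        try:
            hash(value)
        except TypeError:
            raise TypeError("ToySeries.name must be a hashable type") from None
        self._name = value


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns are reachable as attributes."""

    _metadata = []  # the frame itself declares no `name` metadata

    def __init__(self, data):
        self._data = {key: ToySeries(vals, name=key) for key, vals in data.items()}

    def __getattr__(self, key):
        # Only reached when normal lookup fails, which mimics column access.
        try:
            return self.__dict__["_data"][key]
        except KeyError:
            raise AttributeError(key) from None


def finalize_naive(result, other):
    # Copies every metadata field of the result, even ones `other` never declared.
    for name in result._metadata:
        setattr(result, name, getattr(other, name, None))
    return result


def finalize_intersection(result, other):
    # Copies only the fields that both objects declare as metadata.
    for name in set(result._metadata) & set(other._metadata):
        setattr(result, name, getattr(other, name, None))
    return result


frame = ToyFrame({"A": [1, 2], "name": [3, 4]})
stacked = ToySeries([1, 3, 2, 4])

try:
    finalize_naive(stacked, frame)  # picks up the "name" column, not a name
    naive_error = None
except TypeError as exc:
    naive_error = str(exc)

finalize_intersection(stacked, frame)  # nothing in common, so nothing is copied
```

In this sketch the naive version raises because getattr on the frame returns the "name" column, while the intersection version skips the field entirely since the frame does not list it as metadata.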


     def explode(
         self, column: Union[str, Tuple], ignore_index: bool = False
10 changes: 2 additions & 8 deletions pandas/tests/generic/test_finalize.py
@@ -115,10 +115,7 @@
     (pd.DataFrame, frame_data, operator.methodcaller("notnull")),
     (pd.DataFrame, frame_data, operator.methodcaller("dropna")),
     (pd.DataFrame, frame_data, operator.methodcaller("drop_duplicates")),
-    pytest.param(
-        (pd.DataFrame, frame_data, operator.methodcaller("duplicated")),
-        marks=not_implemented_mark,
-    ),
+    (pd.DataFrame, frame_data, operator.methodcaller("duplicated")),
     (pd.DataFrame, frame_data, operator.methodcaller("sort_values", by="A")),
     (pd.DataFrame, frame_data, operator.methodcaller("sort_index")),
     (pd.DataFrame, frame_data, operator.methodcaller("nlargest", 1, "A")),
@@ -169,10 +166,7 @@
         ),
         marks=not_implemented_mark,
     ),
-    pytest.param(
-        (pd.DataFrame, frame_data, operator.methodcaller("stack")),
-        marks=not_implemented_mark,
-    ),
+    (pd.DataFrame, frame_data, operator.methodcaller("stack")),
     pytest.param(
         (pd.DataFrame, frame_data, operator.methodcaller("explode", "A")),
         marks=not_implemented_mark,
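
To see why un-marking these entries is the whole test change, it may help to sketch how a (class, constructor arguments, methodcaller) tuple like the ones above could be consumed. This is a pandas-free illustration under stated assumptions: ToyFrame and propagates are hypothetical, the attrs comparison is an assumed form of the check rather than a quote from test_finalize.py, and the two duplicated variants mirror the before and after of the frame.py change.

```python
import operator


class ToyFrame:
    """Hypothetical container with an `attrs` dict standing in for metadata."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.attrs = {}

    def __finalize__(self, other, method=None):
        # Copy metadata from the source object onto this result.
        self.attrs = dict(other.attrs)
        return self

    def _duplicated_flags(self):
        seen, flags = set(), []
        for row in self.rows:
            flags.append(row in seen)
            seen.add(row)
        return flags

    def duplicated_before(self):
        # Returns the raw result, so metadata is lost.
        return ToyFrame(self._duplicated_flags())

    def duplicated_after(self):
        # Captures the result first, then finalizes it against self.
        result = ToyFrame(self._duplicated_flags())
        return result.__finalize__(self, method="duplicated")


# Each case has the same shape as the parametrized entries in the diff.
cases = [
    (ToyFrame, ([3, 1, 3],), operator.methodcaller("duplicated_before")),
    (ToyFrame, ([3, 1, 3],), operator.methodcaller("duplicated_after")),
]


def propagates(case):
    cls, init_args, method = case
    obj = cls(*init_args)
    obj.attrs = {"a": 1}
    result = method(obj)  # methodcaller invokes the named method on obj
    return result.attrs == {"a": 1}


results = [propagates(case) for case in cases]
```

Only the variant that calls __finalize__ passes the check, which corresponds to an entry that no longer needs a not-implemented mark.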