ENH: Processing of .mask() for pd.NA #56844

Open
@kuri-menu

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 1, 2]
    },
)

For example, suppose you have a DataFrame like the one above.

>>> df[
...     pd.Series([-1, 1, pd.NA]) < 0
... ]
	A
0	0

This works as expected.

>>> df[
...     pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0
... ]
	A
0	0

This also works as expected.

>>> df['A'].mask(
...     pd.Series([-1, 1, pd.NA]) < 0,
...     100
... )

0    100
1      1
2      2
Name: A, dtype: int64

This also works as expected.

>>> df['A'].mask(
...     pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0,
...     100
... )

0    100
1      1
2    100  # !?
Name: A, dtype: int64

This behavior is confusing: the pd.NA entry in the condition is treated as if it were True, so the value at index 2 is replaced as well.
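To see where the two calls diverge, it can help to look at the condition itself rather than at the result of .mask(). The following is a minimal sketch that only inspects the two conditions used above; the variable names are illustrative and not part of the original report.

```python
import pandas as pd

values = pd.Series([-1, 1, pd.NA])

# Condition built on the default dtype, as in the first .mask() example.
object_cond = values < 0

# Condition built after convert_dtypes(), as in the second .mask() example.
nullable_cond = values.convert_dtypes() < 0

print(object_cond.dtype)
print(nullable_cond.dtype)            # the nullable "boolean" dtype
print(nullable_cond.isna().tolist())  # [False, False, True]
```

The second condition carries a missing value at index 2, and that is the entry .mask() ends up replacing.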

Feature Description

I think most people would expect this result.

>>> df['A'].mask(
...     pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0,
...     100
... )
0    100
1      1
2      2
Name: A, dtype: int64

I think pandas should either make a comparison against pd.NA evaluate to False

>>> pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0
0     True
1    False
2    False
dtype: boolean

or change the .mask() method so that it treats pd.NA in the condition the same as False.
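For context on the first option, here is a small sketch of how the pd.NA scalar currently behaves in a comparison. It is only meant to show what the first option would have to change, not to argue for either option.

```python
import pandas as pd

# Comparing the pd.NA scalar propagates the missing value
# instead of producing True or False.
result = pd.NA < 0
print(result is pd.NA)  # True

# pd.NA has no truth value, so forcing it to a plain bool raises.
try:
    bool(pd.NA)
except TypeError as exc:
    print(f"TypeError: {exc}")
```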

Alternative Solutions

Calling .fillna(False) on the condition gives the expected result, but I think having to add it every time would be tedious for many people.

>>> df['A'].mask(
...     (pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0).fillna(False),
...     100
... )
0    100
1      1
2      2
Name: A, dtype: int64
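Until the behavior changes, the workaround can be wrapped once and reused. This is a minimal sketch: mask_na_as_false is a hypothetical helper name, not a pandas API, and it assumes the condition is a Series that can be cast to the nullable "boolean" dtype.

```python
import pandas as pd


def mask_na_as_false(data, cond, other):
    """Hypothetical helper: like data.mask(cond, other), but pd.NA in cond counts as False."""
    # Normalize to the nullable boolean dtype, replace missing entries with False,
    # then convert to a plain NumPy-backed bool Series before masking.
    clean_cond = cond.astype("boolean").fillna(False).astype(bool)
    return data.mask(clean_cond, other)


df = pd.DataFrame({'A': [0, 1, 2]})
cond = pd.Series([-1, 1, pd.NA]).convert_dtypes() < 0

result = mask_na_as_false(df['A'], cond, 100)
print(result.tolist())  # [100, 1, 2]
```

Converting to a plain bool condition before calling .mask() keeps the helper independent of how .mask() itself handles missing values in the condition.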

Additional Context

No response

Metadata

Labels

Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
