BUG: MultiIndex.unique incorrect when NA value is present #41823


Closed
jbrockmendel opened this issue Jun 5, 2021 · 3 comments

Comments

@jbrockmendel
Member

drop_duplicates gives the right result, while unique (and _get_unique_index) keeps a duplicate entry containing an NA value. The example below is based on test_intersection_with_missing_values_on_both_sides; the same problem occurs for every value in nulls_fixture.

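```python
>>> import numpy as np
>>> import pandas as pd
>>> nulls_fixture = np.nan
>>> mi1 = pd.MultiIndex.from_arrays([[3, nulls_fixture, 4, nulls_fixture], [1, 2, 4, 2]])

>>> mi1.unique()  # reported (incorrect) result: (nan, 2) appears twice
MultiIndex([(3.0, 1),
            (nan, 2),
            (4.0, 4),
            (nan, 2)],
           )

>>> mi1.drop_duplicates()  # correct result
MultiIndex([(3.0, 1),
            (nan, 2),
            (4.0, 4)],
           )
```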
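One way to see why a value-based unique can keep both (nan, 2) rows: under ordinary Python equality NaN never equals NaN, so two tuples holding separately created NaN objects compare unequal. The sketch below is a plain-Python analogy, not pandas' hashtable code; dedup_by_equality is a hypothetical helper, and the assumption that the two NaNs are distinct objects is made only for illustration.

```python
# Hypothetical helper for illustration: keep the first occurrence of each
# item, using ordinary Python equality (==) to decide what counts as a duplicate.
def dedup_by_equality(items):
    seen = []
    for item in items:
        if not any(item == s for s in seen):
            seen.append(item)
    return seen


# Two separately created NaN floats (an assumption for this illustration).
nan_a = float("nan")
nan_b = float("nan")

rows = [(3.0, 1), (nan_a, 2), (4.0, 4), (nan_b, 2)]

print(nan_a == nan_b)                 # False: NaN never compares equal by value
print((nan_a, 2) == (nan_b, 2))       # False: element comparison falls back to ==
print((nan_a, 2) == (nan_a, 2))       # True: the identity check short-circuits ==
print(len(dedup_by_equality(rows)))   # 4, the same shape as the buggy unique() output
```

If both rows held the very same NaN object, the identity shortcut would make them equal and only three rows would survive, which is why this kind of bug can depend on how the tuples were built.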
@jbrockmendel added the Bug and MultiIndex labels and removed the Needs Triage label on Jun 5, 2021
@jbrockmendel
Member Author

@realead I think one of your recent PRs closes this, but I'm not sure which one.

@realead
Contributor

realead commented Jun 18, 2021

It was closed by #41952.

@realead
Contributor

realead commented Jun 18, 2021

The difference is that mi1.unique (see the linked def unique(self): implementation) operates on self.values, i.e. [(3.0, 1) (nan, 2) (4.0, 4) (nan, 2)], while drop_duplicates operates on self.codes:

ids = get_group_index(self.codes, shape, sort=False, xnull=False)

in which NaN values are represented by the code -1.

This explains why drop_duplicates worked, but not unique.
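The codes-based approach can be sketched in plain Python. factorize_with_na and group_ids are hypothetical helpers loosely modeled on the idea behind get_group_index(..., xnull=False), not pandas' implementation: each level is factorized with NaN mapped to -1, a level containing -1 is shifted up by one so the missing value forms its own group, and the per-level codes are combined into one integer id per row.

```python
# Hypothetical helpers for illustration, loosely modeled on the idea behind
# get_group_index(..., xnull=False); this is not pandas' actual code.

def factorize_with_na(values):
    """Map each value to an integer code; NaN (or None) becomes -1."""
    uniques, codes, index = [], [], {}
    for v in values:
        if v is None or v != v:  # v != v is True only for NaN
            codes.append(-1)
            continue
        if v not in index:
            index[v] = len(uniques)
            uniques.append(v)
        codes.append(index[v])
    return uniques, codes


def group_ids(code_lists, sizes):
    """Combine per-level codes into one integer id per row (mixed radix).

    A level containing -1 is lifted by one, so the missing value gets its
    own group (code 0) instead of being dropped.
    """
    ids = [0] * len(code_lists[0])
    for codes, size in zip(code_lists, sizes):
        if -1 in codes:
            codes = [c + 1 for c in codes]
            size += 1
        ids = [i * size + c for i, c in zip(ids, codes)]
    return ids


col0 = [3, float("nan"), 4, float("nan")]
col1 = [1, 2, 4, 2]

levels0, codes0 = factorize_with_na(col0)  # codes0 == [0, -1, 1, -1]
levels1, codes1 = factorize_with_na(col1)  # codes1 == [0, 1, 2, 1]

ids = group_ids([codes0, codes1], [len(levels0), len(levels1)])  # [3, 1, 8, 1]

# Keep the first row position for each id, as a codes-based drop_duplicates would.
seen, keep = set(), []
for pos, i in enumerate(ids):
    if i not in seen:
        seen.add(i)
        keep.append(pos)
print(keep)  # [0, 1, 2]
```

The kept positions 0, 1 and 2 correspond to (3.0, 1), (nan, 2) and (4.0, 4), matching the drop_duplicates output in the report. Because the ids are plain integers, no NaN comparison is ever needed.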
