Fixed regression of Multi index with NaN #25424

hksonngan · 2019-02-24T02:16:12Z

closes MultiIndex Bug Copying Values Incorrectly When Adding Values To Index #22247
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
Because REF: codes-based MultiIndex engine #19074 changed the hashtable computation from self.values to self.codes and self.levels, NaN values are no longer covered.
I simply fixed this regression for the special case by checking the value returned from the hashtable.
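A minimal sketch of the representation this relies on, assuming pandas 0.24 or later (where `MultiIndex.codes` replaced `labels`). A NaN label is not stored in `levels`; it is only marked as `-1` in `codes`. Anything built from codes and levels alone therefore has to treat `-1` specially.

```python
import numpy as np
import pandas as pd

# Same keys as the pivot_0 / pivot_1 example discussed in this thread.
mi = pd.MultiIndex.from_arrays(
    [['A', 'A', 'A'], [np.nan, 'G', 'D']],
    names=['pivot_0', 'pivot_1'],
)

# The second level holds only the non-missing labels; NaN is not a level value.
print(mi.levels[1])

# In the integer codes, the position of the NaN label is marked with -1.
print(mi.codes[1])
```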

codecov · 2019-02-24T02:48:58Z

Codecov Report

Merging #25424 into master will decrease coverage by <.01%.
The diff coverage is 84.21%.

@@            Coverage Diff             @@
##           master   #25424      +/-   ##
==========================================
- Coverage   91.25%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52973    53007      +34     
==========================================
+ Hits        48338    48366      +28     
- Misses       4635     4641       +6

Flag	Coverage Δ
#multiple	`89.82% <84.21%> (-0.01%)`	⬇️
#single	`41.76% <55.26%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`95.26% <84.21%> (-0.36%)`	⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9eec9b8...7f7a12a.

codecov · 2019-02-24T02:48:58Z

Codecov Report

Merging #25424 into master will decrease coverage by 50.04%.
The diff coverage is 45%.

@@             Coverage Diff             @@
##           master   #25424       +/-   ##
===========================================
- Coverage   91.73%   41.69%   -50.05%     
===========================================
  Files         173      173               
  Lines       52856    52876       +20     
===========================================
- Hits        48490    22048    -26442     
- Misses       4366    30828    +26462

Flag	Coverage Δ
#multiple	`?`
#single	`41.69% <45%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`34.23% <45%> (-61.39%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.15%)`	⬇️
... and 131 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3855a27...b2879de.

jreback

you are adding a significant amount of non-trivial code here, which likely duplicates a fair amount of existing code

likely this change needs some work

hksonngan · 2019-02-25T04:22:41Z

@jreback
I want to check for NaN in the MultiIndex, but isna() isn't implemented for MultiIndex.
How should I change this?
I also switched from np.where() to pd.core.algorithms.isin(), but then the test cases don't pass.
Should I use np.isin() directly, or modify algorithms.isin()?
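One pitfall worth keeping in mind for the `np.isin()` question: NumPy membership tests are equality-based, and NaN is never equal to itself, so a NaN target never matches. A small sketch; the `level_codes` array is a made-up example of MultiIndex codes, not taken from the PR:

```python
import numpy as np

values = np.array([1.0, np.nan, 2.0])

# np.isin tests membership with ==, and NaN == NaN is False,
# so a NaN target never matches anything.
print(np.isin(values, [np.nan]))  # [False False False]

# A NaN-aware check has to test for NaN explicitly instead.
print(np.isnan(values))  # [False  True False]

# In a MultiIndex, NaN labels are not stored in the levels at all; they show up
# as -1 in the integer codes, so "does this level contain NaN" is an integer test.
level_codes = np.array([-1, 1, 0])  # hypothetical codes for one level
print((level_codes == -1).any())  # True
```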

pep8speaks · 2019-02-25T06:43:58Z

Hello @hksonngan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-14 21:47:12 UTC

jreback · 2019-03-03T01:51:41Z

@hksonngan you are adding an amazing amount of complexity here. Please step through the setting code; this is likely a very small change.

hksonngan · 2019-03-05T23:57:30Z

@jreback I think one problem remains: when setting a value on a MultiIndex with NaN, get_loc will crash. I will investigate this.

hksonngan · 2019-03-12T23:47:04Z

@jreback I added code to handle the case of setting a value with a NaN index, but I don't know why I get a lot of linting errors when I rebase. I didn't change those files. Do you know why?

hksonngan · 2019-03-14T01:48:14Z

@jreback ping

jreback · 2019-03-14T22:26:24Z

@hksonngan this is a massive amount of code. I am not even sure what you are trying to do; your fix is VERY complicated. Please point out exactly where the issue is.

hksonngan · 2019-03-15T00:53:14Z

@jreback if we have NaN in a MultiIndex, as in:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.32, 4.65],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

                 col_1  col_2
pivot_0 pivot_1
A       NaN       1.23   4.56
        G         1.32   4.65
        D         9.87  10.54

Now, setting a new value with df.at[('A', 'F'), 'col_2'] = 0.0 gives a wrong result in col_1:

                 col_1  col_2
pivot_0 pivot_1
A       NaN       1.23   4.56
        G         1.32   4.65
        D         9.87  10.54
        F         1.23    0.0

The correct result for the new row is F  NaN  0.0, i.e. col_1 should be NaN, not 1.23.
This happens because in get_indexer, indexer = self._base.get_indexer(self, lab_ints) returns the same duplicate value from the hashtable for the NaN index entry and for the new label F.
For the example above, indexer = [-1 4 6 -1].
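The collision can be reproduced with a deliberately simplified, hypothetical model of a codes-based lookup. This is plain Python, not the actual pandas engine, and `encode` / `lookup` are illustrative names. It assumes, as described above, that both a NaN label and a label absent from its level end up with code -1:

```python
import math


def encode(label, level):
    """Hypothetical encoder: position of `label` in `level`.

    NaN is never stored in a level, so it gets the code -1. A label that is
    simply missing from the level also falls through to -1, which is the bug
    being modelled: the two cases become indistinguishable.
    """
    if isinstance(label, float) and math.isnan(label):
        return -1
    return level.index(label) if label in level else -1


levels = [['A'], ['D', 'G']]
rows = [('A', float('nan')), ('A', 'G'), ('A', 'D')]

# Hash table from code tuples to row positions.
table = {
    tuple(encode(lab, lvl) for lab, lvl in zip(row, levels)): pos
    for pos, row in enumerate(rows)
}


def lookup(key):
    codes = tuple(encode(lab, lvl) for lab, lvl in zip(key, levels))
    return table.get(codes, -1)


print(lookup(('A', 'G')))  # 1
print(lookup(('A', 'F')))  # 0 -> the NaN row, so 'F' would pick up its col_1
```

In this model the new key ('A', 'F') resolves to the NaN row, which matches the symptom above where F inherits col_1 = 1.23.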

hksonngan · 2019-03-15T00:55:11Z

Or, if I add a new MultiIndex entry containing NaN, I get an exception, as in this code:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 'G', 1.32, 4.65],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
df.at[('A', np.nan), 'col_2'] = 0.0  # raises an exception here
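A rough, hypothetical model of why a NaN key is hard to locate. This is not the actual `index.pyx` code path, and the exception pandas raised is not shown in the thread. NaN compares unequal to everything, including itself, so an equality-based search of a level cannot find it; `naive_position` and `nan_aware_position` are illustrative names only.

```python
import math

level = ['D', 'G']
nan_label = float('nan')


def naive_position(level, label):
    # Equality-based search only; there is no special case for NaN.
    return level.index(label)


try:
    naive_position(level, nan_label)
    naive_failed = False
except ValueError:
    naive_failed = True
print('naive lookup failed:', naive_failed)


def nan_aware_position(level, label):
    # NaN gets its own branch and maps to the -1 "missing" code.
    if isinstance(label, float) and math.isnan(label):
        return -1
    return level.index(label)


print(nan_aware_position(level, 'G'))        # 1
print(nan_aware_position(level, nan_label))  # -1
```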

hksonngan · 2019-03-15T01:12:09Z

So I fixed it simply by checking whether NaN exists in the MultiIndex with a function _hasnans. When a new value is added to the DataFrame, get_indexer returns the duplicate value -1 from the hashtable, because the NaN value and the new value produce the same result there.
Adding a new value when the index contains NaN raises an exception at line 684 of index.pyx, because lab_int was not calculated for the NaN value in the _engine() method of MultiIndex (line 1234). To compensate with the position of NaN in the levels (line 688 of index.pyx), I raise an exception that calls _hasnans, and recalculate the position of NaN in _engine() (line 1234).
MultiIndex initialization also calls _hasnans, because I want to cache this value. If the new index value is NaN, I can raise an exception in get_loc() as above.
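For comparison, here is a sketch of a different way to remove the ambiguity than the `_hasnans` check described above: report a label that is absent from its level as "not found" immediately, so it can never share the NaN code. `CodesLookup` is a hypothetical class, not pandas API, and this is not the PR's implementation.

```python
import math


def _is_nan(label):
    # Only float NaN is treated as missing in this sketch.
    return isinstance(label, float) and math.isnan(label)


class CodesLookup:
    """Hypothetical, simplified codes-based lookup (not the pandas engine)."""

    def __init__(self, levels, rows):
        self.levels = levels
        # Map each row's code tuple to its position.
        self.table = {self._encode(row): pos for pos, row in enumerate(rows)}

    def _encode(self, key):
        codes = []
        for label, level in zip(key, self.levels):
            if _is_nan(label):
                codes.append(-1)  # NaN lives only in the codes, as -1
            elif label in level:
                codes.append(level.index(label))
            else:
                return None  # label absent from its level: never reuse -1
        return tuple(codes)

    def get_loc(self, key):
        codes = self._encode(key)
        if codes is None:
            return -1  # "not found", no longer confused with the NaN row
        return self.table.get(codes, -1)


lookup = CodesLookup(
    levels=[['A'], ['D', 'G']],
    rows=[('A', float('nan')), ('A', 'G'), ('A', 'D')],
)
print(lookup.get_loc(('A', 'G')))           # 1
print(lookup.get_loc(('A', float('nan'))))  # 0
print(lookup.get_loc(('A', 'F')))           # -1
```

With this encoding, ('A', 'F') is reported as missing instead of resolving to the NaN row, while an actual NaN key still finds its row.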

hksonngan · 2019-04-16T20:10:13Z

Closing, as I will investigate another way to solve this.

jreback requested changes Feb 24, 2019


gfyoung added the Missing-data, Regression, and MultiIndex labels Feb 25, 2019

hksonngan mentioned this pull request Mar 13, 2019

CI: Linting error #25715

Closed

add test case set value with NaN

7f7a12a

hksonngan closed this Apr 16, 2019

h-vetinari mentioned this pull request Sep 21, 2019

TST: restore type checks to maybe_promote tests #28561

Closed
