Skip to content

Conversation

hksonngan
Copy link
Contributor

@hksonngan hksonngan commented Feb 24, 2019

@codecov
Copy link

codecov bot commented Feb 24, 2019

Codecov Report

Merging #25424 into master will decrease coverage by <.01%.
The diff coverage is 84.21%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25424      +/-   ##
==========================================
- Coverage   91.25%   91.24%   -0.01%     
==========================================
  Files         172      172              
  Lines       52973    53007      +34     
==========================================
+ Hits        48338    48366      +28     
- Misses       4635     4641       +6
Flag Coverage Δ
#multiple 89.82% <84.21%> (-0.01%) ⬇️
#single 41.76% <55.26%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.26% <84.21%> (-0.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9eec9b8...7f7a12a. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 24, 2019

Codecov Report

Merging #25424 into master will decrease coverage by 50.04%.
The diff coverage is 45%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25424       +/-   ##
===========================================
- Coverage   91.73%   41.69%   -50.05%     
===========================================
  Files         173      173               
  Lines       52856    52876       +20     
===========================================
- Hits        48490    22048    -26442     
- Misses       4366    30828    +26462
Flag Coverage Δ
#multiple ?
#single 41.69% <45%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 34.23% <45%> (-61.39%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
... and 131 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3855a27...b2879de. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are adding a significant amount of non trivial code here - which likely duplicates a fair amount of existing

likely this is change needs some work

@hksonngan
Copy link
Contributor Author

@jreback
I want to check NaN in MultiIndex, but function isna() isn't implemented in MultiIndex.
How can I change?
and I change from np.where() to use pd.core.algorithms.isin() but can not pass the test cases.
Should I use directly np.isin() or modify algorithms.isin()?

@pep8speaks
Copy link

pep8speaks commented Feb 25, 2019

Hello @hksonngan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-14 21:47:12 UTC

@gfyoung gfyoung added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version MultiIndex labels Feb 25, 2019
@jreback
Copy link
Contributor

jreback commented Mar 3, 2019

@hksonngan you are adding an amazing amount of complexity here. Please step thru the setting code; This is likely a very small change.

@hksonngan
Copy link
Contributor Author

@jreback I think one problem remains when set value multi-index with NaN, get_loc, will crash. I will investigate this.

@hksonngan
Copy link
Contributor Author

@jreback I add code to pass of the case when setting value with NaN index, but I don't know why I get a lot of linting errors when I rebase. I don't change these files. Do you know why?

@hksonngan hksonngan mentioned this pull request Mar 13, 2019
@hksonngan
Copy link
Contributor Author

@jreback ping

@jreback
Copy link
Contributor

jreback commented Mar 14, 2019

@hksonngan this is a massive amount of code. I am not even sure what you are trying to do your fix is VERY complicated. Please point out exactly where the issue is.

@hksonngan
Copy link
Contributor Author

hksonngan commented Mar 15, 2019

@hksonngan this is a massive amount of code. I am not even sure what you are trying to do your fix is VERY complicated. Please point out exactly where the issue is.

@jreback if we have NaN in MultiIndex as:

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.32, 4.65],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
                                 col_1   col_2
pivot_0  pivot_1
A            NaN           1.23     4.56
              G                1.32     4.65
              D                9.87    10.54

Now I set new df.at[('A', 'F'), 'col_2'] = 0.0 I get wrong result in col_1 as:

                                 col_1   col_2
pivot_0  pivot_1
A            NaN           1.23      4.56
              G                1.32      4.65
              D                9.87    10.54
              F                 1.23      0.0

The right result is must F NaN 0.0
This is because get_indexer indexer = self._base.get_indexer(self, lab_ints) return from hash with duplicate value for index NaN and new as F here.
For above example indexer = [-1 4 6 -1]

@hksonngan
Copy link
Contributor Author

Or I add new MultiIndex with NaN, I get exception like this code:

df = pd.DataFrame(
    [
        ['A', 'G', 1.32, 4.65],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
df.at[('A', np.nan), 'col_2'] = 0.0 # Get exception in here

@hksonngan
Copy link
Contributor Author

hksonngan commented Mar 15, 2019

  • So I fixed simple by check 'NaN' exist or not in MultiIndex with function _hasnans, because when add new value to DataFrame, the get_indexer return from hash with dulicate value -1, as NaN value and new value is same result from hashtable.
  • Adding new value when have NaN in index will get the exception at line 684 index.pyx lab_int was not calculated for NaN value at class MultiIndex with function _engine() line 1234 . So I compensate with position of NaN in levels from line 688 index.pyx, I raise exception to call function _hasnans, and recalculate position of NaN in function _engine() line 1234
  • When MultiIndex initialize, also call function _hasnans, because I want to cache this value. If the new index NaN, I can raise exception in get_loc() as above.

@hksonngan
Copy link
Contributor Author

close as I will investigate another way to solve this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MultiIndex Bug Copying Values Incorrectly When Adding Values To Index
4 participants