
Inconsistent behavior of hierarchical indexes when indexes are of different data types #3521


Closed
kghose opened this issue May 3, 2013 · 11 comments


kghose commented May 3, 2013

import numpy as np
import pandas as pd

# int second level
col = pd.MultiIndex.from_tuples([('c1', 0), ('c1', 1), ('c2', 0)], names=['f', 's'])
dat = np.random.randn(2, len(col))
df1 = pd.DataFrame(dat, columns=col)

# str second level
col = pd.MultiIndex.from_tuples([('c3', 'x'), ('c3', 'y')])
dat = np.random.randn(2, len(col))
df2 = pd.DataFrame(dat, columns=col)

# This does not merge the column indexes (int vs str second level)
df_a = pd.concat([df1, df2])
# But this does
df_b = pd.concat([df1, df2], axis=1)

# Same frame as df1, but with the second level as str
col = pd.MultiIndex.from_tuples([('c1', '0'), ('c1', '1'), ('c2', '0')], names=['f', 's'])
dat = np.random.randn(2, len(col))
df4 = pd.DataFrame(dat, columns=col)
df_d = pd.concat([df4, df2], axis=0)
In [31]: df_a
Out[31]: 
   c1      c2        c3          
    0   1   0         x         y
0 NaN NaN NaN       NaN       NaN
1 NaN NaN NaN       NaN       NaN
0 NaN NaN NaN -0.694275 -1.357936
1 NaN NaN NaN -1.450523 -1.453957

This is unexpected behavior, especially since concatenating along axis=1 does not care about the mismatched level types:

In [32]: df_b
Out[32]: 
f        c1                  c2        c3          
s         0         1         0         x         y
0  0.381601 -0.730360  0.157936 -0.694275 -1.357936
1  0.344333 -1.308118  1.503335 -1.450523 -1.453957

This now works, because the second index level is the same type (str) in both frames:

In [33]: df_d
Out[33]: 
         c1                  c2        c3          
          0         1         0         x         y
0  0.162019 -0.325463 -0.200149       NaN       NaN
1 -0.142477 -0.089191 -0.439161       NaN       NaN
0       NaN       NaN       NaN -0.694275 -1.357936
1       NaN       NaN       NaN -1.450523 -1.453957
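
For anyone stuck on a released version, a sketch of a workaround (it just automates what df4 demonstrates: give both frames' second column level the same type before concatenating):

# cast the int second level of df1's columns to str, then concat
str_tuples = [(f, str(s)) for f, s in df1.columns]
df1_str = df1.copy()
df1_str.columns = pd.MultiIndex.from_tuples(str_tuples, names=df1.columns.names)

df_fixed = pd.concat([df1_str, df2], axis=0)  # aligns like df_d above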
cpcloud (Member) commented May 3, 2013

When you post code or output, please use GitHub-flavored markdown rather than images; it's slightly faster than copy-pasting, retyping, and editing after retyping, and much easier to read.

cpcloud (Member) commented May 3, 2013

@kghose What version are you using? I get a ValueError when I try to concat df1 and df2, and when I do the same with axis=1 I get what I think you expected:

[screenshot "pandas-error": session showing the ValueError raised by pd.concat([df1, df2]) and the expected axis=1 result]
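
If you want to check which behavior your installed version exhibits, here is a minimal sketch (only the two frames from the original report are assumed):

import numpy as np
import pandas as pd

print(pd.__version__)

# rebuild the two frames from the report: int vs str second level
col1 = pd.MultiIndex.from_tuples([('c1', 0), ('c1', 1), ('c2', 0)], names=['f', 's'])
col2 = pd.MultiIndex.from_tuples([('c3', 'x'), ('c3', 'y')])
df1 = pd.DataFrame(np.random.randn(2, len(col1)), columns=col1)
df2 = pd.DataFrame(np.random.randn(2, len(col2)), columns=col2)

# 0.11 silently produces the all-NaN frame shown above; git master raises
try:
    print(pd.concat([df1, df2]))
except ValueError as e:
    print('concat raised ValueError:', e)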

kghose (Author) commented May 3, 2013

Thanks for checking. I'm using 0.11.0; I had the same issue with 0.10.0.

cpcloud (Member) commented May 3, 2013

Okay, that's probably the issue. I'm using git master (which is version 0.12.0.dev-9b6b8fb), so if you can, you might want to upgrade. At the very least, you could clone the latest repo, diff pandas/tools/merge.py against your version, and grep for exceptions to see what changeset introduced this. You still have to upgrade, though, if you want the expected behavior.

kghose (Author) commented May 3, 2013

@cpcloud Thank you! That helps me out. I might stick with the released versions for a bit. How is your experience with master? How stable is it? I worry about bugs on the bleeding edge.

kghose (Author) commented May 3, 2013

Closing, as the issue seems to be resolved in the latest version (thanks, @cpcloud).

kghose closed this as completed May 3, 2013
cpcloud (Member) commented May 3, 2013

@kghose I'm not sure how to measure the stability of a code base. I find it pretty "stable", but I'm probably the worst person to ask about this: I work in neuroscience, where stability is usually not a concern because I can fix things myself most of the time. I also use the bleeding edge of every part of the Python science stack, and it has yet to fail me in a way that totally curbs my productivity. You should take what I say about this with a grain of salt, though, for the aforementioned reasons.

kghose (Author) commented May 3, 2013

Oh, I'm in the same business. But I worry about subtle bugs that get through and skew results; really, I should just write more thorough tests and run them periodically, as sketched below. (My usual practice is to test a method, then freeze it and forget about it. But newer versions of libraries may introduce subtle bugs, e.g. in NaN handling, that are nasty for data processing.)
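
Something like a small, periodically run regression check is what I have in mind; a minimal sketch (the specific NaN semantics pinned down here are just illustrative):

import numpy as np
import pandas as pd

def test_nan_handling_unchanged():
    # pin down the NaN semantics the analysis relies on, so a library
    # upgrade that silently changes them fails loudly
    s = pd.Series([1.0, np.nan, 3.0])
    assert s.sum() == 4.0              # NaN skipped by default
    assert s.mean() == 2.0             # mean over non-NaN values only
    assert int(s.isnull().sum()) == 1  # exactly one missing value detected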

cpcloud (Member) commented May 3, 2013

I know what you mean. For analyzing data I am crazy about assertions: I put them everywhere so that I can fail as soon as possible (see the sketch below). That helps cut down on writing tests for, say, a one-off plotting script specific to a particular paper or kind of analysis. I've found that pandas and the scientific Python community are very aware of this exact issue, and these 'subtle' bugs are few and far between. However, see #3513 for an example of what you're talking about.
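
For example, a minimal sketch of the assert-early style (the file path and column names are hypothetical, not from this thread):

import pandas as pd

def load_trials(path):
    # hypothetical loader for a one-off analysis script
    df = pd.read_csv(path)
    # fail as early as possible if upstream data drifts
    assert not df.empty, 'no trials loaded'
    assert df['reaction_time'].notnull().all(), 'unexpected NaN reaction times'
    assert (df['reaction_time'] > 0).all(), 'non-positive reaction time'
    return df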

cpcloud (Member) commented May 3, 2013

@kghose You also might be interested in this blog post.

kghose (Author) commented May 3, 2013

@cpcloud thanks for the links.
