ER/DOC: Sorting in multi-index columns: misleading error message, unclear docs

related #739

Have a look at this example:

``` python
import pandas as pd
import numpy as np
from StringIO import StringIO
print "Pandas version %s\n\n" % pd.__version__

data1 = """idx,metric
0,2.1
1,2.5
2,3"""

data2 = """idx,metric
0,2.7
1,2.2
2,2.8"""

df1 = pd.read_csv(StringIO(data1))
df2 = pd.read_csv(StringIO(data2))
concatenated = pd.concat([df1, df2], ignore_index=True)
merged = concatenated.groupby("idx").agg([np.mean, np.std])

print merged
print merged.sort('metric')
```

and its output:

```
$ python test.py 
Pandas version 0.11.0


     metric          
       mean       std
idx                  
0      2.40  0.424264
1      2.35  0.212132
2      2.90  0.141421
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    print merged.sort('metric')
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3098, in sort
    inplace=inplace)
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3153, in sort_index
    % str(by))
ValueError: Cannot sort by duplicate column metric
```

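The problem here is not that there is a duplicate column `metric`, as the error message states. The problem is that the columns are a MultiIndex, and `metric` is a top-level label that still has two sub-levels (`mean` and `std`), so it does not identify a single column. The solution in this case is to use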

``` python
merged.sort([('metric', 'mean')])
```

for sorting by the mean of the metric. It took me quite a while to figure this out. First of all, the error message should be clearer in this case. Second, I could not find the solution in the docs, only in a thread on StackOverflow, so the docs should cover sorting by a multi-index column. It looks like the error message above is the result of an over-generalized condition around https://github.com/pydata/pandas/blob/v0.12.0rc1/pandas/core/frame.py#L3269
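
For readers on newer pandas releases, where `DataFrame.sort` is no longer available and `sort_values` is used instead, the same workaround can be sketched as follows. This is a minimal sketch and not part of the original report: it assumes Python 3 and a recent pandas, and it passes the aggregations as the strings `"mean"` and `"std"` rather than the NumPy functions. The name `sorted_by_mean` is only illustrative.

```python
import io

import pandas as pd

# Same sample data as in the report above.
data1 = "idx,metric\n0,2.1\n1,2.5\n2,3"
data2 = "idx,metric\n0,2.7\n1,2.2\n2,2.8"

frames = [pd.read_csv(io.StringIO(d)) for d in (data1, data2)]
concatenated = pd.concat(frames, ignore_index=True)

# Aggregating with two functions yields MultiIndex columns:
# ('metric', 'mean') and ('metric', 'std').
merged = concatenated.groupby("idx").agg(["mean", "std"])

# Address the column by its full tuple key; the top-level label
# 'metric' alone does not identify a single column.
sorted_by_mean = merged.sort_values(by=("metric", "mean"))
print(sorted_by_mean)
```

The point is the same as in the workaround above: the sort key has to be the full `(level0, level1)` tuple so that it names exactly one column.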

