DataFrame from hierarchical NumPy recarray with hierarchical MultiIndex results in all NaN values #13421

Closed
jzwinck opened this issue Jun 10, 2016 · 17 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jzwinck
Contributor

jzwinck commented Jun 10, 2016

I filed #13415 in which it was said that DataFrame(recarray, columns=MultiIndex) does reindexing and so only selects matching columns to be in the resultant frame. I can see how this might be a backward compatibility constraint. However, I have discovered a similar but different case which still seems broken:

arr = np.zeros(3, [('q', [('x',float), ('y',int)])])
ind = pd.MultiIndex.from_tuples([('q','x'),('q','y')])
pd.DataFrame(arr, columns=ind)

This creates a 3x2 array of zeros, but results in a 3x2 DataFrame of NaNs. Note that the column names basically match: the NumPy array has a top-level q with subitems x and y, and so does the MultiIndex. If the top-level name in the MultiIndex is changed to something other than q, it results in an empty DataFrame, meaning that there is some recognized correspondence between the input data and the requested columns. But the data is lost nevertheless, putting NaNs where there should be zeros.

Either the columns are considered non-matching, in which case the result should be an empty DataFrame, or they do match, in which case the result should be a DataFrame with contents from the input array.
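For reference, a minimal workaround sketch (not part of the original report, and assuming the nested fields are flattened by hand): walking the compound dtype and extracting each (outer, inner) field yields the intended 3x2 frame, since a dict with tuple keys produces MultiIndex columns automatically.

```python
import numpy as np
import pandas as pd

# The hierarchical structured array from the report.
arr = np.zeros(3, [('q', [('x', float), ('y', int)])])

# Flatten the nested fields by hand: walk the two-level compound dtype
# and pull out each (outer, inner) field as its own column. A dict with
# tuple keys produces MultiIndex columns automatically. Deeper nesting
# would need a recursive walk.
cols = [(outer, inner)
        for outer in arr.dtype.names
        for inner in arr.dtype[outer].names]
df = pd.DataFrame({c: arr[c[0]][c[1]] for c in cols})
```

This also preserves the per-field dtypes (float64 for x, the platform integer for y), which a plain `np.column_stack` would not.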

@jzwinck jzwinck changed the title DataFrame from hierarchical NumPy recarray with hierarchical MultiIndex discards all data DataFrame from hierarchical NumPy recarray with hierarchical MultiIndex results in all NaN values Jun 10, 2016
@jorisvandenbossche
Member

jorisvandenbossche commented Jun 10, 2016

The issue is rather that pandas does not parse that hierarchical dtype as you expect:

In [74]: arr = np.zeros(3, [('q', [('x',float), ('y',int)])])

In [76]: pd.DataFrame(arr)
Out[76]:
          q
0  (0.0, 0)
1  (0.0, 0)
2  (0.0, 0)

Given the above result, the rest (the empty frame when providing columns) is logical again.
However, I am not sure what the correct way to convert such a recarray should be. The above also seems to make sense: since the records of the recarray consist of tuples, the resulting dataframe contains tuples as well.
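For this two-level case, one way to get separate columns (a sketch, not an endorsed conversion path) is to index into the nested field first: `arr['q']` is a flat structured array, and pandas does expand a flat compound dtype into one column per field.

```python
import numpy as np
import pandas as pd

arr = np.zeros(3, [('q', [('x', float), ('y', int)])])

# arr['q'] is a flat structured array with fields 'x' and 'y';
# pandas expands a flat compound dtype into one column per field.
df = pd.DataFrame(arr['q'])
```

The top-level name 'q' is lost this way; attaching it back as a MultiIndex level would have to be done manually.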

BTW, I closed the previous issue, but that does not mean it is prohibited to ask further questions on that topic over there :-)

@jreback
Contributor

jreback commented Jun 10, 2016

This sort of works with the only constructor that accepts rec-arrays.

In [4]: pd.DataFrame.from_records(arr, columns=ind)
Out[4]: 
          q
0  (0.0, 0)
1  (0.0, 0)
2  (0.0, 0)

@jreback
Contributor

jreback commented Jun 10, 2016

this is essentially another case of #7893

@jreback jreback closed this as completed Jun 10, 2016
@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request labels Jun 10, 2016
@jreback jreback added this to the No action milestone Jun 10, 2016
@jzwinck
Contributor Author

jzwinck commented Jun 10, 2016

I disagree that this is another case of #7893. As I tried to explain:

Either the columns are considered non-matching, in which case the result should be an empty DataFrame, or they do match, in which case the result should be a DataFrame with contents from the input array.

The current behavior is that an erroneous DataFrame is created, which does not contain data from the input array, but is also not empty. If Pandas recognizes that the column names match, it should use the input data; if it believes the names don't match then the result should be an empty DataFrame. The current behavior is half-and-half.

@jreback
Contributor

jreback commented Jun 10, 2016

and that's a bug I agree

we just don't need another issue that covers the same material as an existing one;
it will just get even more lost. If you would like to address that issue, you can include this as a test case.

@jzwinck
Contributor Author

jzwinck commented Jun 10, 2016

What I would like more than anything is to have a simple way to take a hierarchical recarray (as my example arr) and get it into a DataFrame with MultiIndex. I think you see what I am trying to do--can you offer a workaround?

@jorisvandenbossche
Member

@jreback the result you show from from_records is exactly the same as from DataFrame():

In [1]: arr = np.zeros(3, [('q', [('x',float), ('y',int)])])

In [2]: pd.DataFrame(arr)
Out[2]:
          q
0  (0.0, 0)
1  (0.0, 0)
2  (0.0, 0)

In [3]: pd.DataFrame.from_records(arr)
Out[3]:
          q
0  (0.0, 0)
1  (0.0, 0)
2  (0.0, 0)

In [4]: ind = pd.MultiIndex.from_tuples([('q','x'),('q','y')])

In [5]: pd.DataFrame.from_records(arr, columns=ind)
Out[5]:
          q
0  (0.0, 0)
1  (0.0, 0)
2  (0.0, 0)

So in the last line, columns=ind is actually ignored, which rather looks like a bug.

@jreback
Contributor

jreback commented Jun 10, 2016

assign the columns directly
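A sketch of what "assign the columns directly" could look like for a flat structured array (an assumption on my part; the nested dtype from the report still parses to tuples first, so it would need to be flattened before this applies):

```python
import numpy as np
import pandas as pd

# A flat structured array (assumption: nested fields already flattened).
arr = np.zeros(3, [('x', float), ('y', int)])

# Build the frame first, then overwrite the columns with the MultiIndex.
# This sidesteps the ignored/NaN behavior of passing columns= up front.
df = pd.DataFrame.from_records(arr)
df.columns = pd.MultiIndex.from_tuples([('q', 'x'), ('q', 'y')])
```

The per-field dtypes survive the round trip, since from_records copies each field as its own column.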

@jreback
Contributor

jreback commented Jun 10, 2016

not even sure why you would work with rec-arrays to begin with - they are not very friendly (not to mention they have an inefficient memory representation)

@shoyer
Member

shoyer commented Jun 10, 2016

@jreback I agree that rec arrays don't work very well, but I disagree that they are memory inefficient -- the data is all packed together in the dtype, so that seems perfectly reasonable to me.

@jzwinck
Contributor Author

jzwinck commented Jun 10, 2016

@jreback to use a non-hierarchical example, let's say I have received from another library a big list of tuples and I have a dtype list which corresponds to them, e.g.:

data = [(1.2, 'foo'), (3.4, 'bar')] # in reality wider and very long, comes from another library
dtype = [('value', float), ('name', 'S3')]

Now in NumPy I do this:

np.array(data, dtype)

And I get something useful:

array([(1.2, 'foo'), (3.4, 'bar')], 
    dtype=[('value', '<f8'), ('name', 'S3')])

I can then construct a DataFrame from that array. Is there a better way to construct a DataFrame with explicit, heterogeneous column types? I don't want Pandas to guess the column types.

@jreback
Contributor

jreback commented Jun 10, 2016

this is exactly what .from_records() does;
simply assign the columns afterward if they are MultiIndexes (that this is necessary is a bug)

they are memory inefficient as pandas has to convert them to a columnar layout

@jzwinck
Contributor Author

jzwinck commented Jun 10, 2016

This doesn't work--the dtype cannot be specified:

data = [(1.2, 5), (3.4, 6)]
dtype = [('value', float), ('name', 'i2')]
pd.DataFrame.from_records(data)._data

It gives:

Axis 1: RangeIndex(start=0, stop=2, step=1)
FloatBlock: slice(0, 1, 1), 1 x 2, dtype: float64
IntBlock: slice(1, 2, 1), 1 x 2, dtype: int64

Only by using NumPy do I get what I want:

pd.DataFrame.from_records(np.array(data, dtype))._data

Axis 1: RangeIndex(start=0, stop=2, step=1)
FloatBlock: slice(0, 1, 1), 1 x 2, dtype: float64
IntBlock: slice(1, 2, 1), 1 x 2, dtype: int16

Note we now see int16 rather than int64. You have said that using recarray is memory-inefficient, but I am struggling because in my use case, not using recarray causes inefficiency in Pandas.

Is there a way to construct a DataFrame with multiple columns of different types efficiently from a sequence of tuples? Obviously I don't have an efficient way to get one column at a time from the tuples, so I can't easily construct a bunch of Series etc.

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 10, 2016

That pd.DataFrame.from_records(data, dtype) does not give the desired result is expected, as the second positional argument of from_records is index (so you are passing the dtype list as the index values).

There is no way (as far as I know) to pass directly a compound dtype without making a numpy array first.
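An alternative sketch that avoids the structured array, assuming a post-hoc cast is acceptable. Note this does not save the intermediate conversion: the integer column is still inferred as int64 first and then downcast.

```python
import pandas as pd

data = [(1.2, 5), (3.4, 6)]

# Let pandas infer the dtypes from the tuples, then cast individual
# columns afterwards with a dtype mapping.
df = (pd.DataFrame(data, columns=['value', 'name'])
        .astype({'name': 'int16'}))
```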

@shoyer
Member

shoyer commented Jun 10, 2016

@jzwinck This gives data in the form you want:

dtype = [('value', float), ('name', 'i2')]
data = np.array([(1.2, 5), (3.4, 6)], dtype)
pd.DataFrame.from_records(data).dtypes

You need to make the numpy array with the proper dtype before passing it to from_records

@jzwinck
Contributor Author

jzwinck commented Jun 10, 2016

@jorisvandenbossche and @shoyer Right, so what you and I are all saying is that constructing a NumPy recarray (structured array) is a prerequisite to constructing a Pandas DataFrame. Yet above I am being told that recarrays are bad and inefficient. So I don't really understand what to take away from all this.

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 10, 2016

@jzwinck It is only a prerequisite when you want to specify a compound dtype. Otherwise, you can pass the list of tuples just to DataFrame() and it will work without making a recarray first.

Further, you only have to worry about this if memory/performance of constructing your frame is a bottleneck.
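A sketch of that direct path, with the dtypes left to pandas' inference:

```python
import pandas as pd

data = [(1.2, 'foo'), (3.4, 'bar')]

# No recarray needed when inferred dtypes are acceptable:
# 'value' is inferred as float64 and 'name' as object.
df = pd.DataFrame(data, columns=['value', 'name'])
```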


4 participants