DataFrame.to_records dtype shouldn't use unicode for every column

#### Code Sample, a copy-pastable example if possible

As of 0.20, `DataFrame.to_records` uses the `unicode` type for all of the dtype identifiers on Python 2.

```python
In [36]: pd.DataFrame({u'c/\u03c3': [1, 2], 'c/s': [3, 4]}).to_records()
Out[36]:
rec.array([(0, 3, 1), (1, 4, 2)],
          dtype=[(u'index', '<i8'), (u'c/s', '<i8'), (u'c/\u03c3', '<i8')])
```

This caused some issues for statsmodels, since they pass `to_records().dtype` on to `np.dtype`, which doesn't like unicode identifiers on Python 2 (https://github.com/statsmodels/statsmodels/issues/3658#issuecomment-301471472).

I think the correct behavior is to just use whatever the user has. So the output from above should be

```python
In [36]: pd.DataFrame({u'c/\u03c3': [1, 2], 'c/s': [3, 4]}).to_records()
Out[36]:
rec.array([(0, 3, 1), (1, 4, 2)],
          dtype=[('index', '<i8'), ('c/s', '<i8'), (u'c/\u03c3', '<i8')])

```

so the python2 `str` column (which is actually bytes) should just be `'c/s'`, not `u'c/s'`.

The thing pandas has to decide is how to handle

1. the default `'index'` when df.index.name is None
2. non-string columns like numbers

I think the least-surprising option there is to use `str()`, so on py2 the name will be bytes, and on py3 it will be unicode. Not sure if it will cause problems elsewhere though.
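For concreteness, here is a rough sketch of that rule. `field_names` is a hypothetical helper written for illustration, not existing pandas code, and it only covers name selection under the assumption that string labels pass through untouched while everything else goes through the native `str()`.

```python
def field_names(index_name, columns):
    """Hypothetical helper: choose record field names as proposed above."""
    # Native str, bytes, and unicode text; on py3 the first and last coincide.
    string_types = (str, bytes, type(u''))

    def as_name(label):
        # String labels are kept exactly as the user supplied them.
        if isinstance(label, string_types):
            return label
        # Non-string labels (e.g. numbers) use the native str():
        # bytes on py2, unicode on py3.
        return str(label)

    # The 'index' literal is a native str, so the default name follows
    # the same py2/py3 rule as non-string labels.
    index = 'index' if index_name is None else index_name
    return [as_name(index)] + [as_name(c) for c in columns]


names = field_names(None, ['c/s', u'c/\u03c3', 0])
```

With this, a py2 `str` column stays `'c/s'`, the unicode column stays `u'c/\u03c3'`, and the integer column `0` becomes `'0'`.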

xref https://github.com/pandas-dev/pandas/pull/13462 and https://github.com/pandas-dev/pandas/issues/11879

cc @AlexisMignon 


DataFrame.to_records dtype shouldn't use unicode for every column #16358
