BUG: DataFrame.to_dict() converts Nullable Int types to numpy.int #34665

Closed
@dmlogv

Problem description

The DataFrame.to_dict() method does not cast nullable integer types (Int*Dtype) to the Python int type. Instead, it unwraps the values into numpy.int* types.

Possibly related to: #27616, #25969, #21256

Expected Output

Native Python int type.
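One way to state this expectation as a check is sketched below. The helper name `non_native_types` and the tuple `NATIVE_TYPES` are made up for illustration and are not part of pandas; the sample records are hand-built so the result does not depend on the pandas version.

```python
import numpy as np

# Assumption: these are the built-in types we consider "native" for this check.
NATIVE_TYPES = (int, float, bool, str, type(None))


def non_native_types(records):
    """Hypothetical helper: map each key to the names of value types
    that are not exactly one of the built-in types above."""
    found = {}
    for row in records:
        for key, value in row.items():
            # Exact type comparison, so NumPy subclasses such as
            # numpy.float64 are reported as well.
            if type(value) not in NATIVE_TYPES:
                found.setdefault(key, set()).add(type(value).__name__)
    return found


# Hand-built records that mimic the shape of to_dict(orient='records').
sample = [
    {"id": 1, "value": np.int64(1)},
    {"id": 2, "value": float("nan")},
]
report = non_native_types(sample)
print(report)  # {'value': {'int64'}}
```

An empty dict from this helper would correspond to the expected output described above.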

Reproduction

Make some data:

import pandas as pd

df = pd.DataFrame({'id': range(5),
                   'coeff': [i * 0.1 for i in range(5)],
                   'is_hot': [True] * 2 + [False] * 3,
                   'value': [1, None, 2, 3, None]})
df
   id  coeff  is_hot  value
0   0    0.0    True    1.0
1   1    0.1    True    NaN
2   2    0.2   False    2.0
3   3    0.3   False    3.0
4   4    0.4   False    NaN
df.dtypes

id          int64
coeff     float64
is_hot       bool
value     float64
dtype: object

The value column has to be a nullable int:

df['value'] = df['value'].astype(pd.Int64Dtype())
df.dtypes

id          int64
coeff     float64
is_hot       bool
value       Int64
dtype: object
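For reference, the same nullable dtype can be requested in a few equivalent ways. This is a minimal sketch; the variable names are illustrative only and the data mirrors the value column from the reproduction.

```python
import pandas as pd

# A float column with missing values, like 'value' before the cast.
floats = pd.Series([1.0, None, 2.0, 3.0, None])

# Cast using the dtype object, as in the reproduction.
via_dtype_object = floats.astype(pd.Int64Dtype())

# Cast using the string alias (note the capital "I").
via_string_alias = floats.astype("Int64")

# Request the nullable dtype directly at construction time.
direct = pd.Series([1, None, 2, 3, None], dtype="Int64")

print(via_dtype_object.dtype, via_string_alias.dtype, direct.dtype)
```

All three spellings should yield an Int64 column with two missing entries.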

Looks great. But now convert the DataFrame to a list of dicts:

dicts = df.to_dict(orient='records')
dicts

[{'id': 0, 'coeff': 0.0, 'is_hot': True, 'value': 1},
 {'id': 1, 'coeff': 0.1, 'is_hot': True, 'value': nan},
 {'id': 2, 'coeff': 0.2, 'is_hot': False, 'value': 2},
 {'id': 3, 'coeff': 0.30000000000000004, 'is_hot': False, 'value': 3},
 {'id': 4, 'coeff': 0.4, 'is_hot': False, 'value': nan}]
pd.DataFrame(
    [[type(v) for v in row.values()] for row in dicts],
    columns=dicts[0].keys())
              id            coeff          is_hot                  value
0  <class 'int'>  <class 'float'>  <class 'bool'>  <class 'numpy.int64'>
1  <class 'int'>  <class 'float'>  <class 'bool'>        <class 'float'>
2  <class 'int'>  <class 'float'>  <class 'bool'>  <class 'numpy.int64'>
3  <class 'int'>  <class 'float'>  <class 'bool'>  <class 'numpy.int64'>
4  <class 'int'>  <class 'float'>  <class 'bool'>        <class 'float'>
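Until to_dict() returns native types itself, a possible workaround is to post-process the records. The sketch below is not a pandas API: `to_native` is a hypothetical helper, and mapping missing values to None is an assumption that may not suit every use case.

```python
import numpy as np
import pandas as pd


def to_native(value):
    """Hypothetical helper: convert one scalar to a built-in Python type."""
    # Assumption: missing values (NaN or pd.NA) should become None.
    if pd.isna(value):
        return None
    # NumPy scalars expose .item(), which returns the closest Python type.
    if isinstance(value, np.generic):
        return value.item()
    return value


df = pd.DataFrame({'id': range(5),
                   'is_hot': [True] * 2 + [False] * 3,
                   'value': [1, None, 2, 3, None]})
df['value'] = df['value'].astype(pd.Int64Dtype())

# Post-process every value of every record.
records = [
    {key: to_native(val) for key, val in row.items()}
    for row in df.to_dict(orient='records')
]
print(records[0], records[1])
```

After this pass, non-missing entries in the value column are plain int objects regardless of whether to_dict() returned numpy.int64 or int for them.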

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.4.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8

pandas           : 1.0.4
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Labels: Bug; NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
