BUG: read_csv doesn't respect dtype argument for index_col #20541

Closed
kylebarron opened this issue Mar 29, 2018 · 3 comments
Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Indexing (related to indexing on series/frames, not to indexes themselves); IO CSV (read_csv, to_csv)

Comments

@kylebarron
Contributor

Code Sample

import pandas as pd
from io import StringIO
data = '"zip1","zip2","mi_to_zcta5"\n'
data += '"00601","00631",5.43229145138995\n'
data += '"00601","00641",6.19718605765618'

df1 = pd.read_csv(StringIO(data), header=0, dtype={'zip1': str, 'zip2': str})
df1.zip1.dtype.name
# object

df2 = pd.read_csv(StringIO(data), header=0, index_col='zip1', dtype={'zip1': str, 'zip2': str})
df2.index.dtype.name
# int64

Problem description

The column being read as the index should respect the dtype provided in the dtype argument when the name provided with index_col is a key in the dtype dict.

I couldn't find another issue covering this specific problem, but please correct me if there is one.

Expected Output

df1 = pd.read_csv(StringIO(data), header=0, dtype={'zip1': str, 'zip2': str})
df1.set_index('zip1').index.dtype.name
# object
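
As a possible workaround in the meantime, the expected output above suggests reading the file without index_col and setting the index afterwards. The snippet below is only a sketch of that idea using the same sample data; it has not been checked against every pandas version, and the variable name index_values is just for illustration.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the report: zip codes with leading zeros.
data = (
    '"zip1","zip2","mi_to_zcta5"\n'
    '"00601","00631",5.43229145138995\n'
    '"00601","00641",6.19718605765618'
)

# Step 1: read both zip columns as strings, WITHOUT passing index_col,
# so the dtype mapping is applied to ordinary columns.
df = pd.read_csv(StringIO(data), header=0, dtype={"zip1": str, "zip2": str})

# Step 2: promote the already-typed column to the index.
df = df.set_index("zip1")

# The leading zeros survive because the values were never parsed as integers.
index_values = list(df.index)
print(index_values)  # ['00601', '00601']
```

This costs an extra set_index call, but it avoids depending on how read_csv treats the dtype argument for the index column.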

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.4.2
pip: 9.0.3
setuptools: 38.6.0
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.9.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.0
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

gfyoung added the Indexing, Dtype Conversions, IO CSV, and Bug labels on Mar 29, 2018
@gfyoung
Member

gfyoung commented Mar 29, 2018

Good catch! PR to patch is welcome!

@jschendel
Member

Duplicate of #9435

@kylebarron
Contributor Author

Thanks @jschendel

jreback added this to the No action milestone on Mar 30, 2018