read_csv index_col ignores dtype if specified #12999

Closed
@alzmcr

Hi, I'm not sure if this is intended, but when the index_col parameter is used in read_csv, the dtype specified for that column is ignored. It's reproducible as follows using pandas 0.18.0 and numpy 1.11.0.

>>> from StringIO import StringIO
import pandas as pd
df_csv = """request_hour,request_date,size
03,2016-04-26,2580954.0
04,2016-04-26,12003662.0
05,2016-04-26,13042624.0
06,2016-04-26,2899309.0
07,2016-04-26,-1.0"""

>>> pd.read_csv(StringIO(df_csv), dtype={'request_hour': 'string'}).set_index('request_hour')

             request_date        size
request_hour                         
03             2016-04-26   2580954.0
04             2016-04-26  12003662.0
05             2016-04-26  13042624.0
06             2016-04-26   2899309.0
07             2016-04-26        -1.0
# This is what I would expect as output

>>> pd.read_csv(StringIO(df_csv), dtype={'request_hour': 'string'}, index_col=0)

             request_date        size
request_hour                         
3              2016-04-26   2580954.0
4              2016-04-26  12003662.0
5              2016-04-26  13042624.0
6              2016-04-26   2899309.0
7              2016-04-26        -1.0
# I'm surprised that the index has been converted to int

I couldn't find any specs on this anywhere, so I wonder whether something is wrong with read_csv or whether I'm doing something wrong.

Thanks!
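
For anyone who wants to check this on their own setup, here is a minimal Python 3 sketch of the set_index workaround from the first call above. It is only an illustration, with two assumptions that differ from the original snippet: it imports StringIO from io rather than the Python 2 StringIO module, and it passes the built-in str as the dtype instead of 'string'.

```python
from io import StringIO

import pandas as pd

df_csv = """request_hour,request_date,size
03,2016-04-26,2580954.0
04,2016-04-26,12003662.0
05,2016-04-26,13042624.0
06,2016-04-26,2899309.0
07,2016-04-26,-1.0"""

# Read request_hour as text first, then promote it to the index.
# Assumption: dtype=str stands in for the 'string' dtype used in the report.
df = pd.read_csv(StringIO(df_csv), dtype={"request_hour": str}).set_index("request_hour")

# The leading zeros survive because the column was never parsed as a number.
print(df.index.tolist())
```

Reading the column as text and only then calling set_index avoids relying on how index_col interacts with dtype, which is the behaviour in question here.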
