Hi, I'm not sure whether this is intended, but when the index_col parameter is used in read_csv, the type specified in dtype for that column is ignored. It can be reproduced as follows with pandas 0.18.0 and numpy 1.11.0:
>>> from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
>>> import pandas as pd
>>> df_csv = """request_hour,request_date,size
... 03,2016-04-26,2580954.0
... 04,2016-04-26,12003662.0
... 05,2016-04-26,13042624.0
... 06,2016-04-26,2899309.0
... 07,2016-04-26,-1.0"""
>>> pd.read_csv(StringIO(df_csv), dtype={'request_hour': 'string'}).set_index('request_hour')
              request_date        size
request_hour
03              2016-04-26   2580954.0
04              2016-04-26  12003662.0
05              2016-04-26  13042624.0
06              2016-04-26   2899309.0
07              2016-04-26        -1.0
# This is the output I would expect: the index keeps its leading zeros
>>> pd.read_csv(StringIO(df_csv), dtype={'request_hour': 'string'}, index_col=0)
              request_date        size
request_hour
3               2016-04-26   2580954.0
4               2016-04-26  12003662.0
5               2016-04-26  13042624.0
6               2016-04-26   2899309.0
7               2016-04-26        -1.0
# I'm surprised that the index has been converted to int (the leading zeros are lost)
I couldn't find this behaviour documented anywhere, so I wonder whether this is an issue with read_csv or whether I'm doing something wrong.
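In the meantime, the set_index() approach from the first call works as a workaround, because the column is parsed as an ordinary column (so the dtype mapping applies) before it becomes the index. Below is a minimal sketch of that workaround for Python 3. The helper name read_with_string_index is hypothetical, and it passes the builtin str rather than 'string', since in newer pandas versions 'string' selects the nullable StringDtype instead of plain Python strings.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the reproduction above.
CSV_DATA = """request_hour,request_date,size
03,2016-04-26,2580954.0
04,2016-04-26,12003662.0
05,2016-04-26,13042624.0
06,2016-04-26,2899309.0
07,2016-04-26,-1.0"""


def read_with_string_index(buf, index_name):
    """Hypothetical helper: parse `index_name` as str, then move it into the index.

    The column is read as a regular column first, so the `dtype` mapping is
    applied to it; set_index() then keeps the already-parsed string values.
    """
    df = pd.read_csv(buf, dtype={index_name: str})
    return df.set_index(index_name)


df = read_with_string_index(StringIO(CSV_DATA), "request_hour")
print(list(df.index))  # ['03', '04', '05', '06', '07']
```

This avoids relying on how index_col interacts with dtype, at the cost of one extra set_index() call.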
Thanks!