csv_reader with limited number of columns should completely disregard the unused fields #8985

Closed
cordeiro opened this issue Dec 3, 2014 · 6 comments
Labels
Bug Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Milestone

Comments

@cordeiro

cordeiro commented Dec 3, 2014

xref #6710

I have a CSV whose lines may have 11 or 18 fields. I only need to read the first 6 fields, so I use "usecols=range(6)". Even with the limited number of columns, I get the exception:

ValueError: Expected 11 fields in line 776483, saw 18

The csv_reader should completely disregard the unused fields.

Small test case:

import io
import pandas as pd
csv = '19,29,39\n'*2 + '10,20,30,40\n'
df = pd.read_csv(io.StringIO(csv), engine='python', header=None, usecols=list(range(3)))

It also affects the C engine.

Discussed at the users mailing list at https://groups.google.com/d/topic/pydata/vjhFpHtgnvw/discussion

@jreback
Contributor

jreback commented Dec 3, 2014

this is like #6710. Usecols will only select from the valid columns. It is inferring that you have 3, so it is a bit contradictory here.

The solution is to use names (or names=range(3)) to get just those first 3.

In [21]: pd.read_csv(io.StringIO(csv), header=None, names=range(4))
Out[21]: 
    0   1   2   3
0  19  29  39 NaN
1  19  29  39 NaN
2  10  20  30  40

not sure if this is a bug or not; i'll mark it same as the other one.
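A minimal sketch of the workaround above, assuming the maximum number of fields per line (4 in the small test case) is known in advance. The variable names `data`, `df` and `first_three` are illustrative only. Every field is read using `names`, and the unwanted trailing column is dropped afterwards:

```python
import io

import pandas as pd

# Same data as the small test case: two 3-field rows, then one 4-field row.
data = '19,29,39\n' * 2 + '10,20,30,40\n'

# Name all 4 possible columns so the longer row parses without an error;
# shorter rows are padded with NaN in the last column.
df = pd.read_csv(io.StringIO(data), header=None, names=list(range(4)))

# Keep only the first 3 columns, which every row is known to contain.
first_three = df.iloc[:, :3]
print(first_three)
```

This reads the unused field into memory before discarding it, so it sidesteps the error rather than implementing the behaviour requested in this issue.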

@cordeiro
Author

cordeiro commented Dec 3, 2014

I don't agree that they are the same bug.

Bug #6710 is about the ability to infer whether the next lines will have more fields or not.

In this case, all rows have all fields of interest ([0:6]). The remainder of the line will not be used anyway and should not be considered at all.
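To illustrate the requested behaviour (this is not how pandas implements it), here is a rough sketch that truncates each row with the standard-library csv module before building the frame. The names `raw`, `n_wanted` and `rows` are hypothetical, and the fields are assumed to be integers as in the small test case:

```python
import csv
import io

import pandas as pd

# Same data as the small test case: rows of 3 or 4 fields.
raw = '19,29,39\n' * 2 + '10,20,30,40\n'
n_wanted = 3  # hypothetical: number of leading fields we care about

# Slice each parsed row to its first n_wanted fields, ignoring the rest,
# and convert the kept fields to int (an assumption about this data).
rows = [
    [int(field) for field in row[:n_wanted]]
    for row in csv.reader(io.StringIO(raw))
]

df = pd.DataFrame(rows)
print(df)
```

This goes through Python's csv reader rather than the pandas parsers, so it is likely to be slower on a large file.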

@jreback
Contributor

jreback commented Dec 3, 2014

@cordeiro I said they are alike, that's why it's a separate issue :)

@cordeiro
Author

cordeiro commented Dec 3, 2014

Oups. :)

@jreback
Contributor

jreback commented Dec 3, 2014

np. If you are interested in looking at this, it would be appreciated.

@dxe4
Contributor

dxe4 commented Dec 6, 2014

Sounds easy, I'll try to make a PR today for this.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 5, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Jan 9, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 10, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 11, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017