DataFrame.to_csv(quoting=csv.QUOTE_NONNUMERIC) quotes numeric values #12922
Comments
This is probably only minimally tested right now. The data is written through the csv writer, which gets passed the quoting option. I think that needs to be turned off, since we do all quoting and formatting before passing the data to the writer (maybe not ALL, and that's the rub: some cases may rely on the csv writer actually quoting things).
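To make the point about the csv writer concrete, here is a minimal sketch using only the standard library (no pandas involved). The helper `write_row` is a hypothetical name introduced for illustration. It shows that under `csv.QUOTE_NONNUMERIC` the writer decides what to quote from the Python type of each cell, so a float that has already been converted to a string is quoted like any other string.

```python
import csv
import io


def write_row(row):
    """Write one row with QUOTE_NONNUMERIC and return the raw CSV text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(row)
    return buf.getvalue()


# A real float is left unquoted; the string next to it is quoted.
as_floats = write_row([1.5, "a"])

# The same number, pre-formatted as a string, is quoted by the writer.
as_strings = write_row(["1.5", "a"])

print(repr(as_floats))
print(repr(as_strings))
```

So any formatting step that hands strings to the writer loses the numeric/non-numeric distinction before the quoting option is ever consulted.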
Float values were being quoted despite the quoting spec. Bug traced to the float formatting that was unconditionally casting all floats to string. Unconditional casting traced back to commit 2d51b33 (pandas-devgh-12194) via bisection. This commit undoes some of those changes to rectify the behaviour. Closes pandas-devgh-12922. [ci skip]
This problem is still occurring if you are using a float format, i.e.:
Result:
Edit: It also appears to be doing the same to NaN values even without the float_format.
@blitzd the issue here is that the second you apply a float format, the value is a string, so this is correct. That said, I think we could document this. Can you open a new issue for that?
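A rough sketch of why a float format and NaN values both end up quoted, again using only the standard library. The function `to_csv_line` and its `float_format` and `na_rep` parameters are hypothetical stand-ins that mimic the idea of formatting before writing; this is not pandas' actual implementation, and the assumption that NaN is replaced by a string placeholder before reaching the writer is mine.

```python
import csv
import io


def to_csv_line(values, float_format=None, na_rep=""):
    """Hypothetical helper: format cells first, then let csv apply quoting."""
    cells = []
    for v in values:
        if isinstance(v, float) and v != v:
            # NaN is replaced by a string placeholder (assumed behaviour).
            cells.append(na_rep)
        elif isinstance(v, float) and float_format is not None:
            # Applying a format string turns the float into a str.
            cells.append(float_format % v)
        else:
            cells.append(v)
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(cells)
    return buf.getvalue()


plain = to_csv_line([1.0, 2.5])
formatted = to_csv_line([1.0, 2.5], float_format="%.2f")
with_nan = to_csv_line([1.0, float("nan")])

print(repr(plain))
print(repr(formatted))
print(repr(with_nan))
```

In this sketch the formatted values and the NaN placeholder are strings by the time the writer sees them, which is why they are quoted even though they started out as floats.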
@jreback I can see that being the case for a format string that would make the value non-numeric. I would argue that 1.00 is still a numeric value, though. Also, any thoughts on the NaN bit? That occurs regardless of the float_format. How hard would it be to let the caller explicitly define which columns are quoted? Or is this already possible? I will add a new issue with a reference.
Edit: Re: 'how hard would it be', here is a bit of a hackish method, but it works for me:
Failing test
The issue is that the floats are being output wrapped with quotes, even though I requested QUOTE_NONNUMERIC.
The problem is that `pandas.core.internals.FloatBlock.to_native_types` (and by extension `pandas.formats.format.FloatArrayFormatter.get_result_as_array`) unconditionally formats the float array to a str array, which is then passed unchanged to the `csv` module and hence will be wrapped in quotes by that code.

I'm not 100% sure, but the fix may be to have `FloatBlock.to_native_types` check if quoting is set, and if so to skip using the `FloatArrayFormatter`? I say this because `pandas.indexes.base.Index._format_native_types` already has a special case along these lines. This does seem a bit dirty though!

Here is an awful monkeypatch that works around the problem:
output of
pd.show_versions()