Skip to content

TST/API: test the list of NA values in the csv parser. add N/A, #NA as independent default values (GH5521) #5601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 27, 2013
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -564,7 +564,7 @@ the corresponding equivalent values will also imply a missing value (in this cas
``[5.0,5]`` are recognized as ``NaN``.

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', 'NA',
The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
'#NA', 'NULL', 'NaN', 'nan']``.

.. code-block:: python
Expand Down
4 changes: 3 additions & 1 deletion doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ Improvements to existing features
(:issue:`4039`) with improved validation for all (:issue:`4039`,
:issue:`4794`)
- A Series of dtype ``timedelta64[ns]`` can now be divided/multiplied
by an integer series (:issue`4521`)
by an integer series (:issue:`4521`)
- A Series of dtype ``timedelta64[ns]`` can now be divided by another
``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
is frequency conversion; astyping is also supported.
Expand Down Expand Up @@ -409,6 +409,8 @@ API Changes

- raise/warn ``SettingWithCopyError/Warning`` exception/warning when setting of a
copy thru chained assignment is detected, settable via option ``mode.chained_assignment``
- test the list of ``NA`` values in the csv parser. add ``N/A``, ``#NA`` as independent default
na values (:issue:`5521`)

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~
Expand Down
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
Original file line number Diff line number Diff line change
Expand Up @@ -438,7 +438,7 @@ def read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds):
# no longer excluding inf representations
# '1.#INF','-1.#INF', '1.#INF000000',
_NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
'#N/A N/A', 'NA', '#NA', 'NULL', 'NaN',
'#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN',
'nan', ''])


Expand Down
25 changes: 25 additions & 0 deletions pandas/io/tests/test_parsers.py
Original file line number Diff line number Diff line change
Expand Up @@ -683,6 +683,31 @@ def test_non_string_na_values(self):
tm.assert_frame_equal(result6,good_compare)
tm.assert_frame_equal(result7,good_compare)

def test_default_na_values(self):
_NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
'#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN',
'nan', ''])

nv = len(_NA_VALUES)
def f(i, v):
if i == 0:
buf = ''
elif i > 0:
buf = ''.join([','] * i)

buf = "{0}{1}".format(buf,v)

if i < nv-1:
buf = "{0}{1}".format(buf,''.join([','] * (nv-i-1)))

return buf

data = StringIO('\n'.join([ f(i, v) for i, v in enumerate(_NA_VALUES) ]))

expected = DataFrame(np.nan,columns=range(nv),index=range(nv))
df = self.read_csv(data, header=None)
tm.assert_frame_equal(df, expected)

def test_custom_na_values(self):
data = """A,B,C
ignore,this,row
Expand Down