
DataFrame display fails after .loc in-place assignment for Int64 #1391


Closed
ianozsvald opened this issue Apr 23, 2020 · 3 comments
Labels
Blocked ❌ A pull request that is blocked bug 🦗 Something isn't working

Comments

@ianozsvald

System information

  • OS: Linux Mint 19.3, 64-bit
  • Modin version (modin.__version__): 0.7.2
  • Python version: 3.7.6
Using watermark:
CPython 3.7.6
IPython 7.13.0

pandas 1.0.1
modin 0.7.2
ray 0.8.0
dask 2.14.0
numexpr not installed

compiler   : GCC 7.3.0
system     : Linux
release    : 5.3.0-46-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit

Describe the problem

Creating a simple DataFrame in Modin, converting one column to Int64 (the nullable integer dtype), assigning NaN to a single cell with .loc, and then displaying the DataFrame fails.

Source code / logs

import os
os.environ["MODIN_ENGINE"] = "ray"  # Modin will use Ray
import modin.pandas as pd_md

import numpy as np
dfx = pd_md.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
#dfx.loc[0, 'a_I'] = np.nan
dfx


The DataFrame displays as expected:

a | a_I
-- | --
1.0 | 1
1.0 | 1
...

Assigning a NaN value causes a failure:

import numpy as np
dfx = pd_md.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
dfx.loc[0, 'a_I'] = np.nan
dfx
RayTaskError(TypeError)                   Traceback (most recent call last)
RayTaskError(TypeError): ray_worker (pid=24106, ip=192.168.0.129)
...
    raise TypeError("values must be a 1D list-like")
TypeError: values must be a 1D list-like

Initially I was calling info() and getting the Ray error, but since simply displaying the DataFrame also fails, the problem appears to be more fundamental than info().

With plain pandas, the same code works:

import numpy as np
import pandas as pd

dfx = pd.DataFrame({'a': np.ones(10)})
dfx['a_I'] = dfx['a'].astype('Int64')
dfx.loc[0, 'a_I'] = np.nan
dfx

a | a_I
-- | --
1.0 | <NA>
1.0 | 1
1.0 | 1
...
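As a reference point for what Modin should produce, here is a minimal sketch of the same steps in plain pandas that checks the result rather than eyeballing the display (assuming pandas 1.0 or later, where Int64 is a nullable extension dtype). The NaN written through .loc is stored as the missing-value marker <NA>, and the column keeps its Int64 dtype instead of being upcast to float64.

```python
import numpy as np
import pandas as pd

# Same steps as the Modin reproducer, but with plain pandas.
dfx = pd.DataFrame({"a": np.ones(10)})
dfx["a_I"] = dfx["a"].astype("Int64")

# Writing np.nan into a nullable-integer column stores pd.NA
# instead of converting the column to float64.
dfx.loc[0, "a_I"] = np.nan

print(dfx["a_I"].dtype)             # Int64
print(dfx["a_I"].isna().sum())      # 1
print(dfx.loc[0, "a_I"] is pd.NA)   # True
```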

If I build the column from a list that already contains NaN and then convert it to Int64, it works:

dfx = pd_md.DataFrame({'a': [1, 2, 3, np.NaN]})
dfx['a'] = dfx['a'].astype('Int64')
#dfx.loc[0, 'a_I'] = np.nan 
#dfx.info() # this works too
dfx
a
--
1
2
3
<NA>

If I uncomment dfx.loc[0, 'a_I'] = np.nan in the example above, it also works fine.
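A minimal pandas sketch of this working case, as a reference (assuming pandas semantics, which Modin aims to mirror). Note that this frame has no 'a_I' column, so in pandas the commented-out .loc line adds a new column by setting with enlargement rather than writing into the Int64 column 'a'; that may be why this variant does not hit the bug.

```python
import numpy as np
import pandas as pd

# The list contains NaN, so the column starts out as float64.
dfx = pd.DataFrame({"a": [1, 2, 3, np.nan]})
# Converting to the nullable Int64 dtype turns NaN into <NA>.
dfx["a"] = dfx["a"].astype("Int64")
print(dfx["a"].isna().tolist())  # [False, False, False, True]

# 'a_I' does not exist here, so this creates a new column
# (setting with enlargement) instead of modifying the Int64 column 'a'.
dfx.loc[0, "a_I"] = np.nan
print(list(dfx.columns))  # ['a', 'a_I']
print(dfx["a"].dtype)     # Int64
```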

@ianozsvald ianozsvald added the bug 🦗 Something isn't working label Apr 23, 2020
@devin-petersohn
Collaborator

Thanks for the report @ianozsvald! I can reproduce this on current master. It looks like a general problem with .loc assignment, because df.loc[0, 'a_I'] = 2 also fails.

We will get this fixed. Thanks again for the report!
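To check whether the failure is specific to NaN or affects .loc assignment generally, the reproducer can be run against either backend. The helper below, run_reproducer, is hypothetical (not part of Modin or pandas): it takes a pandas-like module such as pandas or modin.pandas plus the value to assign, and calls repr() to force the display step that failed in the report.

```python
import numpy as np
import pandas as pd


def run_reproducer(pd_module, value):
    """Hypothetical helper: run the issue's steps with a pandas-like module.

    pd_module is e.g. pandas or modin.pandas; value is what .loc assigns.
    Returns the resulting 'a_I' column.
    """
    df = pd_module.DataFrame({"a": np.ones(10)})
    df["a_I"] = df["a"].astype("Int64")
    df.loc[0, "a_I"] = value
    repr(df)  # displaying the frame is the step that failed in the report
    return df["a_I"]


# Reference results with plain pandas; pass modin.pandas to compare.
col_nan = run_reproducer(pd, np.nan)
col_two = run_reproducer(pd, 2)
print(col_nan.isna().sum(), str(col_nan.dtype))  # 1 Int64
print(int(col_two.iloc[0]), str(col_two.dtype))  # 2 Int64
```

With Modin installed, run_reproducer(pd_md, 2) (with pd_md being modin.pandas, as in the report) should exercise the same code path; on the affected 0.7.2 setup the error would be expected to surface at the repr() call, though that is an assumption based on the traceback above rather than something verified here.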

@devin-petersohn devin-petersohn added this to the 0.7.3 milestone Apr 23, 2020
@devin-petersohn
Collaborator

@ianozsvald I narrowed this down to a bug in pandas; it is a strange edge case. I opened an issue in pandas for this: pandas-dev/pandas#33828

We can try to work around this, but it will have to wait until the next release. Thanks again for the report and great find!

@devin-petersohn devin-petersohn modified the milestones: 0.7.3, 0.7.4 Apr 27, 2020
@devin-petersohn devin-petersohn modified the milestones: 0.8.0, 0.8.1 Jul 29, 2020
@anmyachev anmyachev modified the milestones: 0.8.1, 0.8.2 Oct 14, 2020
@anmyachev anmyachev modified the milestones: 0.8.2, Someday Feb 9, 2021
@mvashishtha mvashishtha added the Blocked ❌ A pull request that is blocked label Jun 3, 2022
@mvashishtha
Collaborator

The original reproducer works at commit c9fc326.


4 participants