Pivot to SparseDataFrame: TypeError: ufunc 'isnan' not supported in sparse matrix conversion #11633

Closed
DSLituiev opened this issue Nov 18, 2015 · 7 comments
Labels
Bug; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Sparse (Sparse Data Type)
Milestone

Comments

@DSLituiev

I want to convert a DataFrame to a SparseDataFrame before pivoting it, because the pivoted table gets really sparse (see also this discussion). I have a textual key ("chr") that I need to keep:

import pandas as pd

df = pd.DataFrame(
    list(zip([3, 2, 4, 1, 5, 3, 2],
             ["chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr3"],
             [100, 100, 100, 200, 1, 3, 1],
             [True, True, True, False, True, False, True],
             [-1, 0, 1, 3, 0, 2, 1])),
    columns=["counts", "chr", "pos", "strand", "distance"])

df.iloc[:,1:].dtypes
Out[]: 
chr         object
pos          int64
strand        bool
distance     int64
dtype: object

For this small table, it works well with a regular DataFrame:

pd.pivot_table(df, index=["chr", "pos"], columns=["strand", "distance"], values="counts").fillna(0)

strand   False     True
distance     2  3    -1  0  1
chr  pos
chr1 100     0  0     3  2  4
     200     0  1     0  0  0
chr2 1       0  0     0  5  0
     3       3  0     0  0  0
chr3 1       0  0     0  0  2

But I need to do this on much larger matrices, so I tried the following trick:

dfpiv = pd.pivot_table(pd.SparseDataFrame(df), index=["chr", "pos"], columns=["strand", "distance"], values="counts")

but I am getting:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Are there any plans to add an option to the pivot function for automatic conversion to a SparseDataFrame?
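Until pivoting handles sparse input, one possible workaround is to build the pivoted matrix directly in a sparse format. The sketch below is not from this thread and rests on several assumptions: SciPy is available, the fill value is 0, and each (row key, column key) pair occurs at most once, because `coo_matrix` sums duplicate entries whereas `pivot_table` averages them by default. The names `row_codes`, `col_codes`, `row_labels`, `col_labels` and `mat` are illustrative only.

```python
import pandas as pd
from scipy import sparse

df = pd.DataFrame({
    "counts": [3, 2, 4, 1, 5, 3, 2],
    "chr": ["chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr3"],
    "pos": [100, 100, 100, 200, 1, 3, 1],
    "strand": [True, True, True, False, True, False, True],
    "distance": [-1, 0, 1, 3, 0, 2, 1],
})

# Number each distinct (chr, pos) pair and each distinct (strand, distance)
# pair; groupby sorts the keys, so the codes follow sorted key order.
row_codes = df.groupby(["chr", "pos"]).ngroup().to_numpy()
col_codes = df.groupby(["strand", "distance"]).ngroup().to_numpy()

# Keep the labels that belong to each code, in the same sorted order.
row_labels = (df[["chr", "pos"]].drop_duplicates()
              .sort_values(["chr", "pos"]).reset_index(drop=True))
col_labels = (df[["strand", "distance"]].drop_duplicates()
              .sort_values(["strand", "distance"]).reset_index(drop=True))

# Assumption: no duplicate (row, column) pairs, since coo_matrix would sum them.
mat = sparse.coo_matrix(
    (df["counts"].to_numpy(), (row_codes, col_codes)),
    shape=(len(row_labels), len(col_labels)),
).tocsr()

print(mat.toarray())
```

Only the non-zero counts are stored in `mat`; the dense `toarray()` call is there just to check the small example against the regular pivot table and should be dropped for large inputs.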

@DSLituiev
Author

If I include default_fill_value=0, which makes sense in my case, I get yet another error:

>>> dfsp = pd.SparseDataFrame(df, default_fill_value=0)
ValueError: could not convert string to float: '<value from "chr" column>'
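The message matches what NumPy raises when an object array holding strings is cast to float. The sketch below only reproduces that message outside pandas; that the sparse constructor performs such a cast on the "chr" column is an assumption, not something confirmed in this thread.

```python
import numpy as np

# Object-dtype values similar to the "chr" column.
values = np.array(["chr1", "chr2"], dtype=object)

message = None
try:
    # Casting to float calls float() on each element and fails on the strings.
    values.astype(float)
except ValueError as exc:
    message = str(exc)

print(message)
```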

@jreback
Contributor

jreback commented Nov 18, 2015

You would have to show a copy-pastable example and the output of pd.show_versions().

@jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Sparse (Sparse Data Type) labels Nov 18, 2015
@DSLituiev
Author

please see updated post with an example above

@DSLituiev
Author

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2.dev0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)

@jreback added this to the Next Major Release milestone Dec 10, 2015
@jreback
Contributor

jreback commented Dec 10, 2015

This is quite easy to fix: ~np.isnan(arr) needs to be replaced with pd.notnull(arr).

pull-requests are welcome
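For anyone picking this up, the difference between the two calls can be seen on a small object-dtype array. This is a minimal sketch with made-up data, not the pandas internals themselves: `np.isnan` is only defined for numeric dtypes, while `pd.notnull` also accepts object arrays.

```python
import numpy as np
import pandas as pd

# Object-dtype array mixing strings and a missing value.
arr = np.array(["chr1", "chr2", np.nan], dtype=object)

isnan_error = None
try:
    ~np.isnan(arr)  # the ufunc has no loop for object dtype
except TypeError as exc:
    isnan_error = exc

# pd.notnull handles object arrays and flags the missing entry.
mask_notnull = pd.notnull(arr)

# On a float array both expressions give the same mask.
float_arr = np.array([1.0, np.nan, 3.0])
same_on_floats = np.array_equal(~np.isnan(float_arr), pd.notnull(float_arr))

print(type(isnan_error).__name__, mask_notnull, same_on_floats)
```

Because the two expressions agree on float input, the swap should not change behavior for numeric data while avoiding the TypeError on object columns.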

@DSLituiev
Author

Do you have a test file dedicated to sparse?

@jreback
Contributor

jreback commented Dec 16, 2015
