Merged
Changes from 6 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -544,6 +544,7 @@ Sparse

 - Bug in ``SparseSeries`` raises ``AttributeError`` when a dictionary is passed in as data (:issue:`16905`)
 - Bug in :func:`SparseDataFrame.fillna` not filling all NaNs when frame was instantiated from SciPy sparse matrix (:issue:`16112`)
+- Bug in :func:`make_sparse` treating two numeric/boolean data, which have same bits, as same when array ``dtype`` is ``object`` (:issue:`17574`)


Reshaping
11 changes: 10 additions & 1 deletion pandas/core/sparse/array.py
@@ -19,6 +19,7 @@
 from pandas.core.dtypes.common import (
     _ensure_platform_int,
     is_float, is_integer,
+    is_object_dtype,
     is_integer_dtype,
     is_bool_dtype,
     is_list_like,
@@ -789,7 +790,15 @@ def make_sparse(arr, kind='block', fill_value=None):
     if is_string_dtype(arr):
         arr = arr.astype(object)

-    mask = arr != fill_value
+    if is_object_dtype(arr.dtype):
+        mask = np.ones(len(arr), dtype=np.bool)
Contributor

OK, please add a comment on what is happening here and why.

Is there no general NumPy routine to do this? I don't think we have a pandas one, but have a look around.

Contributor Author

@jreback Added comments. I haven't found one yet; I don't think NumPy has a general method for this.
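For readers following the thread, here is a minimal, self-contained sketch of why the plain comparison is not enough for object arrays and what the type-aware mask in this diff does instead. It reuses the sample data from the new test; the name `naive_mask` is illustrative only, and the builtin `bool` is used in place of `np.bool` so the snippet runs across NumPy versions.

```python
import numpy as np

# Object array holding values that compare equal to False but are not bool.
arr = np.array([False, 0, 100.0, 0.0], dtype=object)
fill_value = False

# Plain comparison: 0 and 0.0 compare equal to False, so they look like fill.
naive_mask = arr != fill_value

# Type-aware mask, mirroring the logic in the diff: start with all True,
# then compare values only where the element type matches the fill type.
fv_type = type(fill_value)
mask = np.ones(len(arr), dtype=bool)
cond = np.fromiter((type(x) is fv_type for x in arr),
                   dtype=bool, count=len(arr))
mask[cond] = arr[cond] != fill_value

print(naive_mask.tolist())  # [False, False, True, False]
print(mask.tolist())        # [False, True, True, True]
```

With the naive mask, only `100.0` would be stored as a non-fill value; with the type-aware mask, `0` and `0.0` are kept as well.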

+        fv_type = type(fill_value)
+
+        itr = (type(x) is fv_type for x in arr)
Contributor

This is still quite cumbersome; can you simplify it?
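One possible simplification, offered only as a sketch and not as what was eventually merged: fold the type check and the value check into a single generator so that the mask is built in one pass. The helper name `object_mask` is hypothetical and does not exist in pandas.

```python
import numpy as np


def object_mask(arr, fill_value):
    """Hypothetical helper: True where an element differs from
    fill_value in either type or value."""
    fv_type = type(fill_value)
    return np.fromiter(
        (type(x) is not fv_type or x != fill_value for x in arr),
        dtype=bool,
        count=len(arr),
    )


arr = np.array([False, 0, 100.0, 0.0], dtype=object)
print(object_mask(arr, False).tolist())  # [False, True, True, True]
```

This still loops in Python over every element, so it trades the intermediate `cond` array for a slightly longer generator expression rather than gaining speed.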

+        cond = np.fromiter(itr, dtype=np.bool)
+        mask[cond] = arr[cond] != fill_value
+    else:
+        mask = arr != fill_value

     length = len(arr)
     if length != mask.size:
9 changes: 9 additions & 0 deletions pandas/tests/sparse/test_array.py
@@ -61,6 +61,15 @@ def test_constructor_object_dtype(self):
         assert arr.dtype == np.object
         assert arr.fill_value == 'A'

+        # GH 17574
+        data = [False, 0, 100.0, 0.0]
Contributor

Add the issue number as a comment.

+        arr = SparseArray(data, dtype=np.object, fill_value=False)
+        assert arr.dtype == np.object
+        assert arr.fill_value is False
+        arr_expected = np.array(data, dtype=np.object)
+        it = (type(x) == type(y) and x == y for x, y in zip(arr, arr_expected))
+        assert np.fromiter(it, dtype=np.bool).all()
+
     def test_constructor_spindex_dtype(self):
         arr = SparseArray(data=[1, 2], sparse_index=IntIndex(4, [1, 2]))
         tm.assert_sp_array_equal(arr, SparseArray([np.nan, 1, 2, np.nan]))
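The new test compares `type(x) == type(y)` in addition to `x == y`. A short sketch of why a plain elementwise `==` would not catch this bug is given below. The `collapsed` array is an assumption made for illustration (roughly what a result could look like if `0` and `0.0` were treated as the `False` fill value), and `strict_equal` is a hypothetical helper, not a pandas function.

```python
import numpy as np


def strict_equal(left, right):
    """Hypothetical helper: elementwise equality that also requires
    identical Python types."""
    return all(type(x) == type(y) and x == y for x, y in zip(left, right))


expected = np.array([False, 0, 100.0, 0.0], dtype=object)
# Assumed buggy result: 0 and 0.0 replaced by the fill value False.
collapsed = np.array([False, False, 100.0, False], dtype=object)

print(bool((expected == collapsed).all()))      # True: values compare equal
print(strict_equal(expected, collapsed))        # False: types differ
print(strict_equal(expected, expected.copy()))  # True
```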