Deal with np.array(sparsearr) densification #72

Closed · wants to merge 8 commits
6 changes: 6 additions & 0 deletions docs/generated/sparse.COO.__array__.rst
@@ -0,0 +1,6 @@
COO\.\_\_array\_\_
==================

.. currentmodule:: sparse

.. automethod:: COO.__array__
1 change: 1 addition & 0 deletions docs/generated/sparse.COO.rst
@@ -68,6 +68,7 @@ COO
COO.to_scipy_sparse
COO.tocsc
COO.tocsr
COO.__array__
@hameerabbasi (Collaborator) · Jan 15, 2018:

Do we actually want to document double-underscore functions in the API docs? I'm not sure what the convention is in the Python community. I know documenting them in code is good for potential contributors, but I'm not sure if we should put them in the API docs.

Contributor (Author):

NumPy does document it.

And I think in the case of __array__ we should, because it is likely to show unexpected behaviour that users should be able to look up.
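To illustrate that unexpected behaviour: a minimal sketch (a hypothetical class, not `sparse.COO` itself) of the technique this PR adopts. When `__array__` returns `NotImplemented` instead of an ndarray, `np.array()` fails loudly rather than silently densifying; the exact exception type has varied across NumPy versions, so both `ValueError` and `TypeError` are caught here.

```python
import numpy as np

class NoDensify:
    # Hypothetical stand-in for sparse.COO: refuse np.array() conversion
    # by returning NotImplemented from __array__.
    def __array__(self, dtype=None):
        return NotImplemented

try:
    np.array(NoDensify())
except (ValueError, TypeError) as e:
    print("conversion refused:", e)
```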


.. rubric:: :doc:`Other operations <../user_manual/operations/other>`
.. autosummary::
31 changes: 29 additions & 2 deletions sparse/core.py
@@ -488,6 +488,31 @@ def nbytes(self):
         """
         return self.data.nbytes + self.coords.nbytes

+    def __array__(self, dtype=None):
+        """
+        Hook that gets called during :code:`np.array(x)` conversion.
+        We deliberately return :obj:`NotImplemented` to prevent accidental
+        densification.
+
+        Parameters
+        ----------
+        dtype : numpy.dtype, optional
+            Datatype requested by :code:`np.array(x)`. Has no effect on
+            the output.
+
+        Returns
+        -------
+        NotImplemented
+            We do not implement this conversion, so we return
+            :obj:`NotImplemented`.
+
+        See Also
+        --------
+        numpy.ndarray.__array__ : NumPy equivalent method.
+        """
+        return NotImplemented
+
     def __len__(self):
         """
         Get "length" of array, which is by definition the size of the first
@@ -2374,15 +2399,17 @@ def maybe_densify(self, allowed_nnz=1000, allowed_fraction=0.25):
         >>> s.maybe_densify(allowed_nnz=5, allowed_fraction=0.25)
         Traceback (most recent call last):
         ...
-        NotImplementedError: Operation would require converting large sparse array to dense
+        NotImplementedError: Operation would require converting large sparse \
+        array to dense. Use .todense() to force densification.
         """
         elements = np.prod(self.shape)

         if elements <= allowed_nnz or self.nnz >= elements * allowed_fraction:
             return self.todense()
         else:
             raise NotImplementedError("Operation would require converting "
-                                      "large sparse array to dense")
+                                      "large sparse array to dense. Use "
+                                      ".todense() to force densification.")


def tensordot(a, b, axes=2):
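The threshold logic in the `maybe_densify` hunk above can be sketched standalone. `guard` is a hypothetical helper name; the thresholds mirror the defaults in the diff (`allowed_nnz=1000`, `allowed_fraction=0.25`).

```python
import numpy as np

def guard(shape, nnz, allowed_nnz=1000, allowed_fraction=0.25):
    # Densify only when the array is small, or already so full that a
    # dense copy is not much larger than the sparse representation.
    elements = np.prod(shape)
    if elements <= allowed_nnz or nnz >= elements * allowed_fraction:
        return "densify"
    raise NotImplementedError(
        "Operation would require converting large sparse array to "
        "dense. Use .todense() to force densification.")

print(guard((10, 10), nnz=5))        # small array: densify
print(guard((100, 100), nnz=5000))   # 50% full: densify
```

A large, mostly empty array (say `(100, 100)` with `nnz=5`) trips the `NotImplementedError` branch instead.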
13 changes: 13 additions & 0 deletions sparse/tests/test_core.py
@@ -887,3 +887,16 @@ def test_scalar_shape_construction():
def test_len():
s = sparse.random((20, 30, 40))
assert len(s) == 20


def test_numpy_todense():
    # multiply by 100 so that an .astype(int) conversion is never all zeros
    s = sparse.random((20, 30, 40)) * 100
    with pytest.raises(ValueError):
        np.array(s)

    with pytest.raises(ValueError):
        np.array(s, dtype=int)

    with pytest.raises(ValueError):
        np.allclose(s, s.todense())