-
-
Notifications
You must be signed in to change notification settings - Fork 132
Remove auto densification and unify operator code. #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove auto densification and unify operator code. #46
Conversation
sparse/core.py
Outdated
assert isinstance(other, COO) | ||
if not isinstance(other, COO): | ||
raise ValueError("Performing this operation would produce " | ||
"a dense result: %s" % str(func)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, this isn't always true though. For example x > 5
or x + 0
(which comes up when using the sum
function) are both common case situations and should both work efficiently. Is there some way that we can test against zero here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already being checked in _perform_op
. That one handles operations with scalars pretty nicely, and is general. I've also added a test for it (test_elemwise_scalar
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I can move it in here and refactor if you like.
Edit: Actually, I'll do that right now. Hold off on merging.
sparse/tests/test_core.py
Outdated
|
||
xs = COO.from_numpy(x) | ||
|
||
assert_eq(func(xs, y), func(x, y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check here that the output is a COO
object and that the number of non-zeros hasn't significantly increased?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't increase, but it can decrease. Let me add that check.
@@ -1067,7 +1071,7 @@ def __abs__(self): | |||
|
|||
def exp(self, out=None): | |||
assert out is None | |||
return np.exp(self.maybe_densify()) | |||
return self.elemwise(np.exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always err
(given current code)
(not a concern, just asking a question)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But given what we talked about in #10 it's probably best to keep it.
Edit: If you really need exp
you can do np.exp(x.maybe_densify())
or np.exp(x.todense())
.
sparse/core.py
Outdated
@@ -641,11 +650,13 @@ def elemwise(self, func, *args, **kwargs): | |||
sorted=self.sorted) | |||
|
|||
def elemwise_binary(self, func, other, *args, **kwargs): | |||
assert isinstance(other, COO) | |||
if not isinstance(other, COO): | |||
return self.elemwise(func, other, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change. I think that this is nicer, especially given that elemwise_binary
is public API. People will probably try to use it and your change here will, I suspect, include fewer surprises for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might at some point reduce this down to a single public elemwise
function that handles both cases and dispatches to _elemwise_unary
and _elemwise_binary
based on inputs. (not necessarily now though if you'd like to wrap this up quickly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, properly > quickly. It saves headache down the line.
The problem I see with that is, elemwise
already exists and does what you are describing as _elemwise_unary
. I have no problem dispatching between the two, but my question is:
- Is it worthwhile changing the API? Changing what
elemwise
is and hidingelemwise_binary
? Do we have someone for whom it'll break? - I was also considering supporting
scipy.sparse.spmatrix
. It's a small change, and well worth it, I expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will also make implementing __array_ufunc__
much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new elemwise
would, I think, cover the functionality of the old elemwise
. It would now also work if the args weren't scalars, but were other things as well.
In general though this project is young enough and obscure enough that I think API breaks are fine. Anyone using this project is an early adopter.
I agree that supporting scipy.sparse.spmatrix
objects would be valuable.
sparse/core.py
Outdated
|
||
def elemwise(self, func, *args, **kwargs): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed your opinion here. Should we keep it this way and assume other=args[0]
or should we add an other=None
argument for explicitly selecting between binary and unary functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first is more compact but the second can help when, for example, the first input to your function itself is supposed to be COO
or spmatrix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without thinking about this too much I'm inclined to leave this as def elemwise(self, func, *args, **kwargs)
. I would expect len(args)
to determine between elemwise_unary and elemwise_binary as you've done below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the second case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose you have an inherently unary function that accepts COO
or spmatrix
as its second positional argument, instead of it being an operand to a binary function. I agree, it's a corner use case, but it is more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you think this won't ever be needed (and I mostly agree), this should be good to merge, if you haven't got any other comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about this for now. Short term I might encourage users to use kwrags for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I overlooked that kwargs can map onto normal args in Python!
return self._elemwise_binary(func, *args, **kwargs) | ||
elif isinstance(other, scipy.sparse.spmatrix): | ||
other = COO.from_scipy_sparse(other) | ||
return self._elemwise_binary(func, other, *args[1:], **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch could use a test
sparse/core.py
Outdated
function. Otherwise, it will be treated as a unary function. | ||
kwargs : dict, optional | ||
The kwargs to pass to the function. | ||
Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a newline above this docstring title for clean rendering with sphinx
This looks great to me. Thanks @hameerabbasi ! I've added you as a collaborator to this fork. You should now have the ability to merge PRs. In general I tend to use "Squash and Merge" when merging PRs unless particular care has been taken around the commit history and the individual commits have significant value. Care to give this a try and merge your own PR here? |
Thanks so much for adding me as a collaborator! Would you prefer I still worked on my own fork in different branches or can I also work with branches here? |
Heh, you're doing more work on this than anyone else :)
I tend to prefer keeping development work on different branches when possible. I plan to continue working from my fork if/when we move this to a more public org. |
I've removed the auto-densification in
__add__
andexp
and I've removed the tests for it. In addition, I've unified the code for all operators so that they only work when not densified. In addition, they work with scalars if the scalars don't densify them.