Inconsistent elemwise auto-densify rules #460
Comments
The reasoning behind this one was that doing […]. One option is to allow "broadcasting" fill-values.
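For context, `sparse` already supports custom fill values, and elemwise operations combine the operands' fill values with the same operation. A minimal sketch of that existing behaviour (shapes and values chosen arbitrarily; this is my reading of the baseline the "broadcasting" idea would generalize, not a statement of the proposal itself):

```python
import numpy as np
import sparse

x = sparse.random((4, 4), density=0.25)  # COO, fill value 0.0
y = sparse.full((4, 4), 3.0)             # COO, fill value 3.0, nothing stored

z = x + y
print(z.fill_value)    # 3.0, i.e. np.add(x.fill_value, y.fill_value)
print(z.nnz == x.nnz)  # True: only x's stored entries were recomputed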
But wouldn't this still blow up memory if both these arrays are dense? I'm not sure I would agree that it's this library's job to error here, and I'm not aware of a sparse matrix library that refuses this kind of operation (examples being […]).
Personally, I'm not completely convinced allowing custom fill values is worth the complexity. It definitely has some cool applications, but to me the main goal is having working sparse arrays with an array interface. I think this solution makes a lot of sense for a lazy interface, but is just a lot of complexity for an eager one.
I'll have to rephrase this one: […]
This is also in line with the decision made over at #218.
This isn't strictly true if the output array is dense, which would follow the current behaviour of […]. But also, isn't the same general principle of large memory growth at play if both those arrays were sparse?
As a counterexample here: […]
To my reading, the short discussion there is more focussed on the behavior of […]. That said, it would be really nice if there were a universal way to convert a numpy-array-like to a numpy array, which I thought was the intent of […].
Currently we do not accept dense outputs.
This is true, I hadn't considered that.
I was assuming a fill-value of 0 in this case, and this case is, indeed, handled more generally such that there aren't blowups. I do agree it could be tested better.
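A quick illustration of the no-blowup case being described here (shapes are arbitrary, and the sparse return type matches the table proposed further down the thread):

```python
import numpy as np
import sparse

s = sparse.random((1000, 1000), density=0.001)  # fill value 0, ~1000 stored values
d = np.random.rand(1000, 1000)                  # fully dense operand

out = s * d                                 # 0 * anything == 0, fill value stays 0
print(isinstance(out, sparse.SparseArray))  # True: no densification
print(out.nnz <= s.nnz)                     # True: stored values cannot grow
```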
Not quite, there's a difference between "this is already something like an array, please convert it to one" and "this may cause a performance slowdown in some cases" (like CuPy, which disallows […]).
This doesn't seem to be the case to me. The example at the top of the issue (adding a sparse array and a dense array resulting in a dense array) works on current master. (See line 459 in 0d50c9d.)
Lines 555 to 561 in 0d50c9d
I think this is a different point. My point here was it would be nice if my library only worked on […]. Your NEP-37 would help solve a number of these problems.
What I meant by this is that we do not support […].
While I agree it's relevant, in light of the point above: by my definition, memory blowup could also happen on […].
Just to be sure we're on the same page here, does something like this fit what you mean?

```python
import numpy as np
import sparse

M, N = 100, 100  # placeholder dimensions; any values work

sparse_mtx = sparse.random((M, N))
dense_mtx = np.random.rand(M, N)

sparse_col = sparse_mtx[:, [0]]
sparse_row = sparse_mtx[0, :]
dense_col = dense_mtx[:, [0]]
dense_row = dense_mtx[0, :]

np.add(sparse_mtx, dense_mtx)       # -> np.ndarray
np.multiply(sparse_mtx, dense_mtx)  # -> SparseArray
np.add(dense_mtx, sparse_col)       # -> np.ndarray
np.add(sparse_mtx, dense_col)       # -> np.ndarray
np.add(sparse_mtx, sparse_col)      # -> SparseArray
np.multiply(sparse_mtx, dense_col)  # -> SparseArray
np.multiply(dense_mtx, sparse_col)  # -> ???
```

I think this might take a significant refactor of the […]. I would also like to investigate why most other libraries seem to have gone for the consistency of just having these operations always return sparse arrays.
For a fill-value of […]
I think this may be too inefficient a way to find out that […] (see lines 531 to 543 in e036c25).
I think it would make more sense to just say that "if the operation is multiplication, we only need to look at the intersection of the sparse indices". That is, I think it makes sense to have this logic written as code as opposed to determined dynamically (a sketch of what I mean follows).
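A toy sketch of that rule (hypothetical helper, nothing like sparse's actual machinery): with zero fill values, multiplication can only produce stored values at coordinates present in both operands, so walking the intersection of the two coordinate sets suffices.

```python
import numpy as np

def coo_multiply(coords_a, data_a, coords_b, data_b):
    """Multiply two COO-style arrays with fill value 0.

    coords_* are (ndim, nnz) integer arrays; data_* are (nnz,) values.
    The output is nonzero only where BOTH inputs store a value, so we
    only visit the intersection of the stored coordinates.
    """
    lookup = {tuple(c): v for c, v in zip(coords_a.T, data_a)}
    out_coords, out_data = [], []
    for c, v in zip(coords_b.T, data_b):
        key = tuple(c)
        if key in lookup:  # coordinate stored in both inputs
            out_coords.append(key)
            out_data.append(lookup[key] * v)
    return np.array(out_coords).T, np.array(out_data)

# 3x3 example: `a` stores the diagonal, `b` stores (1, 1) and (2, 0);
# only (1, 1) appears in both, so the output has a single stored value.
a_coords = np.array([[0, 1, 2], [0, 1, 2]])
a_data = np.array([1.0, 2.0, 3.0])
b_coords = np.array([[1, 2], [1, 0]])
b_data = np.array([10.0, 10.0])
print(coo_multiply(a_coords, a_data, b_coords, b_data))
# (array([[1], [1]]), array([20.]))
```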
While I agree, that would be a big refactor indeed. What I would prefer instead is that we defer that to the rewrite that's taking place in collaboration with the TACO team.
I hadn't realized this was something that was being worked on. Is there somewhere it can be tracked? This definitely seems like the ideal, to just have all operations here generated by taco. I recall reading issues about this, but hadn't seen much movement on implementation. If this is something that might be a reality soon, then I would probably want to re-evaluate implementation priorities for […].
Mostly in the TACO repository. They're adding an array API instead of just Einsum-type operations. Advanced indexing might have to be done on top. The work is expected to complete in June, after which we'll need to wrap it into something a bit more Pythonic.
Great! I've been wanting to be able to use taco in a package for a few years now, so I'm very happy to see progress being made. Could you share any info on the plan for distribution? Is this going to require a C++ compiler at runtime, or is the idea of emitting numba code still being looked at? It would be nice to be able to use custom semirings or broadcast operations.
Yeah, so where does that leave this package, if almost everything will just be implemented in […]?
Well, the plan is to have either […].
Also I should say -- The Array API work on TACO's side is expected in June, the bindings come after. Sorry for the ambiguity there.
The tentative plan is to statically link against LLVM so that neither LLVM nor a C++ compiler are required on the user's system. We haven't figured out a plan for custom operations from within Python, but I believe @guilhermeleobas had some ideas.
I had understood that issue as being a tentative renaming. I might be misunderstanding what you're saying here: is it the plan to just deprecate this package? My main concern here is: is there value in adding features / fixing bugs in this package? It seems like much of it (at least for the COO, GCXS, and CSR/CSC classes) would be replaced by pytaco bindings.
Is there any sort of ETA for this? This seems like the largest hurdle for being able to have TACO as a dependency. Also, are releases documented somewhere? I was looking to install it but was unsure how to choose a stable commit to build at.
I'm open to both a rename, as well as simply replacing this package. The issue I see is that there will be (albeit slight) backcompat breaks, including NaN-accuracy. We could (instead of a rename) push out a version 2.0.
Unfortunately it's dependent on too many things to calculate a reasonable ETA, some of which are behind the scenes and may become clearer within a few months.
TACO doesn't have a stable release yet, they're aiming for one soon without the Array API.
When possible, we would like to statically link with LLVM. And about Numba, the initial idea was to reimplement TACO on top of Numba, but this is a lot of work, and we will not follow this direction anymore. In the future, we might consider Numba to generate user defined operations and link it with the LLVM generated by TACO.
Describe the bug
Sometimes elementwise operations return a dense result, sometimes they error. It seems to depend on the broadcasting.
To Reproduce
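The original reproduction snippet did not survive formatting; based on the discussion above, the inconsistency is along these lines (the shapes and which exact call errors are my assumptions, not a quote from the report):

```python
import numpy as np
import sparse

s = sparse.random((10, 10), density=0.1)
d = np.random.rand(10, 10)

s + d     # same-shape sparse + dense: auto-densifies and returns np.ndarray
s + d[0]  # broadcast sparse + dense: errored at the time of this report
```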
Expected behavior
Consistency. I'd like a dense output for this operation (following #10 (comment)), but would be happy with the consistency of: […]
System
- `sparse.__version__`: 0.12.0+8.g75125cd
- `np.__version__`: 1.20.2
- `numba.__version__`: 0.53.1