Automatic conversion to dense arrays #10
That's a tough one... I'd probably avoid returning sparse or dense conditionally from the same routine, as that's a recipe for hard-to-catch bugs in corner cases.
Yeah, last week I would have come out strongly in favor of type stability. As I've been running computations in practice, my confidence has decreased somewhat here. I'm still not entirely in favor of optional conversions, but if I'm unable to optimize down some sparse operations it may become an interesting option. The saving grace of losing type stability is that this may become less of an issue if the …
One issue that I see in the code is that …
Whatever we choose, we should bake it into …
Two other things that would be helpful to think about are: …
Paging @mrocklin @nils-werner @jakevdp. Your input would be very welcome.
At the moment I'm not in favor of auto-densification by default. However, I do agree that we should make it easy for users to do this on their own. One approach would be to expose something like maybe_densify to the user? Config options with a context manager also seem sensible to me. I don't have confidence here in any direction.
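To make the "expose something like maybe_densify" idea concrete, here is a minimal sketch. The name is borrowed from the comment above, but the signature and threshold logic are my assumptions, illustrated with scipy.sparse rather than this library's own arrays:

```python
import numpy as np
import scipy.sparse as sp

def maybe_densify(x, min_density=0.5):
    """Hypothetical user-facing helper: densify only when the array is
    dense enough that the conversion is likely worthwhile; otherwise
    return the input unchanged (so the caller opts in explicitly)."""
    density = x.nnz / (x.shape[0] * x.shape[1])
    if density >= min_density:
        return np.asarray(x.todense())
    return x

# A nearly-full matrix gets converted; a nearly-empty one stays sparse.
dense_result = maybe_densify(sp.csr_matrix(np.ones((4, 4))))
sparse_result = maybe_densify(sp.csr_matrix((100, 100)))
```

The key design point is that the densification decision lives in one explicit call the user makes, rather than happening implicitly inside arithmetic routines.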
Personally I'm inclined to ignore this issue until it comes up in a few more use cases. |
I wanted to implement …
If we currently do auto-densification then yes, I think it makes sense to remove it. An …
When researching how to do configurations with context managers in the proper, thread-safe way, I came across PEP 567. I'm surprised it doesn't already exist. When making our own implementation (which I'm somewhat skeptical about), the following problems come to mind: …
All of this is handled for us by PEP 567. However, if we're not worried about thread safety we can just use a stack.
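A minimal sketch of what a PEP 567-based config option could look like. The option name `auto_densify` and this API are assumptions for illustration, not the library's actual interface:

```python
import contextvars
from contextlib import contextmanager

# Hypothetical option; the name is illustrative only.
_auto_densify = contextvars.ContextVar("auto_densify", default=False)

@contextmanager
def auto_densify(enabled=True):
    """Temporarily override the option. ContextVar keeps the override
    local to the current thread/async task, and reset(token) restores
    the previous value correctly even when overrides are nested."""
    token = _auto_densify.set(enabled)
    try:
        yield
    finally:
        _auto_densify.reset(token)

with auto_densify():
    inside = _auto_densify.get()   # True inside the block
outside = _auto_densify.get()      # back to the default afterwards
```

This is exactly the stack-like save/restore behavior described above, but with the thread/task isolation handled by the standard library.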
Am I correct in assuming that, when you talk about thread-safe context managers, you mean you want to do something like …
In that case I would seriously warn against that, for exactly the issues you are fearing. Plus, you would have to couple some singleton/module state to the behaviour of your … This system of the module having an internal state is what gave us the horrible Matlab/Matplotlib principle where you need to track the state of …
Only recently did they start introducing (and encouraging) the right way of doing it: instead of the module having a state, you get back instances which have states: …
So instead, if possible, I would recommend a solution that keeps the …
While this is a very good solution, I was thinking more of the use-case where …
Then, even if we allow things like: …
We would have to make sure that … I'm not entirely against this approach; I think it's more elegant, but we would need a bit of processing at the densifying stage. Edit: Not to mention… memory leaks if we actually store the parents in the child object…
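On the memory-leak worry: the standard way to refer to a parent object without keeping it alive is a weak reference. A toy sketch (the `Node` class is a stand-in I made up, not anything from this library):

```python
import gc
import weakref

class Node:
    """Stand-in for an array that remembers its parent only weakly,
    so the child cannot keep the parent alive."""
    def __init__(self, parent=None):
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        # Returns None once the parent has been garbage-collected.
        return self._parent() if self._parent is not None else None

p = Node()
c = Node(parent=p)
alive = c.parent is p       # parent is reachable while p exists
del p
gc.collect()
collected = c.parent is None  # the weak reference has gone dead
```

The trade-off is that any config lookup through the parent must handle the parent having already been collected.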
Another thing I was thinking about was how to manage mixed sparse-dense operations. For example, if …
In this case, …
I got around this by storing the parent configs (just the configs, not the full objects) in each child object using … I also found a way to handle the issues from my previous comment in #75. My implementation is available in #88. Feedback welcome.
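A toy sketch of the "store only the configs, not the full objects" idea. The class and field names here are made up for illustration; the actual implementation is in #88:

```python
class Tracked:
    """Toy stand-in for an array whose results carry a merged copy of
    their operands' configs, rather than references to the operand
    objects themselves (so no parent objects are retained or leaked)."""
    def __init__(self, value, config=None):
        self.value = value
        self.config = dict(config or {})

    def __add__(self, other):
        # Plain-dict copy + merge: the child owns its own config and
        # holds no reference back to either parent.
        merged = {**self.config, **other.config}
        return Tracked(self.value + other.value, merged)

a = Tracked(1, {"auto_densify": True})
b = Tracked(2, {"fill_value": 0})
c = a + b   # c.config combines both parents' settings
```

Because only small dicts are copied, this sidesteps the memory-leak concern raised earlier in the thread.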
Hello, a design decision was made to disallow it (read the discussion over in #218). If you must have dense arrays, the best thing to do is to map blocks in Dask using …
@Hoeze Also, you can control this behavior using the environment variable …
@hameerabbasi Thank you very much, you just saved my day!
Apologies if this is a dumb question, but I am having the same issue as described in this thread and was wondering... Would anyone be able to tell me how you changed the …
By starting Python using …, or by running … once Python is running.
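For the record, the generic pattern for both options looks like this. The variable name `SPARSE_AUTO_DENSIFY` is my assumption, since the actual name is elided above; check your installed version of sparse for the real one:

```python
import os

# Option 1: set it in the shell before starting Python, e.g.
#   SPARSE_AUTO_DENSIFY=1 python script.py
#
# Option 2: set it from Python, before the library reads it
# (typically before importing it). The name here is an assumption.
os.environ["SPARSE_AUTO_DENSIFY"] = "1"
```

Note that environment variables set after the library has already read its configuration may have no effect, so the shell approach is the safer one.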
Ah! Sorry yes, I see! Thank you! |
Operations on sparse arrays sometimes produce dense arrays. This type instability can cause some frustration downstream but may be optimal performance-wise.
On many occasions we actually inherit this behavior from scipy.sparse, which returns numpy.matrix objects in some cases. Currently we also return dense numpy.ndarray objects when this happens and when the number of non-zeros is high. I'm running into cases where I want to do this more and more, especially in parallel computing cases where I tensordot and add together many sparse arrays. Switching to dense starts to make a lot of sense. However, this will likely cause some frustration downstream, as users sometimes receive sparse and sometimes receive dense arrays depending on their data at the moment. Is this performance gain worth it?
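The scipy.sparse behavior mentioned above is easy to see directly: reductions on the spmatrix classes densify and hand back numpy.matrix objects:

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.eye(3))
row_sums = A.sum(axis=1)  # the reduction returns a dense numpy.matrix
```

This is the kind of silent sparse-to-dense type change the rest of the thread is debating whether to expose, forbid, or put behind a config switch.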