Add densification manager. #88

Closed

hameerabbasi
Collaborator

Fixes #10

A densification manager for COO.

You can do, for example,

with s.configure_densification(densify=True), s2.configure_densification(densify=True):
    x = np.exp(s)
    x2 = s + 1
    x3 = -(s + s2) + 1

And it should work automagically.

@hameerabbasi hameerabbasi changed the title from "Make densification manager and changes." to "Add densification manager." on Jan 25, 2018
@hameerabbasi
Collaborator Author

I think this is ready for code review before I get to the docs. See the tests for example use.

@nils-werner
Contributor

nils-werner commented Jan 26, 2018

I find it weird that the ._densification_config attribute of an output array may change from being a set (set()) to a DensificationConfig instance and back:

s = sparse.random((20, 30, 40))

with s.configure_densification(densify=True):
    x = s ** 2
    print(type(s._densification_config))  # DensificationConfig
    print(type(x._densification_config))  # set

print(type(s._densification_config))  # DensificationConfig
print(type(x._densification_config))  # set

with x.configure_densification(densify=True):
    print(type(s._densification_config))  # DensificationConfig
    print(type(x._densification_config))  # DensificationConfig, WTF?

print(type(s._densification_config))  # DensificationConfig
print(type(x._densification_config))  # set, WTF?!

Isn't it possible to make it a DensificationConfig instance for every case?
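
One possible reading of this suggestion, as a minimal sketch: a derived array carries a config object that keeps its parents internally, so the attribute's type never changes. The class below is a hypothetical stand-in and not the code in this PR; `parents` and `effective_densify` are names made up for illustration, and the rule that all parents must agree is an assumption.

```python
class DensificationConfig:
    """Hypothetical stand-in: one type for both root and derived arrays."""

    def __init__(self, densify=None, parents=()):
        self.densify = densify          # True, False, or None (unset)
        self.parents = tuple(parents)   # configs this one was derived from

    def effective_densify(self):
        # A root config answers for itself; None counts as "do not densify".
        if not self.parents:
            return bool(self.densify)
        # Assumed rule: a derived config densifies only if every parent does.
        return all(p.effective_densify() for p in self.parents)


root = DensificationConfig(densify=True)
derived = DensificationConfig(parents=[root])

# Both are the same type, whether or not a context manager is active.
print(type(root).__name__, type(derived).__name__)
print(derived.effective_densify())
```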

@nils-werner
Contributor

Also, the DensificationConfig instance of each array is not disconnected from the parent once the context manager is left. This means that when another configure_densification context of a parent is entered, all of its children are changed as well:

s = sparse.random((20, 30, 40))

with s.configure_densification(densify=True):
    x = s ** 2
    print(s._densification_config.densify)           # True
    print(list(x._densification_config)[0].densify)  # True

print(s._densification_config.densify)               # False
print(list(x._densification_config)[0].densify)      # False

with s.configure_densification(densify=True):
    print(s._densification_config.densify)           # True
    print(list(x._densification_config)[0].densify)  # True, this should be False!

print(s._densification_config.densify)               # False
print(list(x._densification_config)[0].densify)      # False

@hameerabbasi
Collaborator Author

hameerabbasi commented Jan 26, 2018

Yes, both of these bugs are one and the same. I could, for example, just store a DensificationConfig once instead of storing the parents and this would resolve all of these issues.

I have, however, provided a COO.densification_config property that will always be the reduced DensificationConfig.

However, my question is: Do we actually want the children to be linked to the parent or not? Linked in the sense that once the parent's config changes, the child's does as well.

Another thing I was considering was automatic densification (above a certain density and below a certain size). This would help performance a lot.
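
A rough sketch of what such an automatic rule could look like. The thresholds reuse the defaults that appear in the diff (`max_size = 10000`, `min_density = 0.25`); the function name `should_densify`, the inclusive comparisons, and the handling of empty arrays are assumptions made for illustration.

```python
from math import prod


def should_densify(shape, nnz, max_size=10000, min_density=0.25):
    """Return True if an array is small enough and dense enough to densify."""
    size = prod(shape)                  # total number of elements
    if size == 0:
        return False                    # nothing to densify
    density = nnz / size                # fraction of stored (nonzero) entries
    return size <= max_size and density >= min_density


print(should_densify((20, 30), nnz=300))       # 600 elements, density 0.5
print(should_densify((20, 30, 40), nnz=300))   # 24000 elements, above max_size
```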

@hameerabbasi
Collaborator Author

Or I could store the children in the context manager and reduce the children's configs once the manager has exited.
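
A minimal sketch of that idea, assuming toy `Config` and `Array` classes and a hypothetical `derive` method standing in for any operation that produces a new array. Children created inside the context share the parent's config while it is active and are detached when it exits. Nested contexts on the same array are not handled here.

```python
from contextlib import contextmanager


class Config:
    def __init__(self, densify=False):
        self.densify = densify


class Array:
    """Toy array that only tracks its densification config."""

    def __init__(self, config=None):
        self.config = config if config is not None else Config()
        self._children = None   # a list while a context is active, else None

    @contextmanager
    def configure_densification(self, densify):
        old = self.config.densify
        self.config.densify = densify
        self._children = []
        try:
            yield self
        finally:
            self.config.densify = old
            # Detach: each child gets its own config holding the value that
            # is in effect after the context has been left.
            for child in self._children:
                child.config = Config(self.config.densify)
            self._children = None

    def derive(self):
        child = Array(self.config)      # shares the parent's config for now
        if self._children is not None:
            self._children.append(child)
        return child


s = Array()
with s.configure_densification(densify=True):
    x = s.derive()
    inside = x.config.densify           # linked to s while the context is active

with s.configure_densification(densify=True):
    later = x.config.densify            # x is no longer linked to s

print(inside, later)
```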

return

if max_size is None:
    max_size = 10000
Contributor

Why not directly use the default value of max_size=10000 in the function signature?
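
The two styles side by side, as a small sketch with hypothetical function names. One possible reason to keep the `None` sentinel is that it lets the code tell "not given" apart from an explicit value, which may matter when several configs are combined; that motivation is an assumption, not something stated in this thread.

```python
def configure_sentinel(max_size=None):
    # None means "not given"; the default is filled in inside the body.
    if max_size is None:
        max_size = 10000
    return max_size


def configure_default(max_size=10000):
    # The default is visible in the signature and in help().
    return max_size


print(configure_sentinel(), configure_default())
```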

if min_density is None:
    min_density = 0.25

if not isinstance(densify, bool) and densify is not None:
Contributor

Instead of allowing densify to be of mixed type (None and bool), you could create a Maybe class that encapsulates the behaviour
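
One way this could look, sketched as a three-valued enum rather than a full Maybe type. `Densify` and `coerce` are hypothetical names; the point is only that the validation of `None` versus `bool` lives in one place.

```python
from enum import Enum


class Densify(Enum):
    """Hypothetical tri-state replacing the mixed None/bool parameter."""
    UNSET = 'unset'
    NO = 'no'
    YES = 'yes'

    @classmethod
    def coerce(cls, value):
        # Accept an existing member, a bool, or None; reject anything else.
        if isinstance(value, cls):
            return value
        if value is None:
            return cls.UNSET
        if isinstance(value, bool):
            return cls.YES if value else cls.NO
        raise ValueError('densify must be True, False or None')


print(Densify.coerce(None), Densify.coerce(True), Densify.coerce(False))
```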


def __str__(self):
    if isinstance(self.densify, bool):
        return '<DensificationConfig: densify=%s>' % self.densify
Contributor

I think there should be one unified repr.

Collaborator Author

I feel otherwise... Printing fluff that isn't relevant to the repr can be useless. This just prints the required info.

raise ValueError('Invalid DensificationConfig.')

@staticmethod
def combine(*configs):
Contributor

What's the difference between combine and from_many?

Collaborator Author

I combined them (pun intended).

Seriously, though. from_parents builds the parents, and reduces if none of the parents are in a context manager mode.

_reduce_from_parents gets the current rules without regard to what the parents are.


class DensificationConfig(object):
    def __init__(self, densify=None, max_size=None, min_density=None):
        if isinstance(densify, (Iterable, DensificationConfig)):
Contributor

I don't like how you abuse the densify parameter here. If you want a copy constructor, we should have one specifically for it.
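
A dedicated copy constructor could be sketched as a classmethod. The class below is a simplified stand-in for the one in the diff, and `from_config` is a hypothetical name.

```python
class DensificationConfig:
    """Simplified stand-in with a dedicated copy constructor."""

    def __init__(self, densify=None, max_size=None, min_density=None):
        self.densify = densify
        self.max_size = max_size
        self.min_density = min_density

    @classmethod
    def from_config(cls, other):
        # Explicit alternative constructor instead of overloading `densify`.
        return cls(other.densify, other.max_size, other.min_density)


original = DensificationConfig(densify=True, max_size=500)
clone = DensificationConfig.from_config(original)
clone.densify = False   # changes the copy only, not `original`
```

This keeps `__init__` accepting only plain values, so the `isinstance` check on `densify` is no longer needed.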

sparse/coo.py Outdated
  return COO(self.coords[:, nonzero], data_func[nonzero],
             shape=self.shape,
             has_duplicates=self.has_duplicates,
-            sorted=self.sorted)
+            sorted=self.sorted, densification_config=densification_config)
Contributor

💄 all other parameters got their own line.

raise ValueError("Performing this operation would produce "
                 "a dense result: %s" % name)

def check(self, name, *arrays):
Contributor

@nils-werner nils-werner Jan 26, 2018

Varargs like *arrays can become a huge PITA, I think people should pass an iterable instead.

Also, name is not necessarily needed for a check, so it could be made optional.

Collaborator Author

Name is needed, we use it for the exception.

Collaborator Author

I'm willing to change this, but it is a huge ease to simply do config.check(name, arr) or config.check(name, arr1, arr2) instead of putting everything into a list. If we need to pass a list, we can always do config.check(name, *arrays). Could you tell me why it's difficult to support?
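
For comparison, the two signatures under discussion as a small sketch. The function names are hypothetical and the bodies only report what they received; the strings stand in for COO arrays.

```python
def check_varargs(name, *arrays):
    # Call sites read naturally: check_varargs('exp', a) or ('add', a, b).
    return name, len(arrays)


def check_iterable(arrays, name=None):
    # The collection is one explicit argument, so `name` can be optional.
    return name, len(list(arrays))


arrays = ['a', 'b']                      # placeholders for COO arrays
print(check_varargs('add', *arrays))     # an existing list is unpacked with *
print(check_iterable(arrays, 'add'))
```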

sparse/coo.py Outdated
self._densification_config = old_densification_config

@property
def densification_config(self):
Contributor

I don't like how COO knows how to deal with all the different DensificationConfig flavors. All of this should be entirely encapsulated in DensificationConfig.

Collaborator Author

Done.

@hameerabbasi
Collaborator Author

Thanks for the great feedback, @nils-werner!

@hameerabbasi
Collaborator Author

I feel I should be writing tests for the DensificationConfig class as well.

@mrocklin
Contributor

I might be a bit late to this discussion, but are we sure that we want to do this? Automatic densification may not be a good idea long term. If possible, it would be good to make this change easy to remove in the future without having to touch all of the codebase. When I see densification commands going into each of the small functions, I become concerned. This also raises the bar for contributions from new developers.

@hameerabbasi
Collaborator Author

hameerabbasi commented Jan 26, 2018

I'll work on that... I'm not entirely sure I need this either, at least for any of my use cases. The best I can do here is pass the parent COO objects (instead of DensificationConfig objects), and handle it in the constructor. That way, if we need to undo it, all we need to do (on the COO side) is to remove the code from the constructor.

Edit: Also, I'm actively working on this.

@hameerabbasi
Collaborator Author

Okay, I'm convinced that a densification manager is unwise, and that densification should always be explicit. I'll close this PR but let the branch stay in case someone else can build on my work.

@hameerabbasi hameerabbasi deleted the densification-manager branch June 6, 2024 05:54
3 participants