-
Notifications
You must be signed in to change notification settings - Fork 77
Make metadata lazy for table row classes #1261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1261 +/- ##
==========================================
+ Coverage 93.78% 93.82% +0.03%
==========================================
Files 26 26
Lines 21991 21995 +4
Branches 987 991 +4
==========================================
+ Hits 20625 20637 +12
+ Misses 1325 1324 -1
+ Partials 41 34 -7
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, see some comments.
python/tskit/tables.py
Outdated
return cls | ||
|
||
|
||
@lazy_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not obvious here why we need both a decorator and to inherit from LazyMetadata - one or other is easy to follow for a reader, but both is confusing. What's the rationale here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should spell it out up here where these classes are defined anyway, need to encode this knowledge somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out how to make an extra slot with a class decorator, annoying!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we bother with the decorator then? We should be able to implement this in a straighforward inheritancy type way, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manged to do it all in a decorator in the end, can't do inheritance due to intercepting the constructor.
@jeromekelleher I think this could have a review now. Docs are not building due to a warning I haven't figured out yet, but look good ignoring the warning. Note that I've removed the deprecated attrs on mutation - these could be added back if you think it is too early. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I'm seriously questioning whether making things frozen is worth it. What problem are we trying to solve here?
Having to resort to this __setattr__
stuff to do perfectly reasonable things seems much more likely to result in errors than anything the frozen is trying to prevent. Plus, I think we are likely to break code if we change the semantics to frozen now.
If we're trying to allow for things like:
tables.nodes[0].flags = 0
then we perhaps we could have table rows frozen for now (to allow this in future) but keep the stuff returned by the TreeSequence unfrozen?
python/tests/simplify.py
Outdated
@@ -218,7 +219,7 @@ def record_edge(self, left, right, parent, child): | |||
else: | |||
last = self.edge_buffer[child][-1] | |||
if last.right == left: | |||
last.right = right | |||
builtins.object.__setattr__(last, "right", right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a pattern I want to encourage people to use. This seems worse than not having Frozen instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agreed - I could rejig these instances to not use the ugly way. It only really needs to be used in the custom __init__
and the metadata decorator.
python/tskit/tables.py
Outdated
and np.array_equal(self.location, other.location) | ||
and np.array_equal(self.parents, other.parents) | ||
and self.metadata == other.metadata | ||
) | ||
|
||
def __neq__(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here:
- I think we can get rid of
__neq__
now, this is handled automatically in more recent versions of Python - Can we use the dataclass
__eq__
? If not, let's add a comment saying why, as this will be a recurring question.
(Ah, answering my own question here: we have numpy arrays. Good to add a quick comment here to that effect)
python/tskit/trees.py
Outdated
""" | ||
metadata: Any | ||
""" | ||
The decoded :ref:`metadata <sec_metadata_definition>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, only decoded if a schema is present, right?
python/tskit/trees.py
Outdated
and np.array_equal(self.mutations, other.mutations) | ||
) | ||
|
||
def __neq__(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need neq I think
python/tskit/trees.py
Outdated
@@ -560,71 +488,95 @@ class Variant(SimpleContainer): | |||
Modifying the attributes in this class will have **no effect** on the | |||
underlying tree sequence data. | |||
|
|||
:ivar site: The site object for this variant. | |||
:ivar site: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of the ivars entirely I think (hope?)
python/tskit/metadata.py
Outdated
@@ -39,6 +41,8 @@ | |||
import tskit | |||
import tskit.exceptions as exceptions | |||
|
|||
__setattr__ = builtins.object.__setattr__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about the wisdom of this - don't we then export this into our namespace? If we're going to do something this ugly, the I'd prefer to feel the full brunt of the ugliness each time we use it, just to be reminded of how ugly it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imported it to avoid the double lookup on each row access - I thinking importing it to something called __builtin__object__setattr__
would be ugly enough?
When were they marked as deprecated? We should update the changelog with details. |
The idea was to prevent users thinking they could mutate the classes - as it warned about in the old docs.
Do you mean the other way round? As in |
No - to me the only point in freezing the classes is so that we don't break people's code later when we change the semantics to "push the changes back into the tables". So, let people mutate away on the Node, Edge etc classes - we document that these are copies of the original data, and they can do whatever they like with them. Freezing the classes just for the sake of it, or to somehow prevent people from writing bad code seems ... unpythonic to me. |
How about we make the minimal changes to dataclasses first as cleanly as we can with the same semantics as we currently have (ish), and then look at whether we should freeze or not as a second pass? |
3a93746
to
afc0ec7
Compare
@jeromekelleher OK, semantics are now exactly as they were before. Think I have addressed the comments too. Just the docs |
afc0ec7
to
020f2d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, seems like a major improvement. One comment on the type annotations.
@@ -2991,17 +2971,18 @@ def test_metadata(self): | |||
(inst,) = self.get_instances(1) | |||
(inst2,) = self.get_instances(1) | |||
assert inst == inst2 | |||
inst._metadata_decoder = lambda m: "different decoder" | |||
inst.metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this statement do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was forcing decode of the metadata, but that is done by the equality operator above anyway, removed.
class IndividualTableRow: | ||
__slots__ = ["flags", "location", "parents", "metadata"] | ||
flags: int | ||
location: np.ndarray | ||
parents: np.ndarray | ||
metadata: bytes | ||
metadata: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could say metadata : Union[bytes, dict]
couldn't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to Optional[Union[bytes, dict]]
We were decoding metadata on every table row access previously, this makes that lazy. This same method will be applied to the container classes that get returned from e.g.
ts.node(u)
when they are dataclasses.Note this also gives table row objects a
.table
attribute (@hyanwong might be interested) This attribute does not get put into the result ofdataclasses.asdict
or used in the auto dataclasses methods such as__eq__
or__repr__
. These auto methods also only see decoded metadata. The dataclasses are still "frozen".Needs a couple of extra tests and and one optimisation for the null schema case before merging. Also thinking the schema reference should be stored, as otherwise it could be modified before use by the row.