add vectorised metadata, closes #1676 #1690
06361aa
to
ccba416
Compare
For getting the nested keys how about something like: You'd need to check if key was a list first.
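A minimal sketch of the kind of nested lookup being suggested here, assuming a list key means "follow these keys one level at a time". The helper name `get_nested` and the sample `row` are made up for illustration and are not taken from the PR.

```python
from functools import reduce


def get_nested(metadata, key):
    """Look up ``key`` in ``metadata``; a list key is followed level by level."""
    if isinstance(key, list):
        # reduce feeds the result of each lookup into the next one:
        # metadata[key[0]][key[1]]...
        return reduce(lambda d, k: d[k], key, metadata)
    return metadata[key]


# Hypothetical metadata entry, for illustration only.
row = {"name": "a", "location": {"x": 1.5, "y": 2.0}}
print(get_nested(row, "name"))
print(get_nested(row, ["location", "x"]))
```

A missing key at any level raises `KeyError` in this version, since it uses plain indexing rather than `get`.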
Wow that is much more elegant than I would have done.
Needs some more tests for weird metadata, but this seems to work.
Ok, I think this is good to go?
f56399c
to
c35c6d2
Compare
Codecov Report
@@           Coverage Diff           @@
##             main    #1690   +/-   ##
=======================================
  Coverage   93.79%   93.79%
=======================================
  Files          27       27
  Lines       23578    23595   +17
  Branches     1085     1089    +4
=======================================
+ Hits        22114    22130   +16
- Misses       1429     1430    +1
  Partials       35       35
Looking good, I think we need to handle the default differently though.
python/tskit/tables.py
Outdated
def metadata_vector(self, key, *, dtype=None, default_value=None):
    """
    Returns a numpy array of metadata values obtained by extracting ``key``
    from each metadata entry, and inserting ``default_value`` if the key is
"inserting" suggests that we might be modifying the metadata
- from each metadata entry, and inserting ``default_value`` if the key is
+ from each metadata entry, and using ``default_value`` if the key is
python/tskit/tables.py
Outdated
if isinstance(key, list):
    out = np.array(
        [
            functools.reduce(
If you use `from functools import reduce` you save an attribute lookup in the tight loop. Same with `Mapping` below.
woah! ok!
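To make the import point concrete, here is a small sketch of the two spellings side by side. The sample `rows` and `keys` are invented for illustration. Both forms compute the same thing; the difference is only that `functools.reduce` resolves the `reduce` attribute on the module each time round the loop, while a directly imported `reduce` is a plain name lookup.

```python
import functools
from functools import reduce

# Hypothetical metadata rows, for illustration only.
rows = [{"a": {"b": i}} for i in range(5)]
keys = ["a", "b"]

# Attribute form: looks up the name "functools", then its "reduce" attribute,
# on every iteration of the comprehension.
via_attribute = [functools.reduce(lambda d, k: d[k], keys, row) for row in rows]

# Direct-name form: a single name lookup per iteration.
via_name = [reduce(lambda d, k: d[k], keys, row) for row in rows]

print(via_attribute == via_name)
```

How much this saves in practice depends on the table size and Python version, so it is worth measuring (for example with `timeit`) rather than assuming.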
python/tskit/tables.py
Outdated
out = np.array(
    [
        functools.reduce(
            lambda d, k: d.get(k, default_value)
We only want to use `get` if a default value is supplied, so that `KeyError` is raised for a missing entry. As `None` could be a valid desired default, you'll need a singleton class `NO_DEFAULT` to check for and use as the default kwarg.
oh dear
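A rough sketch of the sentinel idea, under a few assumptions: the function name `lookup` is hypothetical, and it catches `KeyError` around plain indexing instead of calling `get` at each level, which is a swapped-in approach rather than what the diff does. The point is only that `NO_DEFAULT` lets `None` be passed as a real default while a missing key still raises when no default is given.

```python
from functools import reduce


class NO_DEFAULT:
    """Sentinel meaning "no default supplied"; the class itself is the marker."""


def lookup(metadata, key, default_value=NO_DEFAULT):
    # Hypothetical helper: treat a single key as a one-element path.
    keys = key if isinstance(key, list) else [key]
    try:
        return reduce(lambda d, k: d[k], keys, metadata)
    except KeyError:
        if default_value is NO_DEFAULT:
            # No default was supplied, so a missing key is an error.
            raise
        return default_value


print(lookup({"a": {"b": 2}}, ["a", "b"]))
print(lookup({"a": 1}, "b", default_value=None))
```

Comparing with `is NO_DEFAULT` rather than `==` keeps the check an identity test, so no user-supplied value can be mistaken for the sentinel.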
python/tskit/tables.py
Outdated
        self.ll_table.metadata_schema
    )

def metadata_vector(self, key, *, dtype=None, default_value=None):
Just an idea, but you could make `key` optional and return the whole metadata object if it is not specified?
Hm - nice idea, but that might encourage bad habits? Currently, metadata has to be a dictionary (not, e.g., a single number), so I'm not sure about this. We could always extend it later to cover this case without breaking this API.
I agree with Peter here
Thanks! I think I took care of those things. Too bad Python doesn't have the missing() function.
I haven't scanned the diffs, but curious if this works/is tested on both codec flavors? (JSON and struct)
Whoops - I was too clever with the tests, and hadn't caught that it was doing totally the wrong thing. Since strings are a
It certainly should work - this only uses things in the layer where the codec is invisible!
d7e1f1b
to
4ec80ea
Compare
@petrelharp I've added a commit here with an extra test for deep
# we override the meta so that it looks good in the docs.
class NotSetMeta(type):
    def __repr__(cls):
        return "Not set"
ah-ha! nice magic!
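For anyone wondering what the metaclass buys, a small sketch: overriding `__repr__` on the metaclass changes how the class object itself prints, which is what documentation tools and `inspect.signature` show for a default argument. The names `NOTSET` and `example` below are assumptions for illustration; only `NotSetMeta` appears in the diff.

```python
import inspect


class NotSetMeta(type):
    def __repr__(cls):
        # Controls the repr of any class created with this metaclass.
        return "Not set"


class NOTSET(metaclass=NotSetMeta):
    """Hypothetical sentinel class used as a default argument."""


def example(default_value=NOTSET):
    # Hypothetical function: reports whether a default was supplied.
    return default_value is not NOTSET


print(repr(NOTSET))
print(inspect.signature(example))
```

Without the metaclass, the default would render as something like `<class '__main__.NOTSET'>` in a generated signature.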
3118b53
to
a44da9a
Compare
Rebased; should be ready to merge.
a44da9a
to
35716ff
Compare
Added changelog - merging.
Here is a start at adding `metadata_vector`. I haven't figured out the right way to get keys that are a few nested levels down, though - do you know an elegant way to do this, @benjeffery?

Fixes #1676