-
Notifications
You must be signed in to change notification settings - Fork 77
Add dict_of_dicts to tree sequence #1296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1296 +/- ##
=======================================
Coverage 93.81% 93.81%
=======================================
Files 26 26
Lines 22187 22198 +11
Branches 1006 1009 +3
=======================================
+ Hits 20814 20825 +11
Misses 1340 1340
Partials 33 33
Flags with carried forward coverage won't be shown. Click here to find out more.
|
fc393a4
to
9cf952c
Compare
9cf952c
to
935d052
Compare
Very neat! Looks very nice, but I think we want to be sure that this is the "right" graph representation before we make it part of the public API. Some questions:
d = ts.as_dict_of_dicts()
tsp = tskit.TreeSequence.from_dict_of_dicts(d)
assert tsp.equals(ts, ignore_provenance=True) We would need to have some visibilty on whether this is possible before we can fix on the graph representation. |
Nice! In the round trip example @jeromekelleher, do you mean that just the topology would round trip? Or other things like populations and migrations? (I guess these could be included as an attribute at the root dict?) |
Just the topology initially, we don't have to worry about the other stuff like individuals, migrations etc. But yes, in the case of populations it'd probably be simplest to add them in as attrs in the root dict. |
AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths. |
We would round-trip the dicts, not the NetworkX representation. |
Yes - I'm afraid we have to do the due diligence here @hyanwong. We need to spend time exploring the properties of this representation if we're doing to make it part of the API. The first question we need to answer is "is this a bijection"? |
I assumed this would simply be an analogue to Tree.as_dict_of_dicts, so purely a (non complete) export format for NetworkX. If you want it to be something more, then it gets complex, I agree. |
We don't have a Tree.from_dod method, so I'm not sure we need (want?) a TS one? |
Apologies for the terse replies: I'm on a mobile. There's no reason to merge this any time soon. As it is, I need to use a slightly bespoke version anyway, which randomises the node IDs (otherwise locating isomorphic nodes is biased towards the input order). I just imagine that if I find a function like this useful for analysing the tree sequence as a graph, others might too. So it's probably worth having a similar function in the API. But as you say, worth doing the due diligence first. I'm in no hurry either way. |
OK - since we're making quite a big decision here (what's the graph representation of a tree sequence) @hyanwong, I think it would be better if you used the version you have for your own work, and we took our time with this. The Tree version is much simpler and we don't worry about round tripping it because there's no way of creating a Tree on it's own anyway. The TreeSequence is the core data structure. So, I think we can mark this as a draft? |
Done. FWIW I don't think |
Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread) I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step. |
I haven't figured out how to use the "dict_of_dicts" version to allocate attributes (like time) to nodes. So maybe this isn't the right way anyway?
Hmm, is that true. Isn't it just as easy to go from multigraph -> graph as the other way round? |
Tidying up old PRs - please re-open if you wish. |
Description
Fixes #1294. Should probably add some tests for attributes of the graph like
branch_length
,left
, andright
.PR Checklist: