Skip to content

Add dict_of_dicts to tree sequence #1296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

hyanwong
Copy link
Member

@hyanwong hyanwong commented Apr 2, 2021

Description

Fixes #1294. Should probably add some tests for attributes of the graph like branch_length, left, and right.

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@hyanwong hyanwong marked this pull request as draft April 2, 2021 10:26
@codecov
Copy link

codecov bot commented Apr 2, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.81%. Comparing base (f277006) to head (935d052).
Report is 1140 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1296   +/-   ##
=======================================
  Coverage   93.81%   93.81%           
=======================================
  Files          26       26           
  Lines       22187    22198   +11     
  Branches     1006     1009    +3     
=======================================
+ Hits        20814    20825   +11     
  Misses       1340     1340           
  Partials       33       33           
Flag Coverage Δ
c-tests 92.44% <ø> (ø)
lwt-tests 92.97% <ø> (ø)
python-c-tests 95.15% <100.00%> (+<0.01%) ⬆️
python-tests 98.86% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
python/tskit/trees.py 97.91% <100.00%> (+0.01%) ⬆️

@hyanwong hyanwong force-pushed the ts-dict-of-dicts branch 3 times, most recently from fc393a4 to 9cf952c Compare April 2, 2021 18:20
@hyanwong hyanwong marked this pull request as ready for review April 2, 2021 19:25
@jeromekelleher
Copy link
Member

Very neat! Looks very nice, but I think we want to be sure that this is the "right" graph representation before we make it part of the public API. Some questions:

  1. What do these graphs look like when we plot them? I.e., what does a single tree ts look like, and one with two trees?
  2. What would be need to do to round trip the topology? Ideally we would want
d = ts.as_dict_of_dicts()
tsp = tskit.TreeSequence.from_dict_of_dicts(d)
assert tsp.equals(ts, ignore_provenance=True)

We would need to have some visibilty on whether this is possible before we can fix on the graph representation.

@benjeffery
Copy link
Member

Nice! In the round trip example @jeromekelleher, do you mean that just the topology would round trip? Or other things like populations and migrations? (I guess these could be included as an attribute at the root dict?)

@jeromekelleher
Copy link
Member

Just the topology initially, we don't have to worry about the other stuff like individuals, migrations etc. But yes, in the case of populations it'd probably be simplest to add them in as attrs in the root dict.

@hyanwong
Copy link
Member Author

hyanwong commented Apr 6, 2021

AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.

@benjeffery
Copy link
Member

AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.

We would round-trip the dicts, not the NetworkX representation.

@jeromekelleher
Copy link
Member

Yes - I'm afraid we have to do the due diligence here @hyanwong. We need to spend time exploring the properties of this representation if we're doing to make it part of the API. The first question we need to answer is "is this a bijection"?

@hyanwong
Copy link
Member Author

hyanwong commented Apr 6, 2021

I assumed this would simply be an analogue to Tree.as_dict_of_dicts, so purely a (non complete) export format for NetworkX. If you want it to be something more, then it gets complex, I agree.

@hyanwong
Copy link
Member Author

hyanwong commented Apr 6, 2021

We don't have a Tree.from_dod method, so I'm not sure we need (want?) a TS one?

@hyanwong
Copy link
Member Author

hyanwong commented Apr 6, 2021

Apologies for the terse replies: I'm on a mobile. There's no reason to merge this any time soon. As it is, I need to use a slightly bespoke version anyway, which randomises the node IDs (otherwise locating isomorphic nodes is biased towards the input order). I just imagine that if I find a function like this useful for analysing the tree sequence as a graph, others might too. So it's probably worth having a similar function in the API. But as you say, worth doing the due diligence first. I'm in no hurry either way.

@jeromekelleher
Copy link
Member

OK - since we're making quite a big decision here (what's the graph representation of a tree sequence) @hyanwong, I think it would be better if you used the version you have for your own work, and we took our time with this. The Tree version is much simpler and we don't worry about round tripping it because there's no way of creating a Tree on it's own anyway. The TreeSequence is the core data structure.

So, I think we can mark this as a draft?

@hyanwong hyanwong marked this pull request as draft April 6, 2021 13:57
@hyanwong
Copy link
Member Author

hyanwong commented Apr 6, 2021

So, I think we can mark this as a draft?

Done.

FWIW I don't think as_dict_of_dict should be the "canonical" graph representation of a TS. I think this only needs to be a function used for reading a specific format into NetworkX. We shouldn't claim anything grander about it. If we want a canonical format we could just use .__dict__() (or something) instead.

@hyanwong hyanwong mentioned this pull request Apr 25, 2021
3 tasks
@jeromekelleher
Copy link
Member

jeromekelleher commented Dec 16, 2021

Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread)

I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.

@hyanwong
Copy link
Member Author

Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread)

I haven't figured out how to use the "dict_of_dicts" version to allocate attributes (like time) to nodes. So maybe this isn't the right way anyway?

I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.

Hmm, is that true. Isn't it just as easy to go from multigraph -> graph as the other way round?

@benjeffery
Copy link
Member

Tidying up old PRs - please re-open if you wish.

@benjeffery benjeffery closed this Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TreeSequence.as_dict_of_dicts() method
3 participants