Add dict_of_dicts to tree sequence #1296

hyanwong · 2021-04-02T10:25:56Z

Description

Fixes #1294. Should probably add some tests for attributes of the graph like branch_length, left, and right.

PR Checklist:

Tests that fully cover new/changed functionality.
Documentation including tutorial content if appropriate.
Changelogs, if there are API changes.

codecov · 2021-04-02T10:46:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.81%. Comparing base (f277006) to head (935d052).
Report is 1140 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1296   +/-   ##
=======================================
  Coverage   93.81%   93.81%           
=======================================
  Files          26       26           
  Lines       22187    22198   +11     
  Branches     1006     1009    +3     
=======================================
+ Hits        20814    20825   +11     
  Misses       1340     1340           
  Partials       33       33

| Flag | Coverage Δ | |
| --- | --- | --- |
| c-tests | `92.44% <ø> (ø)` | |
| lwt-tests | `92.97% <ø> (ø)` | |
| python-c-tests | `95.15% <100.00%> (+<0.01%)` | ⬆️ |
| python-tests | `98.86% <100.00%> (+<0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| python/tskit/trees.py | `97.91% <100.00%> (+0.01%)` | ⬆️ |


jeromekelleher · 2021-04-06T08:32:51Z

Very neat! Looks very nice, but I think we want to be sure that this is the "right" graph representation before we make it part of the public API. Some questions:

What do these graphs look like when we plot them? I.e., what does a single tree ts look like, and one with two trees?
What would we need to do to round trip the topology? Ideally we would want:

d = ts.as_dict_of_dicts()
tsp = tskit.TreeSequence.from_dict_of_dicts(d)
assert tsp.equals(ts, ignore_provenance=True)

We would need to have some visibility on whether this is possible before we can fix on the graph representation.
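To make the round-trip question concrete, here is a minimal pure-Python sketch. It does not use the tskit API: the `edges` list is a hypothetical stand-in for rows of an edge table, and `edges_to_dod` / `dod_to_edges` are illustrative names, not methods proposed in this PR. It only shows that a nested dict keyed parent, then child, holding a list of intervals, can reproduce the edges exactly.

```python
from collections import defaultdict

# Hypothetical stand-in for edge table rows: (left, right, parent, child).
edges = [
    (0.0, 5.0, 4, 0),
    (0.0, 5.0, 4, 1),
    (5.0, 10.0, 5, 0),
    (5.0, 10.0, 5, 1),
    (0.0, 10.0, 6, 2),
]


def edges_to_dod(edges):
    """Build {parent: {child: [{"left": ..., "right": ...}, ...]}}."""
    dod = defaultdict(lambda: defaultdict(list))
    for left, right, parent, child in edges:
        dod[parent][child].append({"left": left, "right": right})
    # Convert the nested defaultdicts to plain dicts.
    return {parent: dict(children) for parent, children in dod.items()}


def dod_to_edges(dod):
    """Invert edges_to_dod, returning the edges in sorted order."""
    out = []
    for parent, children in dod.items():
        for child, intervals in children.items():
            for interval in intervals:
                out.append((interval["left"], interval["right"], parent, child))
    return sorted(out)


round_tripped = dod_to_edges(edges_to_dod(edges))
assert round_tripped == sorted(edges)
```

Under these assumptions the edges survive the round trip, but node times, flags, and nodes that appear in no edge are not represented at all, so this alone would not be enough for `tsp.equals(ts, ignore_provenance=True)` to pass.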

benjeffery · 2021-04-06T10:56:24Z

Nice! In the round trip example @jeromekelleher, do you mean that just the topology would round trip? Or other things like populations and migrations? (I guess these could be included as an attribute at the root dict?)

jeromekelleher · 2021-04-06T11:09:17Z

Just the topology initially, we don't have to worry about the other stuff like individuals, migrations etc. But yes, in the case of populations it'd probably be simplest to add them in as attrs in the root dict.

hyanwong · 2021-04-06T11:20:51Z

AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.
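As an illustration of why node times would have to be guessed, the sketch below recovers times from branch lengths alone. It is plain Python, not tskit or NetworkX code; the `dod` layout and the `guess_node_times` helper are hypothetical, and the function assumes every leaf sits at time 0.

```python
# Hypothetical single-tree graph: {parent: {child: {"branch_length": ...}}}.
dod = {
    4: {0: {"branch_length": 1.0}, 1: {"branch_length": 1.0}},
    5: {4: {"branch_length": 0.5}, 2: {"branch_length": 1.5}},
}


def guess_node_times(dod, leaf_time=0.0):
    """Estimate node times by summing branch lengths up from the leaves."""
    times = {}

    def time_of(node):
        if node not in times:
            children = dod.get(node, {})
            if not children:
                # Assumption: all leaves are contemporary samples.
                times[node] = leaf_time
            else:
                # Any child gives an estimate; use the first one.
                child, attrs = next(iter(children.items()))
                times[node] = time_of(child) + attrs["branch_length"]
        return times[node]

    for parent, children in dod.items():
        time_of(parent)
        for child in children:
            time_of(child)
    return times


times = guess_node_times(dod)
```

The guess is imperfect in at least two ways: leaves that are not at time 0 (for example ancient samples) are placed wrongly, and summing floating-point branch lengths along different paths need not give identical values for the same node.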

benjeffery · 2021-04-06T11:22:11Z

> AFAIK when NetworkX imports from dict of dicts you can only set attributes on edges: I don't know how to set them on nodes. So round tripping might be difficult. Even setting node times would have to be imperfectly guessed from edge lengths.

We would round-trip the dicts, not the NetworkX representation.

jeromekelleher · 2021-04-06T11:28:43Z

Yes - I'm afraid we have to do the due diligence here @hyanwong. We need to spend time exploring the properties of this representation if we're going to make it part of the API. The first question we need to answer is "is this a bijection?"

hyanwong · 2021-04-06T12:53:24Z

I assumed this would simply be an analogue to Tree.as_dict_of_dicts, so purely an (incomplete) export format for NetworkX. If you want it to be something more, then it gets complex, I agree.

hyanwong · 2021-04-06T12:55:14Z

We don't have a Tree.from_dod method, so I'm not sure we need (want?) a TS one?

hyanwong · 2021-04-06T13:40:24Z

Apologies for the terse replies: I'm on a mobile. There's no reason to merge this any time soon. As it is, I need to use a slightly bespoke version anyway, which randomises the node IDs (otherwise locating isomorphic nodes is biased towards the input order). I just imagine that if I find a function like this useful for analysing the tree sequence as a graph, others might too. So it's probably worth having a similar function in the API. But as you say, worth doing the due diligence first. I'm in no hurry either way.
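The node-ID randomisation mentioned above could look something like the following sketch. This is not the bespoke version referred to in the comment, which is not shown in this thread; `relabel_randomly` and the example `dod` are hypothetical, and only the standard library is used.

```python
import random

# Hypothetical graph: {parent: {child: {"branch_length": ...}}}.
dod = {
    4: {0: {"branch_length": 1.0}, 1: {"branch_length": 1.0}},
    5: {4: {"branch_length": 0.5}, 2: {"branch_length": 1.5}},
}


def relabel_randomly(dod, seed=None):
    """Return a copy of dod with node IDs permuted, plus the old->new mapping."""
    rng = random.Random(seed)
    # Collect every node that appears as a parent or as a child.
    nodes = sorted(set(dod) | {c for children in dod.values() for c in children})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(nodes, shuffled))
    relabelled = {
        mapping[parent]: {
            mapping[child]: dict(attrs) for child, attrs in children.items()
        }
        for parent, children in dod.items()
    }
    return relabelled, mapping


relabelled, mapping = relabel_randomly(dod, seed=1)
```

Because the mapping is a permutation of the existing IDs, the relabelled graph has the same structure and edge attributes; only the order in which a graph algorithm is likely to encounter the nodes changes.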

jeromekelleher · 2021-04-06T13:56:14Z

OK - since we're making quite a big decision here (what's the graph representation of a tree sequence) @hyanwong, I think it would be better if you used the version you have for your own work, and we took our time with this. The Tree version is much simpler and we don't worry about round tripping it because there's no way of creating a Tree on its own anyway. The TreeSequence is the core data structure.

So, I think we can mark this as a draft?

hyanwong · 2021-04-06T14:00:22Z

> So, I think we can mark this as a draft?

Done.

FWIW I don't think as_dict_of_dicts should be the "canonical" graph representation of a TS. I think this only needs to be a function used for reading a specific format into NetworkX. We shouldn't claim anything grander about it. If we want a canonical format we could just use .__dict__() (or something) instead.

jeromekelleher · 2021-12-16T12:50:50Z

Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread)

I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.
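A rough sketch of the post-processing step described here, in plain Python: starting from nodes annotated with times and edges annotated with intervals, derive one multigraph edge per interval with a `branch_length` attribute. The `node_time` and `edges` values and the `to_multigraph_edges` helper are hypothetical examples, not part of this PR or of the tskit API.

```python
# Hypothetical annotated graph: node times plus (parent, child, left, right) edges.
node_time = {0: 0.0, 1: 0.0, 2: 1.0, 3: 2.0}
edges = [
    (2, 0, 0.0, 10.0),
    (2, 1, 0.0, 4.0),
    (3, 1, 4.0, 10.0),
    (3, 2, 0.0, 10.0),
]


def to_multigraph_edges(node_time, edges):
    """Return (parent, child, attrs) triples, one per edge interval."""
    return [
        (
            parent,
            child,
            {
                # Branch length is derived from the two node times.
                "branch_length": node_time[parent] - node_time[child],
                "left": left,
                "right": right,
            },
        )
        for parent, child, left, right in edges
    ]


multigraph_edges = to_multigraph_edges(node_time, edges)
```

Triples of this `(u, v, attrs)` shape are what NetworkX's `add_edges_from` accepts on a `MultiDiGraph`, so the result could be loaded for visualisation. Going the other way, from branch lengths back to node times, needs an extra assumption about at least one node's time.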

hyanwong · 2021-12-16T13:14:57Z

> Could we update this method to return the representation I'm advocating for here @hyanwong? #2068 (reply in thread)

I haven't figured out how to use the "dict_of_dicts" version to allocate attributes (like time) to nodes. So maybe this isn't the right way anyway?

> I think it's better to annotate the nodes with their time, and the edges with intervals, rather than making a multigraph with branch lengths like we have here. Making the multigraph version for viz should be simple enough as a post-processing step.

Hmm, is that true? Isn't it just as easy to go from multigraph -> graph as the other way round?

benjeffery · 2024-09-23T12:42:27Z

Tidying up old PRs - please re-open if you wish.

hyanwong marked this pull request as draft April 2, 2021 10:26

hyanwong force-pushed the ts-dict-of-dicts branch 3 times, most recently from fc393a4 to 9cf952c Compare April 2, 2021 18:20

Add dict_of_dicts to tree sequence

935d052

Fixes tskit-dev#1294

hyanwong force-pushed the ts-dict-of-dicts branch from 9cf952c to 935d052 Compare April 2, 2021 19:12

hyanwong marked this pull request as ready for review April 2, 2021 19:25

hyanwong mentioned this pull request Apr 2, 2021

Tutorial: explain how to represent ARGs tskit-dev/tutorials#43

Closed

hyanwong marked this pull request as draft April 6, 2021 13:57

hyanwong mentioned this pull request Apr 25, 2021

Add RankTree.leaves() #1292

Closed

3 tasks

benjeffery closed this Sep 23, 2024
