Skip to content

Count extant lineages at a given time #386

@jeromekelleher

Description

@jeromekelleher

An operation that came up as being a useful feature in the recent PopSim meeting is to be able to count the number of extant lineages in each tree at a given time. This is equivalent to (a) deleting all edges older than t and truncating all edges intersecting with it (this would require the planned time argument to delete_interval in #382); and (b) counting the number of roots in each tree. We could define the operation by just doing this, but it seems a bit inefficient, especially if we do this for lots of different times.

Here's a way we can do it more efficiently:

def count_roots(ts, t):
    time = ts.tables.nodes.time    
    num_children = np.zeros(ts.num_nodes, dtype=int)
    num_roots = ts.num_samples
    ret = np.zeros(ts.num_trees, dtype=int)

    j = 0
    for _, edges_out, edges_in in ts.edge_diffs():
        for edge in edges_out:
            if time[edge.parent] <= t:
                num_children[edge.parent] -= 1
                if num_children[edge.parent] == 1:
                    num_roots += 1
        for edge in edges_in:
            if time[edge.parent] <= t:                
                num_children[edge.parent] += 1
                if num_children[edge.parent] == 2:
                    num_roots -= 1
        ret[j] = num_roots
        j += 1
    return ret

We can code this up pretty efficiently in C, and return the numpy array. Should we do it this way? Should we call the method TreeSequence.num_roots(time=None), which returns a numpy array like the above? (If time is None, it defaults to the oldest time in the ts)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Python APIIssue is about the Python APIenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions