allele frequency function #504
Comments
Nice, good idea. |
It may be nice to optionally return the joint fs as a sparse matrix. |
I'm not sure what you mean here? |
I assume your return value is the marginal fs in each sample set, but sometimes you want the joint fs instead, such that fs[2,3] is the number of mutations present twice in sample set 0 and three times in set 1. Or maybe I'm not understanding what you are returning here. |
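To make the joint fs idea concrete, here is a minimal plain-Python sketch. It assumes we already have, for each mutation, the number of copies in each sample set; the function name and the toy data are hypothetical and nothing here is tskit API. A `Counter` keyed by tuples stands in for a sparse matrix, since only the observed cells are stored.

```python
from collections import Counter


def joint_frequency_spectrum(copies_per_mutation):
    """Tally mutations by their copy number in each sample set.

    copies_per_mutation: one tuple per mutation, e.g. (2, 3) means the
    mutation is present twice in sample set 0 and three times in set 1.
    Returns a Counter acting as a sparse joint spectrum.
    """
    return Counter(tuple(c) for c in copies_per_mutation)


# Hypothetical copy numbers for four mutations in two sample sets.
copies = [(2, 3), (0, 1), (2, 3), (1, 0)]
fs = joint_frequency_spectrum(copies)
print(fs[(2, 3)])  # number of mutations seen twice in set 0 and three times in set 1
```

Cells that were never observed read as zero, which is the property a sparse return type would need here.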
That sounds like the AFS? Which we already have? Here I'm wanting to return the list of allele frequencies for each SNP. |
Gotcha--my mistake. |
Triggered my AFS implementation PTSD there - would not want to do that again! |
I shouldn't comment when I haven't been sleeping. 😵
…On Tue, Mar 31, 2020, 12:50 AM Jerome Kelleher ***@***.***> wrote:
That sounds like the AFS? Which we already have? Here I'm wanting to
return the list of allele frequencies for each SNP.
Triggered my AFS implementation PTSD there - would *not* want to do that
again!
|
This computes the total frequency of all mutations that are segregating in the samples, by site. People will also want to know the frequencies of mutations, separately: we should have another method that returns something of length equal to the number of mutations. I don't think we can compute this easily with the general stat framework, but here is a method to pull out this information for the mutations at a particular site:
|
Another related thing: it'd be nice to have a method that returns a numpy array giving the number of alleles at each site, as discussed in this discussion. |
Here's a function I just wrote which I think would be useful to have in the library in some form:

import collections

import numpy as np
import tskit


def count_site_alleles(ts, tree, site):
    # Start with every sample carrying the ancestral state.
    counts = collections.Counter({site.ancestral_state: ts.num_samples})
    for m in site.mutations:
        # The state being replaced is the parent mutation's derived state,
        # or the ancestral state if there is no parent mutation.
        current_state = site.ancestral_state
        if m.parent != tskit.NULL:
            current_state = ts.mutation(m.parent).derived_state
        # Silent mutations do nothing
        if current_state != m.derived_state:
            num_samples = tree.num_samples(m.node)
            counts[m.derived_state] += num_samples
            counts[current_state] -= num_samples
    return counts


def count_ancestral(ts):
    # Number of samples carrying the ancestral state at each site.
    num_ancestral = np.zeros(ts.num_sites, dtype=int)
    for tree in ts.trees():
        for site in tree.sites():
            counts = count_site_alleles(ts, tree, site)
            num_ancestral[site.id] = counts[site.ancestral_state]
    return num_ancestral
|
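As a worked example of the counting rule above, here is a toy version that runs without tskit. Everything in it is a stand-in: `ToyMutation`, the `samples_below` lookup and the node numbers are hypothetical, and it assumes mutations are listed with parents before children. Each non-silent mutation moves the samples below its node from the state it replaces to its own derived state.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMutation:
    node: int               # node the mutation sits above
    derived_state: str
    parent: Optional[int]   # index of the parent mutation at this site, or None


def count_alleles(ancestral_state, mutations, samples_below, num_samples):
    # Start with every sample carrying the ancestral state.
    counts = Counter({ancestral_state: num_samples})
    for m in mutations:
        if m.parent is None:
            current_state = ancestral_state
        else:
            current_state = mutations[m.parent].derived_state
        # Silent mutations leave the counts unchanged.
        if current_state != m.derived_state:
            n = samples_below[m.node]
            counts[m.derived_state] += n
            counts[current_state] -= n
    return counts


# Toy site with 6 samples and ancestral state "A":
# mutation 0 (A -> T) has 4 samples below it,
# mutation 1 (T -> G) is nested under mutation 0 with 1 sample below it.
muts = [ToyMutation(10, "T", None), ToyMutation(7, "G", 0)]
toy_counts = count_alleles("A", muts, {10: 4, 7: 1}, 6)
print(dict(toy_counts))
```

The nested mutation takes its one sample away from "T" rather than from "A", which is why the parent lookup matters.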
So, should we add a method

def count_ancestral(ts):
    num_ancestral = np.zeros(ts.num_sites, dtype=int)
    for tree in ts.trees():
        for site in tree.sites():
            counts = tree.count_alleles(site)
            num_ancestral[site.id] = counts[site.ancestral_state]
    return num_ancestral

We might want a more general method that can do this within specified sample sets, but this couldn't be a method of the Tree like this, so I think this simpler version is a useful building block to have anyway. Site could either be an instance of |
I like that idea! |
Made a new issue in #1610. I guess we should keep this issue open in case there's higher-level counting operations we want to do? |
@petrelharp your original |
doh! I will edit. |
We should implement the TreeSequence.allele_frequencies(sample_sets) function, which returns a numpy array of (non-ancestral allele frequencies) x (sample_sets). Here's an implementation:
Edit: originally this omitted span_normalise=False.
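The implementation from the issue did not survive in this copy of the page, so the following is not it. It is a plain-Python sketch of what the proposed return value would contain, computed from a small genotype matrix instead of through tskit's stat framework. The function signature, the genotype encoding (0 for ancestral, anything else for a derived allele) and the data are all assumptions made for illustration.

```python
def allele_frequencies(genotypes, sample_sets):
    """Return one row per site and one column per sample set.

    genotypes[s][j] is the allele index of sample j at site s, with 0
    meaning ancestral. Each entry is the fraction of the sample set
    carrying any non-ancestral allele at that site.
    """
    freqs = []
    for row in genotypes:
        freqs.append([
            sum(1 for j in sample_set if row[j] != 0) / len(sample_set)
            for sample_set in sample_sets
        ])
    return freqs


# Hypothetical data: 3 sites, 4 samples, two sample sets of size 2.
genotypes = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 2],
]
sample_sets = [[0, 1], [2, 3]]
freqs = allele_frequencies(genotypes, sample_sets)
for site_freqs in freqs:
    print(site_freqs)
```

The values are raw per-site proportions, not divided by any genomic span, which is presumably why the edit note stresses span_normalise=False. As discussed in the comments, this lumps all derived alleles at a site together and does not give per-mutation frequencies.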