Add count_call_alternate_alleles function #282
Comments
@eric-czech I think this could be calculated directly from the genotype calls, similar to scikit-allel (source), and this should also work with #243. Following the scikit-allel code, an implementation could look something like:
alternate_counts = (ds['genotype_calls'] > 0).sum(axis=-1)
partial = partial_genotype_calls(ds['genotype_calls'])
alternate_counts = xr.where(partial, fill, alternate_counts)
The function |
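To make the suggestion above concrete, here is a minimal NumPy sketch of the same idea. It assumes genotype calls are stored as an integer array of shape (variants, samples, ploidy) with negative values marking missing alleles; the function name `count_alt_alleles` and the inline "any allele missing" check are illustrative stand-ins, not part of sgkit or scikit-allel.

```python
import numpy as np


def count_alt_alleles(calls, fill=-1):
    """Count alternate (non-reference) alleles per genotype call.

    Assumption for this sketch: `calls` has shape
    (variants, samples, ploidy) and negative integers mean "missing".
    """
    calls = np.asarray(calls)
    # Allele 0 is the reference, so every allele index > 0 is an alternate.
    alt_counts = (calls > 0).sum(axis=-1)
    # Treat a call as partial/missing if any of its alleles is missing.
    partial = (calls < 0).any(axis=-1)
    # Replace the counts of partial calls with the fill value.
    return np.where(partial, fill, alt_counts)


calls = np.array([
    [[1, 1], [1, 1]],
    [[1, 1], [1, -1]],
])
result = count_alt_alleles(calls)
```

With these inputs the only call containing a missing allele, `[1, -1]`, receives the fill value, while every complete call gets its plain alternate-allele count.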
So I'm following, you're saying I would have no need for it, but would you ever use a switch that allows for partial calls to be included in the alternate counts? In other words, it would only return the fill value if all calls were missing rather than any one of them. |
Yes exactly.
I don't actually have a use case for either at the moment, just suggesting an implementation. The scikit-allel version does actually count alleles in partial genotypes if
>>> import allel
>>> g = allel.GenotypeArray([
...     [[1,1],[1,1]],
...     [[1,1],[1,-1]]
... ])
>>> g.to_n_alt(fill=0)
array([[2, 2],
       [2, 1]], dtype=int8)
vs
>>> g.to_n_alt(fill=-1)
array([[ 2,  2],
       [ 2, -1]], dtype=int8)
Another option is to always count the alleles of partial genotypes (only returning the fill value if all calls are missing) and leave it up to the user to filter out the partial genotypes as in #223. This may be preferable to handling partial genotypes in multiple functions. |
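The alternative described above, where partial genotypes are always counted and only fully missing calls get the fill value, could be sketched roughly as follows. This is a NumPy illustration under the assumption that negative integers mark missing alleles; `count_alt_alleles_lenient` is a hypothetical name, and the separate `partial` mask stands in for whatever filtering helper #223 ends up providing.

```python
import numpy as np


def count_alt_alleles_lenient(calls, fill=-1):
    """Count alternate alleles, filling only completely missing calls.

    Assumption for this sketch: `calls` has shape
    (variants, samples, ploidy) and negative integers mean "missing".
    """
    calls = np.asarray(calls)
    # Missing alleles are negative, so they never satisfy `> 0`.
    alt_counts = (calls > 0).sum(axis=-1)
    # Only calls where every allele is missing receive the fill value.
    all_missing = (calls < 0).all(axis=-1)
    return np.where(all_missing, fill, alt_counts)


calls = np.array([
    [[1, 1], [1, -1]],
    [[0, -1], [-1, -1]],
])
lenient = count_alt_alleles_lenient(calls)

# Users who want to exclude partial genotypes can build their own mask
# and apply it afterwards, keeping that logic out of the counting function.
partial = (calls < 0).any(axis=-1)
```

Keeping the partial-genotype mask separate means the counting function has a single responsibility, which is the trade-off the comment above argues for.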
I'm not so sure about that name but it seems like the most obvious choice given how our other counting functions are named.
This should do what https://scikit-allel.readthedocs.io/en/stable/model/ndarray.html#allel.Genotypes.to_n_alt does.
Likely tasks:
- […] count_call_alleles has been run ([…] ds.call_allele_count[:, 1:].sum(dim='alleles'))
- […] call_allele_count to count missing alleles too. That could get tricky with https://github.com/pystatgen/sgkit/issues/243 though, so it is probably even better if the function relies on the missingness mask instead (= slightly less efficient but more readable code).
- […] allel.Genotypes.to_n_alt(fill=-1), meaning that partially or completely missed calls result in a -1 count
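As a rough illustration of the tasks listed above, the sketch below derives alternate counts from per-call allele counts and then applies a missingness mask. It uses plain NumPy rather than sgkit, so `call_allele_counts` is a hypothetical helper written for this example, and the array layout (variants, samples, alleles) and the negative-means-missing convention are assumptions.

```python
import numpy as np


def call_allele_counts(calls, n_alleles):
    """Per-call allele counts with shape (variants, samples, alleles).

    Hypothetical helper for this sketch; missing alleles (negative
    values) match no allele index and are therefore not counted.
    """
    calls = np.asarray(calls)
    alleles = np.arange(n_alleles)
    # Compare every allele in a call against every allele index,
    # then sum over the ploidy axis.
    return (calls[..., None] == alleles).sum(axis=-2)


calls = np.array([
    [[0, 1], [1, 2]],
    [[2, 2], [1, -1]],
])
counts = call_allele_counts(calls, n_alleles=3)

# Alternate count: sum over all alleles except the reference (index 0).
alt = counts[..., 1:].sum(axis=-1)

# Missingness mask over individual alleles; calls with any missing
# allele get -1, analogous to to_n_alt(fill=-1) in the thread above.
mask = calls < 0
alt_filled = np.where(mask.any(axis=-1), -1, alt)
```

Relying on the mask rather than on a dedicated "missing" column in the counts keeps the counting step independent of how missing alleles end up being encoded.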