write_vcf(): discrete_genome, 1-based coordinates, and contig length #1993
grahamgower asked this question in Q&A
This is hairy stuff @grahamgower! I don't think we've thought deeply about 1-based coordinates. We've mostly worked under the assumption that if you're doing simulations it doesn't matter, and if you're working with real data you've input the original coordinates as-is. A couple of notes:
Hi tskitters,
My goal is to output a VCF using 1-based inclusive coords from a simulation with mutations at integral positions. By default, `write_vcf()` will output 0-based coords, so a variant may be given POS 0. The `write_vcf()` docs suggest that if this behaviour is undesirable, the onus is on the user to transform coordinates by passing a `position_transform` function. (As an aside, I see the VCF 4.2 spec says: "telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig".)

For an infinite sites simulation, I guess it makes sense to use `position_transform=np.ceil` to get 1-based inclusive coords. But this is an identity transformation for finite sites simulations: a mutation at position 0 is still at position 0 after transformation. So I figured I'd write it as `position_transform=lambda x: 1 + np.floor(x)`. This gives the right coordinates (I think), but then I saw that the contig length is also being transformed by the `position_transform()` function! Is there a recommended incantation to get 1-based inclusive coords and not mung the contig length?
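A minimal sketch of the coordinate arithmetic and one possible workaround for the contig length, assuming (as observed above) that the `##contig` header length is produced by applying the same `position_transform` to the sequence length, so `1 + floor(L)` gives `L + 1`. Both helpers, `to_one_based` and `fix_contig_length`, are hypothetical and not part of tskit; the sketch uses `math.floor` on plain Python lists in place of `np.floor` on arrays.

```python
import math
import re


def to_one_based(positions):
    """Hypothetical helper: map 0-based site positions to 1-based VCF POS.

    On a discrete genome, positions are integers in [0, L - 1], so this
    yields [1, L]. math.ceil would leave integer positions unchanged
    (0 stays 0); for non-integer positions, ceil(x) == floor(x) + 1, so
    the two choices only differ at integers.
    """
    return [1 + math.floor(x) for x in positions]


# Matches the length field of a VCF "##contig=<...>" header line.
_CONTIG_LENGTH = re.compile(
    r"^(##contig=<[^\n]*?\blength=)(\d+)([^\n]*)$", re.MULTILINE
)


def fix_contig_length(vcf_text, sequence_length):
    """Hypothetical post-processing step: reset the ##contig length to the
    untransformed sequence length. Assumes the writer applied the same
    position transform to the sequence length (giving L + 1 here)."""
    return _CONTIG_LENGTH.sub(
        lambda m: f"{m.group(1)}{int(sequence_length)}{m.group(3)}",
        vcf_text,
    )


if __name__ == "__main__":
    print(to_one_based([0, 5, 99]))  # [1, 6, 100]
    header = "##fileformat=VCFv4.2\n##contig=<ID=1,length=101>\n"
    print(fix_contig_length(header, 100), end="")
```

With tskit, one might write the VCF into an `io.StringIO` buffer via `ts.write_vcf(buf, position_transform=lambda x: 1 + np.floor(x))` and then pass `buf.getvalue()` and `ts.sequence_length` to `fix_contig_length`. This relies on the header behaviour described above rather than on any documented guarantee, so it is worth checking the header of the actual output.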