How do I import a VCF file into tsinfer?
Replying to my own question: the tutorial provides code for reading VCF files, including reading in some of the VCF metadata (details further down that page). If you have a very large VCF file and the import is taking a long time, you can read chunks of the VCF in parallel: see the linked comment on #277 for example code.
Note that in the longer term, we may move tsinfer to using sgkit as its import framework; if that happens, this answer may become obsolete.
Also note that if you are importing VCFs into tsinfer, you may wish to look into how to infer ancestral states properly.
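The tutorial's import code loops over the VCF with cyvcf2 and, for each site, builds an allele list and a flat genotype vector (one entry per haplotype) before adding the site to a `tsinfer.SampleData` object. The per-site conversion can be sketched in plain Python as below, assuming phased diploid `GT` strings and an optional `AA` INFO field holding the ancestral state. `site_from_record` is a hypothetical helper, not part of tsinfer, and how the ancestral allele is passed to `SampleData.add_site` depends on your tsinfer version, so check the tutorial for the exact call.

```python
def site_from_record(ref, alt, info, gt_fields):
    """Hypothetical helper: convert one phased diploid VCF record.

    ref: the REF allele string
    alt: list of ALT allele strings
    info: dict of INFO fields (e.g. {"AA": "T"})
    gt_fields: one GT string per sample, e.g. ["0|1", "1|1"]

    Returns (alleles, genotypes, ancestral_index). Missing calls (".")
    are not handled in this sketch and will raise ValueError.
    """
    alleles = [ref] + list(alt)
    genotypes = []
    for gt in gt_fields:
        if "/" in gt:
            raise ValueError(f"Unphased genotype: {gt}")
        # Flatten into haplotype order: sample0 hap0, sample0 hap1, sample1 hap0, ...
        genotypes.extend(int(a) for a in gt.split("|"))
    # Assumption: use the AA INFO field as the ancestral state, falling back to REF.
    ancestral = info.get("AA", ref)
    if ancestral not in alleles:
        raise ValueError(f"Ancestral allele {ancestral!r} not among {alleles}")
    return alleles, genotypes, alleles.index(ancestral)
```

For example, `site_from_record("A", ["T"], {"AA": "T"}, ["0|1", "1|1"])` returns `(["A", "T"], [0, 1, 1, 1], 1)`. Falling back to REF when `AA` is absent is only a convenience; as noted above, REF is not necessarily the ancestral state.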
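The example code linked from #277 is not reproduced here. As a rough sketch of the chunking idea, assuming the VCF is bgzip-compressed and indexed so that cyvcf2 can iterate over a region with `vcf("chrom:start-end")`, the sequence can be split into contiguous region strings, one per worker. `make_regions` is a hypothetical helper. Each worker would convert the sites in its region, and the per-chunk results would then need to be combined in region order, since tsinfer expects sites to be added in increasing position order.

```python
def make_regions(contig, sequence_length, num_chunks):
    """Hypothetical helper: split positions 1..sequence_length (1-based,
    inclusive, like VCF POS) into contiguous "contig:start-end" strings."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be >= 1")
    step = -(-sequence_length // num_chunks)  # ceiling division
    regions = []
    start = 1
    while start <= sequence_length:
        end = min(start + step - 1, sequence_length)
        regions.append(f"{contig}:{start}-{end}")
        start = end + 1
    return regions
```

For example, `make_regions("1", 10, 3)` gives `["1:1-4", "1:5-8", "1:9-10"]`. Region queries return any record that overlaps the region, so a multi-base variant spanning a chunk boundary could be read twice; deduplicating by position when merging avoids adding it twice.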
Hi Yan,
Have there been any updates to this? I ran a script using the exact code from that tutorial link (with only the file path changed to point to my VCF) on a VCF generated by SLiM, in which I manually added a sequence length to the header, and received this error:
/mnt/apps/users/nbailey/conda/lib/python3.12/site-packages/tsinfer/formats.py:458: FutureWarning: The LMDBStore is deprecated and will be removed in a Zarr-Python version 3, see zarr-developers/zarr-python#1274 for more information.
I can provide the exact script and VCF if needed, but neither should deviate in any substantive way from what the tutorial assumes.
Cheers,
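The quoted message is a `FutureWarning` about Zarr's deprecated `LMDBStore`. A warning on its own does not stop a Python script, so if the script also fails, the actual cause would appear as a separate exception traceback. If the script does complete and you only want to hide this message, a sketch using the standard-library `warnings` module is below; `silence_lmdb_future_warning` is a hypothetical helper, and the filter matches on the message text, so other warnings are still shown.

```python
import warnings


def silence_lmdb_future_warning():
    """Hypothetical helper: ignore only the Zarr LMDBStore FutureWarning."""
    warnings.filterwarnings(
        "ignore",
        message=r".*LMDBStore is deprecated",  # regex matched against the start of the message
        category=FutureWarning,
    )


# Call this before running the tutorial code so the filter is already active.
silence_lmdb_future_warning()
```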
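The tutorial code takes the sequence length from the VCF header (via cyvcf2's `seqlens`), so with a hand-edited SLiM VCF it is worth confirming that the `##contig` line parses with a single contig and a `length` key. A minimal standard-library sketch, assuming the usual `##contig=<ID=...,length=...>` form; `contig_lengths` is a hypothetical helper, not part of tsinfer or cyvcf2.

```python
import re


def contig_lengths(header_lines):
    """Hypothetical helper: return {contig ID: length} for ##contig lines
    that declare a length; contigs without a length are skipped."""
    pattern = re.compile(r"^##contig=<(.*)>$")
    lengths = {}
    for line in header_lines:
        m = pattern.match(line.strip())
        if not m:
            continue
        # Split the comma-separated key=value pairs inside <...>.
        # Assumption: no field value contains a comma.
        fields = dict(kv.split("=", 1) for kv in m.group(1).split(",") if "=" in kv)
        if "ID" in fields and "length" in fields:
            lengths[fields["ID"]] = int(fields["length"])
    return lengths
```

For example, `contig_lengths(["##fileformat=VCFv4.2", "##contig=<ID=1,length=100000>"])` returns `{"1": 100000}`. An empty result would suggest the edited header line is not in the expected form.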