tskit's genetic_relatedness() versus eGRM (Fan et al. 2022)
#2603
Unanswered
grahamgower asked this question in Q&A
Replies: 2 comments, 12 replies
Let's see - without looking up the details, I think that
@grahamgower - I've put together a version that computes this in a naive but fast way in this gist. It definitely uses dramatically less memory and should be faster than the current tskit approach. Curious to see how it stacks up in your example: any chance you could try it out, please?
Hello tskitters,
I've inferred tree sequences using tsinfer/tsdate for a chicken dataset of 674 individuals. I also have trees from Relate (but the output below is for the tsinfer/tsdate trees). I chose the chromosome with the smallest trees file (chr16) and calculated the genetic relatedness matrix (GRM) using tskit's genetic_relatedness() and Fan et al.'s egrm package.

What is the difference between the GRM obtained from tskit (using mode="branch", script reproduced below) and egrm? If someone on the street asked me, I'd say they should be doing essentially the same thing, but I don't grok the stats framework and/or the genetic_relatedness docs (they're too general and/or abstract for me).
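As a rough illustration of how two matrices that are both called a "GRM" can disagree on the same data, here is a toy sketch in pure Python. It assumes haploid 0/1 genotypes and uses sites as a stand-in for branches; the function name toy_grm and the data are hypothetical, and neither variant is taken from tskit's genetic_relatedness() or the egrm package. The only switch is whether each site's centred contribution is divided by its variance p(1 - p), as standardised GRMs in the GCTA tradition do. Whether that standardisation is the main difference between the two tools here is an assumption, not something established in this thread.

```python
# Toy comparison of two GRM conventions on haploid 0/1 genotypes.
# Hypothetical illustration: neither variant is taken from tskit
# or from the egrm package.
from typing import List

Matrix = List[List[float]]


def toy_grm(genotypes: List[List[int]], standardise: bool) -> Matrix:
    """genotypes[s][i] is the allele (0 or 1) of sample i at site s."""
    n = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    num_used = 0
    for row in genotypes:
        p = sum(row) / n  # allele-1 frequency at this site
        if p in (0.0, 1.0):
            continue  # monomorphic sites carry no information
        # The only difference between the two conventions: optionally
        # divide each site's contribution by its variance p(1 - p).
        scale = p * (1.0 - p) if standardise else 1.0
        centred = [x - p for x in row]
        for i in range(n):
            for j in range(n):
                K[i][j] += centred[i] * centred[j] / scale
        num_used += 1
    if num_used == 0:
        raise ValueError("no polymorphic sites")
    # Average over polymorphic sites so the two outputs are comparable.
    return [[v / num_used for v in r] for r in K]


# Two sites, four haploid samples: a common and a rarer variant.
genotypes = [
    [1, 1, 0, 0],  # p = 0.5
    [1, 0, 0, 0],  # p = 0.25
]
raw = toy_grm(genotypes, standardise=False)
std = toy_grm(genotypes, standardise=True)
print(raw[0][0], std[0][0])  # 0.40625 2.0
```

With standardisation, sample 0 (the only carrier of the rarer variant) gets a self-relatedness of 2.0 instead of 0.40625, because the rarer site is upweighted; the standardised diagonal averages to 1 across samples, while both versions have rows summing to zero because of the centring.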
Why is there such a huge discrepancy in the resources used by tskit compared with egrm? tskit took 170 minutes, while egrm took under 8 minutes. I didn't think to record the memory usage, but saw in top that tskit was hitting 17 GB (resident), while egrm didn't seem to go much beyond 120 MB.
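A back-of-envelope check on the memory question, as a hedged sketch. If every pair of the 674 individuals is requested, including self-pairs, that is 227,475 index pairs, so the final matrix is only about 1.7 MiB of float64 values and cannot by itself explain 17 GB. A working buffer holding one value per (tree node, pair) would, however, grow by roughly that same 1.7 MiB for every node. Whether tskit's general-stats code allocates such a per-node buffer is an assumption here, and the helper names num_pairs and buffer_gib are hypothetical.

```python
# Hypothetical back-of-envelope sizing for an all-pairs relatedness
# request; assumes one float64 per value and says nothing definitive
# about how tskit actually allocates memory.
from math import comb


def num_pairs(n: int, include_self: bool = True) -> int:
    # Unordered pairs (i, j) with i < j, plus (i, i) if requested.
    return comb(n, 2) + (n if include_self else 0)


def buffer_gib(num_nodes: int, k: int, bytes_per_value: int = 8) -> float:
    # Size of a hypothetical (num_nodes x k) float64 buffer, in GiB.
    return num_nodes * k * bytes_per_value / 2**30


k = num_pairs(674)
print(k)  # 227475

# One float64 per pair: the size of the final output, and of each
# node's row in a hypothetical per-node buffer.
per_row_mib = k * 8 / 2**20
print(f"{per_row_mib:.2f} MiB")  # 1.74 MiB
```

Plugging the node count of the actual tree sequence (ts.num_nodes) into buffer_gib(ts.num_nodes, k) gives a quick sense of whether a buffer of that shape could reach the observed resident size.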