The `pca` and `pc_relate` methods were implemented early on, before the schema/variables were standardized. I'm not sure if there's any appetite to update them, but here are a few observations:

- Both use custom methods to calculate an equivalent to `call_dosage`:
  - `pca` creates an undocumented variable called `"call_alternate_allele_count"` from `call_allele_count` (line)
  - `pc_relate` internally creates alternate allele counts directly from `call_genotype` (line)
  - They would likely be more flexible if defined in terms of the `call_dosage` variable
  - `call_dosage` could be automatically computed from `call_genotype` for backwards compatibility, see Add `count_call_alternate_alleles` function #282
- `pc_relate` does its own filtering by MAF. This doesn't seem to be the approach taken elsewhere, but I'm not sure that we have a "standard" approach to filtering?
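For illustration, here is a minimal NumPy sketch of deriving a dosage-like array from `call_genotype` and then building a separate MAF mask from it. This is not how `pca` or `pc_relate` currently do it. The function names `alternate_allele_dosage` and `maf_mask` are hypothetical, and the sketch assumes that negative allele values mark missing alleles and that all non-reference alleles can be pooled as "alternate" (i.e. effectively biallelic sites).

```python
import numpy as np


def alternate_allele_dosage(call_genotype: np.ndarray) -> np.ndarray:
    """Hypothetical helper: count non-reference alleles per call.

    call_genotype: int array of shape (variants, samples, ploidy),
    assumed to use 0 = reference, >0 = alternate, <0 = missing.
    Returns a float array of shape (variants, samples); any call with
    a missing allele is set to NaN.
    """
    gt = np.asarray(call_genotype)
    missing = (gt < 0).any(axis=-1)
    dosage = (gt > 0).sum(axis=-1).astype(np.float64)
    dosage[missing] = np.nan
    return dosage


def maf_mask(dosage: np.ndarray, ploidy: int, min_maf: float) -> np.ndarray:
    """Hypothetical helper: True for variants with minor allele frequency >= min_maf."""
    # Alternate allele frequency per variant, ignoring missing calls.
    alt_freq = np.nanmean(dosage, axis=1) / ploidy
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


# Three diploid variants x three samples.
gt = np.array([
    [[0, 0], [0, 1], [1, 1]],   # alt frequency 0.5
    [[0, 0], [0, 0], [0, -1]],  # monomorphic among non-missing calls
    [[0, 0], [0, 0], [0, 1]],   # alt frequency 1/6
])
dosage = alternate_allele_dosage(gt)
mask = maf_mask(dosage, ploidy=2, min_maf=0.1)
```

Keeping the filter as a standalone mask (rather than inside the estimator) would let callers subset variants with whatever filtering approach they prefer before calling `pca` or `pc_relate`.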
I was looking at the implementation after considering having `"pc-relate"` as an estimator option in `genomic_relationship`. AFAICT this should work so long as a `sample_pc` parameter was added (similar to `ancestral_frequency` for the VanRaden estimator). This would return relationships (as opposed to `pc_relate_phi`, which is an estimate of kinship).
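Under the usual convention that the additive relationship coefficient is twice the kinship coefficient ($A = 2\Phi$), converting between the two outputs is a simple rescaling. The sketch below assumes that convention; the helper name `kinship_to_relationship` is hypothetical, and whether a `"pc-relate"` estimator in `genomic_relationship` would apply exactly this scaling is an assumption.

```python
import numpy as np


def kinship_to_relationship(phi: np.ndarray) -> np.ndarray:
    """Hypothetical helper: additive relationship matrix A = 2 * phi."""
    return 2.0 * np.asarray(phi, dtype=np.float64)


# Parent/offspring kinship is 0.25; self-kinship of a non-inbred individual is 0.5.
phi = np.array([[0.5, 0.25],
                [0.25, 0.5]])
relationship = kinship_to_relationship(phi)
```

The diagonal then becomes $1 + F$ (here 1.0 for non-inbred samples), which matches the scale of the relationships `genomic_relationship` is meant to return.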