
Consider standardizing the API of pca and pc_relate #1077

Open
@timothymillar


These were implemented early on, before the schema/variables were standardized. I'm not sure if there's any appetite to update them, but here are a few observations:

  • Both use custom methods to calculate an equivalent to call_dosage.

    • pca creates an undocumented variable called "call_alternate_allele_count" from call_allele_count (line)
    • pc_relate internally creates alternate allele counts directly from the call_genotype (line)
    • They would likely be more flexible if defined in terms of the call_dosage variable
    • call_dosage could be automatically computed from call_genotype for backwards compatibility, see Add count_call_alternate_alleles function #282
  • pc_relate does its own filtering by MAF. This doesn't seem to be the approach taken elsewhere, but I'm not sure that we have a "standard" approach to filtering?
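To make the backwards-compatibility idea above concrete, here is a minimal NumPy sketch of deriving a dosage-like array from hard genotype calls. It is not the sgkit implementation: the function name `dosage_from_genotype` is hypothetical, and it assumes a `(variants, samples, ploidy)` integer array in which negative values mark missing alleles and any allele index above 0 counts as alternate.

```python
import numpy as np


def dosage_from_genotype(call_genotype):
    """Count alternate alleles per call (hypothetical helper, not sgkit API).

    Assumes shape (variants, samples, ploidy), with negative values
    marking missing alleles and allele index 0 as the reference.
    """
    gt = np.asarray(call_genotype)
    # A call is treated as missing if any of its alleles is missing.
    missing = (gt < 0).any(axis=-1)
    # Dosage is the number of non-reference alleles in the call.
    dosage = (gt > 0).sum(axis=-1).astype(float)
    dosage[missing] = np.nan
    return dosage


# One variant, four diploid samples: hom-ref, het, hom-alt, partially missing.
gt = np.array([[[0, 0], [0, 1], [1, 1], [-1, 0]]])
print(dosage_from_genotype(gt))  # [[ 0.  1.  2. nan]]
```

A method defined against call_dosage could fall back to something like this when only call_genotype is present, while still accepting imputed or probabilistic dosages when they are supplied.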

I was looking at the implementation after considering having "pc-relate" as an estimator option in genomic_relationship. AFAICT this should work so long as a sample_pc parameter were added (similar to ancestral_frequency for the VanRaden estimator). This would return relationships, as opposed to pc_relate_phi, which is an estimate of kinship.
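As a rough illustration of what such an estimator might compute, here is a NumPy sketch of a PC-Relate style calculation driven by a dosage matrix and sample PCs. Everything about the interface is an assumption for illustration: the function name `pc_relate_relationship`, the `maf` default, the absence of missing dosages, and the diploid convention that relationship is twice the kinship. It does not reflect the current pc_relate or genomic_relationship signatures.

```python
import numpy as np


def pc_relate_relationship(call_dosage, sample_pc, maf=0.01):
    """PC-Relate style relationship matrix (illustrative sketch only).

    call_dosage: (variants, samples) alternate allele dosages, no missing values.
    sample_pc:   (samples, components) principal components.
    """
    g = np.asarray(call_dosage, dtype=float)
    pcs = np.asarray(sample_pc, dtype=float)
    n_samples = g.shape[1]

    # Regress each variant's dosage on an intercept plus the PCs.
    design = np.column_stack([np.ones(n_samples), pcs])
    beta, *_ = np.linalg.lstsq(design, g.T, rcond=None)

    # Individual-specific allele frequencies, shape (variants, samples).
    mu = 0.5 * (design @ beta).T

    # Mask variant/sample pairs whose predicted frequency is too extreme.
    valid = (mu > maf) & (mu < 1.0 - maf)
    centered = np.where(valid, g - 2.0 * mu, 0.0)
    scale = np.sqrt(np.where(valid, mu * (1.0 - mu), 0.0))

    # Kinship estimate; pairs with no shared valid variants become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (centered.T @ centered) / (4.0 * (scale.T @ scale))

    # Assumed diploid convention: relationship = 2 * kinship.
    return 2.0 * phi


rng = np.random.default_rng(0)
dosage = rng.integers(0, 3, size=(200, 20))
pcs = rng.normal(size=(20, 2))
rel = pc_relate_relationship(dosage, pcs)
print(rel.shape)  # (20, 20)
```

Note that the MAF handling here is a per-call mask inside the estimator, which is exactly the kind of embedded filtering questioned above; a standardized API might instead expect filtering to happen before the call.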


Labels: question (Further information is requested)
