Standardize how intermediate variables are handled #286

eric-czech · 2020-09-29T16:46:06Z

We currently have several functions with logic like this (from here):

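```python
from typing import Hashable

from xarray import DataArray, Dataset
from sgkit import count_variant_alleles


def Tajimas_D(
    ds: Dataset,
    allele_counts: Hashable = "variant_allele_count",
) -> DataArray:
    if allele_counts not in ds:
        ds_new = count_variant_alleles(ds)
    else:
        ds_new = ds
    ac = ds_new[allele_counts]  # fails if a non-default name is absent
    ...  # remainder of the function omitted in this excerpt
```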

What should we do, though, if a non-default variable name is provided and that variable doesn't exist in the original dataset? In the case above, the code would fail on `ds_new[allele_counts]`. There is also the question of whether intermediate variables like this should be included in the result, which I believe @tomwhite has mentioned before; we currently do it both ways in the code. Lastly, if `count_variant_alleles` had any optional behavior (e.g. flags for handling partial calls differently), it wouldn't necessarily make sense for us to use the default behavior transparently like this.
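To make the failure mode concrete, here is a minimal sketch that mimics the control flow of the excerpt above. It assumes a plain `dict` standing in for an xarray `Dataset`, and `count_variant_alleles` / `tajimas_d_inputs` are hypothetical toy stand-ins, not sgkit's API. When a non-default name is passed and is absent, the *default* variable gets computed, but the lookup still uses the non-default name and raises `KeyError`.

```python
from typing import Dict, Hashable

# Hypothetical stand-in: a plain dict plays the role of an xarray Dataset.
Dataset = Dict[Hashable, object]


def count_variant_alleles(ds: Dataset) -> Dataset:
    """Toy stand-in: count allele 0 and allele 1 per variant from flat genotype lists."""
    counts = [[row.count(0), row.count(1)] for row in ds["call_genotype"]]
    return {**ds, "variant_allele_count": counts}


def tajimas_d_inputs(ds: Dataset, allele_counts: Hashable = "variant_allele_count"):
    # Same control flow as the excerpt above.
    if allele_counts not in ds:
        ds_new = count_variant_alleles(ds)  # always defines the *default* name
    else:
        ds_new = ds
    return ds_new[allele_counts]  # KeyError if allele_counts is a missing non-default name


ds = {"call_genotype": [[0, 1, 1], [0, 0, 0]]}
print(tajimas_d_inputs(ds))  # [[1, 2], [3, 0]]
try:
    tajimas_d_inputs(ds, allele_counts="my_allele_count")
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'my_allele_count'
```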

At this point, I'm inclined to say we should remove default calculations altogether and require that the variables are present in the first place. What do you think of that @tomwhite / @jeromekelleher / @ravwojdyla?
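A minimal sketch of what "require that the variables are present" could look like, again assuming a plain `dict` in place of an xarray `Dataset`. The `require_variables` helper and `tajimas_d_inputs_strict` are hypothetical names for illustration only: nothing is computed automatically, and a missing variable produces a descriptive error regardless of whether the default name was used.

```python
from typing import Dict, Hashable

# Hypothetical stand-in: a plain dict plays the role of an xarray Dataset.
Dataset = Dict[Hashable, object]


def require_variables(ds: Dataset, *names: Hashable) -> None:
    """Raise a descriptive error if any required variable is absent (hypothetical helper)."""
    missing = [name for name in names if name not in ds]
    if missing:
        raise ValueError(
            f"Required variable(s) {missing} not found in dataset; "
            "compute them first (no default calculation is performed)."
        )


def tajimas_d_inputs_strict(ds: Dataset, allele_counts: Hashable = "variant_allele_count"):
    # Strict version: never computes anything, whatever name is passed.
    require_variables(ds, allele_counts)
    return ds[allele_counts]


ds = {"variant_allele_count": [[1, 2], [3, 0]]}
print(tajimas_d_inputs_strict(ds))  # [[1, 2], [3, 0]]
try:
    tajimas_d_inputs_strict({"call_genotype": [[0, 1, 1]]})
except ValueError as e:
    print(e)
```

The trade-off is that every prerequisite must then be computed explicitly by the caller before each method is invoked.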


jeromekelleher · 2020-09-30T07:42:22Z

In terms of library semantics, I agree it would be considerably simpler if we left out default calculations like this. I'm starting to view the current API as the "expert's" low-level interface, on top of which we can hopefully build something more high-level at some point, so I think it would be better to leave out user-convenience features like this and focus on keeping things simple and efficient.

Anyway, yes, I agree: let's ditch the automatic intermediate value calculations.

tomwhite · 2020-10-01T10:06:48Z

+1 to removing default calculations.


tomwhite · 2020-10-22T08:48:19Z

Taken to its logical conclusion, this would mean that to calculate Fst you'd have to call divergence first. And for pbs, you'd have to call divergence and Fst first.

jeromekelleher · 2020-10-22T12:15:57Z

That does seem annoying, and would lead to ugly user code. What's the alternative? Should we start thinking about the "sgkit-lite" easy interface now?

eric-czech · 2020-10-22T18:03:16Z

From the call, there was agreement on trying this instead:

- Standardize on having all intermediate variables added to the dataset (some are only used temporarily now).
- Make it clear in the user guide that non-default names for variables will not result in automatic definition of those variables when they aren't present.
  - Arguably, we should try to detect and throw errors specifically for this case.
- Add some examples showing users how to override the default definition of a variable before calling another method that depends on it, if they wish to define it in a non-default way.
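The three points agreed on the call can be sketched together as follows. This is a minimal sketch under stated assumptions, not sgkit's actual implementation: a plain `dict` stands in for an xarray `Dataset`, and `define_variable_if_absent`, the toy `count_variant_alleles`, and `alt_allele_frequency` are hypothetical illustrations. A missing variable is computed only when requested under its default name, the computed intermediate is kept in the returned dataset, a missing non-default name raises an error, and a user overrides the default definition simply by adding the variable before calling the dependent method.

```python
from typing import Callable, Dict, Hashable

# Hypothetical stand-in: a plain dict plays the role of an xarray Dataset.
Dataset = Dict[Hashable, object]


def define_variable_if_absent(
    ds: Dataset,
    default_name: Hashable,
    name: Hashable,
    func: Callable[[Dataset], Dataset],
) -> Dataset:
    """Compute a missing variable, but only when it is requested under its default name."""
    if name in ds:
        return ds  # already present (possibly user-defined): use it as-is
    if name != default_name:
        raise ValueError(
            f"Variable {name!r} not found; non-default names are never computed automatically"
        )
    return func(ds)  # the computed intermediate stays in the returned dataset


def count_variant_alleles(ds: Dataset) -> Dataset:
    """Toy stand-in: count allele 0 and allele 1 per variant from flat genotype lists."""
    counts = [[row.count(0), row.count(1)] for row in ds["call_genotype"]]
    return {**ds, "variant_allele_count": counts}


def alt_allele_frequency(
    ds: Dataset, allele_counts: Hashable = "variant_allele_count"
) -> Dataset:
    """Toy downstream method that depends on allele counts."""
    ds = define_variable_if_absent(
        ds, "variant_allele_count", allele_counts, count_variant_alleles
    )
    freqs = [alt / (ref + alt) for ref, alt in ds[allele_counts]]
    return {**ds, "alt_allele_frequency": freqs}


ds = {"call_genotype": [[0, 1, 1, 1], [0, 0, 0, 1]]}

# Default name, absent: computed and kept in the result.
out = alt_allele_frequency(ds)
print(sorted(out))  # ['alt_allele_frequency', 'call_genotype', 'variant_allele_count']
print(out["alt_allele_frequency"])  # [0.75, 0.25]

# Overriding the default definition: define the variable yourself first.
custom = {**ds, "variant_allele_count": [[1, 1], [1, 1]]}
print(alt_allele_frequency(custom)["alt_allele_frequency"])  # [0.5, 0.5]

# Non-default name, absent: an explicit error instead of a silent default.
try:
    alt_allele_frequency(ds, allele_counts="my_allele_count")
except ValueError as e:
    print(e)
```

Under this design, a method with several prerequisites (e.g. pbs depending on Fst and divergence) could call the same helper once per input, so callers would not have to invoke the whole chain by hand unless they want non-default definitions.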

tomwhite · 2021-06-07T16:56:43Z

Fixed in #360 (and previous issues for the implementation)

eric-czech mentioned this issue Sep 29, 2020: PCA implementation #262 (Merged)

hammer added this to the 0.1.0 milestone Oct 15, 2020

tomwhite added a commit to tomwhite/sgkit that referenced this issue Oct 21, 2020: 2ca2c87, "Eliminate automatically computed intermediate variables." (Fixes sgkit-dev#286)

tomwhite mentioned this issue Oct 21, 2020: Eliminate automatically computed intermediate variables (for popgen) #342 (Closed)

eric-czech changed the title from "Eliminate automatically computed intermediate variables?" to "Standardize how intermediate variables are handled" Oct 22, 2020

This was referenced Oct 27, 2020:

- Standardize intermediate variables #352 (Merged)
- Document behaviour of intermediate variables #360 (Closed)

tomwhite closed this as completed Jun 7, 2021
