Remove samples and keep_size from posterior_predictive #5775


Closed
ricardoV94 opened this issue May 17, 2022 · 16 comments

Comments

@ricardoV94
Member

ricardoV94 commented May 17, 2022

A bit more work than what was needed for #5772, mostly because we use samples in several tests.

Quoting @OriolAbril

imo kill everything, size, keep_size and samples (to get less samples one can always provide a thinned posterior as shown in the example)

@covertg

covertg commented Jun 8, 2022

Hi, I'd be interested in working on this. I'm new to contributing to pymc so would appreciate any pointers on what to target for this issue.

Seems like we'll want to update code and docstrings in pymc/sampling.py, as well as any tests in pymc/tests/ that call on sample_posterior_predictive with these args. Anything else?

@OriolAbril
Member

That sounds right. I think the examples in the docs (for example, but probably not restricted to, https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/posterior_predictive.html) already use neither argument, but it'd be best to check too.

@covertg

covertg commented Jun 10, 2022

Thanks!
Should we also remove the samples argument from sample_posterior_predictive_w()? Edit: never mind, I see it's a work in progress for v4 (#5800)

@covertg

covertg commented Jun 10, 2022

Another question: if the user passes return_inferencedata=False, we can either reshape the array elements of the return value (which is type Dict[str, ndarray]) to be shape (chain, draw, ...), or just leave them as-is (flattened arrays). Which should we do?

The default behavior currently is to reshape (since keep_size defaults to True), but I just wanted to make sure, given that we're nixing keep_size. The tests currently use both, but they lean heavily towards reshaping.
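
For reference, a minimal numpy sketch of the reshaping in question (the variable name `"y"` and the shapes are made up for illustration, not taken from pymc):

```python
import numpy as np

# Hypothetical flattened posterior-predictive output: one array per
# variable, with chain and draw collapsed into a single leading axis.
n_chains, n_draws = 4, 250
flat = {"y": np.zeros((n_chains * n_draws, 3))}

# Reshape each array to (chain, draw, ...), mirroring what keep_size=True did.
reshaped = {
    name: arr.reshape((n_chains, n_draws) + arr.shape[1:])
    for name, arr in flat.items()
}
print(reshaped["y"].shape)  # (4, 250, 3)
```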

@OriolAbril
Member

I think we can also remove this and always return InferenceData. Unlike with pm.sample, where some info is still present only in the old trace, that is not the case here. What do you think? cc @ricardoV94 @michaelosthege

@ricardoV94
Member Author

I would prefer to keep having the trace option

@OriolAbril
Member

Here the options are dict and InferenceData, trace is only on pm.sample (which I agree can't be removed yet)

@ricardoV94
Member Author

The default behavior currently is to reshape (since keep_size defaults to True), but I just wanted to make sure, given that we're nixing keep_size. The tests currently use both, but they lean heavily towards reshaping.

Reshaping sounds alright.

@ricardoV94
Member Author

Here the options are dict and InferenceData, trace is only on pm.sample (which I agree can't be removed yet)

Is a dict also what we get from prior predictive with return_inferencedata = False?

@OriolAbril
Member

yep, we could get rid of the dict output in prior sampling I think

@ricardoV94
Member Author

ricardoV94 commented Jun 10, 2022

Unless there is a large maintenance burden I'd rather keep it; I was just asking to understand whether there was something special about posterior predictive.

I understand a lot of people prefer InferenceData, but I find it a PITA to be honest.

@michaelosthege
Member

I see the benefits of a trace object with a simpler API, but the current implementation of BaseTrace/MultiTrace has some terrible inconsistencies.

We'll always have a non-InferenceData data structure to manage draws at MCMC-time.
If that thing has a coherent API, I think we can offer to return it to the user.

@ricardoV94
Member Author

ricardoV94 commented Jun 11, 2022

@michaelosthege this is about the dict alternative from prior/posterior_predictive. I also thought it was a MultiTrace object, but it's even simpler than that.

I don't see the problem with a simple dict :) Also, I can come up with cases where the idata would fail, such as dynamically sized variables, although those are overall not supported in PyMC.

@covertg

covertg commented Jun 12, 2022

Happy to incorporate the decision either way in the PR. I can get back to it early this coming week.

A bit of feedback as a user: the fact that the return_inferencedata argument behaves differently in different methods (i.e., False to the MCMC sampler returns a MultiTrace, while False to the predictive samplers returns a dict) was a bit confusing at first. It seems like they're really different things, or at least the behavior of "don't return inferencedata", if kept, could be made clearer at the argument level.

@covertg

covertg commented Jun 12, 2022

Also, not sure how important this is, but it seems that nearly all of the tests that do posterior predictive sampling currently use return_inferencedata=False. Is this fine as-is, or should any be updated to use idata?

@michaelosthege
Member

Since return_inferencedata=True is now the default, I agree that the naming could be improved.
Something like:

  • pm.sample(return_mtrace=False)
  • pm.sample_p*_predictive(return_dict=False)

W.r.t. the original description of the issue, I'm also in favor of removing the samples and keep_size arguments from sample_posterior_predictive.
Then thinned sampling will only be supported with xarray inputs, as shown in the docstring:

pymc/pymc/sampling.py, lines 1812 to 1819 in 3f2afb2:

```
Thin a sampled inferencedata by keeping 1 out of every 5 draws
before passing it to sample_posterior_predictive

.. code:: python

    thinned_idata = idata.sel(draw=slice(None, None, 5))
    with model:
        idata.extend(pymc.sample_posterior_predictive(thinned_idata))
```
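
As a standalone illustration of just the thinning step (plain numpy here, since the same slice semantics apply to selecting along an xarray draw coordinate):

```python
import numpy as np

# 1000 draws; keep 1 out of every 5, analogous to
# idata.sel(draw=slice(None, None, 5)) on the posterior group.
draws = np.arange(1000)
thinned = draws[slice(None, None, 5)]
print(len(thinned))  # 200
```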

Most tests use return_inferencedata=False simply because they were not migrated.
But skipping the InferenceData conversion is also slightly faster, so I think we should stick to return_inferencedata=False/return_dict=True in the test suite.

Likewise many tests use the pm.sample(return_inferencedata=False) option because working with the MultiTrace representation is less verbose and skips the conversion step.


5 participants