Skip to content

Sample Posterior Predictive Behavior #3208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ColCarroll opened this issue Sep 23, 2018 · 7 comments
Closed

Sample Posterior Predictive Behavior #3208

ColCarroll opened this issue Sep 23, 2018 · 7 comments

Comments

@ColCarroll
Copy link
Member

ColCarroll commented Sep 23, 2018

This is a proposal to update sample_posterior_predictive behavior.

Consider:

with pm.Model():
    x = pm.Normal('x')
    y = pm.Normal('y', mu=x, observed=1)
    posterior = pm.sample(draws=500, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(posterior)

print(posterior_predictive['y'].shape)

I propose this should print (2000,).

Current behavior is (500,), and the implementation is roughly:

indices = np.random.randint(0, len(trace), samples)
for idx in indices:
    point = trace[idx]
    ppc_samples = draw_values([var], point=point)
return np.array(ppc_samples)

Good things:

  • Supports any number of samples
  • They are asymptotically coming from the posterior predictive distribution
  • Samples are perhaps less correlated

Bad things:

  • Ignores all but the first chain
  • Ignores some of the first chain's posterior samples too
  • Can not match posterior samples with ppc draws for diagnostics (this came up in arviz)

This would be a reasonably large breaking change! In a perfect world, I would want the return value to always be (4, 500), but if we make it so .reshape(4, 500) matches up with the shape of the full trace. My suggestion is:

  • Change the default to what I have described,
  • Add a shuffle argument
  • If samples is bigger than 2000 and shuffle is False, cycle back over the posterior (in order).
@junpenglao
Copy link
Member

junpenglao commented Sep 23, 2018

Ignores all but the first chain

That's not quite the case - it randomly samples from all 2000 posterior samples and generates 500 posterior predictive samples.

https://github.com/pymc-devs/pymc3/blob/14b3fd01340c269dcc086e5b12e1bc503502c16e/pymc3/sampling.py#L1126-L1131

@ColCarroll
Copy link
Member Author

You're right, thanks - I read line 1131 from there, instead of 1129. I'll update the issue.

@junpenglao
Copy link
Member

junpenglao commented Sep 23, 2018

Cool - also

Add a shuffle argument

is probably not needed.

I am curious about the context of the purpose change: is it for it to easier to work with using arviz? i guess Stan return the posterior predictive samples the same (chains, draws, ...) shape as the MCMC samples.

@twiecki
Copy link
Member

twiecki commented Sep 23, 2018

So really the proposal is to sample the number of total samples, which makes sense to me.

@ColCarroll
Copy link
Member Author

junpenglao - Yes, the impetus for this is in ArviZ, the dimensions allow you to select by chain and draw, so having that consistent would be nice. I could imagine ppc joint scatter plots being interesting (or maybe not interesting at all?) The scatter plots would require that these dimensions be aligned.

twiecki - This would also stop shuffling the samples (which seems fine to me)

This seems like a positive enough response that I'll make a PR!

@OriolAbril
Copy link
Member

@ColCarroll @junpenglao @twiecki I had some trouble trying to understand how posterior predictive sampling worked until I looked at the source code. If it is ok with you I will submit a PR to expand the docs a little and I was also thinking in logging some warning when samples != nchains*ndraws. Or at least when samples<<nchains*ndraws, as whole chains are not taken into account.

I am not really sure about the use of size parameter (and even though ArviZ has no support for it, there are no issues related yet). Can I discourage its use in the docs?

@twiecki
Copy link
Member

twiecki commented Jun 11, 2019

@OriolAbril That would be much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants