Closed
Description
This is a proposal to update sample_posterior_predictive
behavior.
Consider:
with pm.Model():
x = pm.Normal('x')
y = pm.Normal('y', mu=x, observed=1)
posterior = pm.sample(draws=500, chains=4)
posterior_predictive = pm.sample_posterior_predictive(posterior)
print(posterior_predictive['y'].shape)
I propose this should print (2000,)
.
Current behavior is (500,)
, and the implementation is roughly:
indices = np.random.randint(0, len(trace), samples)
for idx in indices:
point = trace[idx]
ppc_samples = draw_values([var], point=point)
return np.array(ppc_samples)
Good things:
- Supports any number of samples
- They are asymptotically coming from the posterior predictive distribution
- Samples are perhaps less correlated
Bad things:
Ignores all but the first chain- Ignores some of the
first chain'sposterior samplestoo - Can not match posterior samples with ppc draws for diagnostics (this came up in arviz)
This would be a reasonably large breaking change! In a perfect world, I would want the return value to always be (4, 500)
, but if we make it so .reshape(4, 500)
matches up with the shape of the full trace. My suggestion is:
- Change the default to what I have described,
- Add a
shuffle
argument - If
samples
is bigger than 2000 andshuffle
isFalse
, cycle back over the posterior (in order).
Metadata
Metadata
Assignees
Labels
No labels