Skip to content

Sample Posterior Predictive Behavior #3208

Closed
@ColCarroll

Description

@ColCarroll

This is a proposal to update sample_posterior_predictive behavior.

Consider:

with pm.Model():
    x = pm.Normal('x')
    y = pm.Normal('y', mu=x, observed=1)
    posterior = pm.sample(draws=500, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(posterior)

print(posterior_predictive['y'].shape)

I propose this should print (2000,).

Current behavior is (500,), and the implementation is roughly:

indices = np.random.randint(0, len(trace), samples)
for idx in indices:
    point = trace[idx]
    ppc_samples = draw_values([var], point=point)
return np.array(ppc_samples)

Good things:

  • Supports any number of samples
  • They are asymptotically coming from the posterior predictive distribution
  • Samples are perhaps less correlated

Bad things:

  • Ignores all but the first chain
  • Ignores some of the first chain's posterior samples too
  • Can not match posterior samples with ppc draws for diagnostics (this came up in arviz)

This would be a reasonably large breaking change! In a perfect world, I would want the return value to always be (4, 500), but if we make it so .reshape(4, 500) matches up with the shape of the full trace. My suggestion is:

  • Change the default to what I have described,
  • Add a shuffle argument
  • If samples is bigger than 2000 and shuffle is False, cycle back over the posterior (in order).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions