Sample Posterior Predictive Behavior

This is a proposal to update `sample_posterior_predictive` behavior. 

Consider:

```python
with pm.Model():
    x = pm.Normal('x')
    y = pm.Normal('y', mu=x, observed=1)
    posterior = pm.sample(draws=500, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(posterior)

print(posterior_predictive['y'].shape)
```
I propose this should print `(2000,)`: one posterior predictive draw for each of the 4 × 500 posterior samples.

Current behavior is `(500,)`, and the implementation is roughly:

```python
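import numpy as np
from pymc3.distributions.distribution import draw_values

def sample_ppc_sketch(trace, var, samples):  # helper name is illustrative only
    # Pick `samples` posterior indices uniformly at random, with replacement.
    indices = np.random.randint(0, len(trace), samples)
    # One posterior predictive draw per selected posterior point.
    ppc_samples = [draw_values([var], point=trace[idx])[0] for idx in indices]
    return np.array(ppc_samples)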
```

Good things:
- Supports any number of samples
- Draws asymptotically come from the posterior predictive distribution
- Samples are perhaps less correlated

Bad things:
- ~Ignores all but the first chain~
- Ignores some of the ~first chain's~ posterior samples ~too~
- Cannot match posterior samples with ppc draws for diagnostics (this came up in arviz)
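
To make the "ignores some posterior samples" point concrete, here is a minimal NumPy-only sketch (no PyMC3 involved). It assumes the same index scheme as the rough implementation above; the seed and variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

n_posterior = 500  # posterior points available when indexing the trace
samples = 500      # posterior predictive draws requested

# Same scheme as the current implementation: uniform, with replacement.
indices = rng.integers(0, n_posterior, size=samples)

# Count how many distinct posterior points were actually visited.
n_used = np.unique(indices).size
print(f"{n_used} of {n_posterior} posterior points used")

# Expected fraction of points visited at least once.
print(f"expected fraction: {1 - (1 - 1 / n_posterior) ** samples:.3f}")
```

Because indices are drawn with replacement, some posterior points are visited several times and others not at all, which is also why draws cannot be paired back to posterior samples.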

This would be a reasonably large breaking change! In a perfect world, I would want the return value to always have shape `(4, 500)`, but it would be enough if `.reshape(4, 500)` matches up with the shape of the full trace. My suggestion is:
- Change the default to what I have described,
- Add a `shuffle` argument
- If `samples` is bigger than 2000 and `shuffle` is `False`, cycle back over the posterior (in order).
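
The index selection suggested above could look roughly like the sketch below. It is not PyMC3 code: `ppc_indices` is a hypothetical helper, it assumes the trace is flattened chain by chain, and it assumes `shuffle=True` keeps the current random-with-replacement behavior, which the proposal does not spell out.

```python
import numpy as np

def ppc_indices(n_chains, n_draws, samples=None, shuffle=False, seed=None):
    """Return indices into the flattened posterior (hypothetical helper)."""
    n_total = n_chains * n_draws
    if samples is None:
        samples = n_total  # proposed default: one draw per posterior sample
    if shuffle:
        # Assumption: shuffling means uniform sampling with replacement.
        rng = np.random.default_rng(seed)
        return rng.integers(0, n_total, size=samples)
    # In order, cycling back to the start when samples > n_total.
    return np.arange(samples) % n_total

idx = ppc_indices(n_chains=4, n_draws=500)
print(idx.shape)                      # (2000,)
print(idx.reshape(4, 500)[1, :3])     # [500 501 502]
print(ppc_indices(2, 3, samples=8))   # [0 1 2 3 4 5 0 1]
```

With in-order indices, draw `i` of the posterior predictive corresponds to posterior sample `i`, so `.reshape(4, 500)` lines up with the chains.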


    

Sample Posterior Predictive Behavior #3208

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Sample Posterior Predictive Behavior #3208

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions