
Update posterior_predictive notebook #2853


Merged
merged 2 commits into pymc-devs:master on Feb 13, 2018

Conversation

ColCarroll
Member

Reran with @junpenglao's styles from #2834, and also updated grammar, style, and links a bit (we quote extensively from an Edward tutorial that has since been removed).

@junpenglao
Member

Looks good, nitpicks:

  • print('Running on PyMC3 v%s' % pm.__version__) to have nice output
  • It always bothers me that in the last figure the error bar goes outside of [0, 1]; could you use a confidence interval instead?

@ColCarroll
Member Author

Can do. That's a funny example, because the ppc is just sampling outcomes, not probabilities. I would almost rather put a disc, scaled by the number of samples, at y=0 and y=1 to view these samples.
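A minimal sketch of that disc idea (the ppc key 'outcome' and the predictor array names are assumed from the notebook, so treat this as illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Fraction of ppc samples equal to 1 at each predictor value; 'outcome' is a
# hypothetical key standing in for the notebook's observed RV name.
frac_ones = ppc['outcome'].mean(axis=0)

# Discs at y=1 and y=0 whose area scales with the share of samples on each outcome.
plt.scatter(predictors_oos, np.ones_like(frac_ones), s=500 * frac_ones, alpha=0.5)
plt.scatter(predictors_oos, np.zeros_like(frac_ones), s=500 * (1 - frac_ones), alpha=0.5)
plt.xlabel('predictor')
plt.ylabel('outcome')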

@junpenglao
Member

Agreed. For discrete observed RVs I also prefer to sample the latent continuous parameters. Currently that is not straightforward to do: you either need to write the forward function yourself or use draw_value (which is pretty slow). I am hoping it can be addressed when somebody picks up our 2018 GSoC project to implement likelihood-free inference.
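For the logistic model in this notebook, "writing the forward function yourself" amounts to something like the sketch below (variable names assumed from the notebook; junpenglao's snippet later in the thread spells out the same idea in full):

import numpy as np

def invlogit(x):
    # Numpy inverse logit, replacing a per-draw forward pass through the model.
    return 1.0 / (1.0 + np.exp(-x))

# One latent probability per (posterior draw, observation) pair.
p_latent = invlogit(trace['coeff'][:, np.newaxis] * predictors_oos)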

@aloctavodia
Member

Just a detail, but we can avoid importing seaborn if we use

ax.hist([n.mean() for n in ppc['n']], alpha=0.5)

instead of

sns.distplot([n.mean() for n in ppc['n']], kde=False, ax=ax)

@fonnesbeck
Member

Seaborn plots look a lot nicer than MPL's, though.

@aloctavodia
Member

But we will still use plt.style.use('seaborn-darkgrid'), so we get seaborn's look.

with

ax.hist([n.mean() for n in ppc['n']], alpha=0.5)

[figure: histogram produced with ax.hist]

and with

sns.distplot([n.mean() for n in ppc['n']], kde=False, ax=ax)

[figure: histogram produced with sns.distplot]

Not exactly the same (mainly due to the different number of bins), but pretty similar.
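If matching matters, passing an explicit bins argument brings the matplotlib version much closer (a sketch; plt.hist defaults to 10 bins while distplot chooses its own):

ax.hist([n.mean() for n in ppc['n']], bins='auto', alpha=0.5)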

@fonnesbeck
Member

Fair enough, in this case. In general seaborn generates nicer plots with less code, but here I agree there is no real difference.

@ColCarroll
Member Author

Any strong opinions on the last image, then? I prefer the version below, along with a more descriptive discussion noting that this is one measure of centrality you could compute with sample_ppc.

[figure: proposed final plot]

@junpenglao
Member

I'd prefer some error bars, though.

@ColCarroll
Member Author

There are two problems with the error bar, at different levels of difficulty:

1. I don't think there's a numpy built-in for Beta error bars, and I am lazy. Found it, though: scipy.stats.beta.interval! It is undocumented, but passing 0.9 appears to give a 90% confidence interval.

2. The error bar is recording the MCMC error, right? I am not sure where else there is uncertainty in this model. Here's an example with a 90% confidence interval, and the "true" value of p:

[figure: 90% confidence interval with the true value of p]
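For reference, the scipy.stats.beta.interval call from point 1 in action (the Beta parameters here are illustrative):

from scipy import stats

# Central 90% interval of a Beta(2, 5) posterior: returns (lower, upper).
lower, upper = stats.beta.interval(0.9, 2, 5)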

@junpenglao
Member

“The error bar is recording the MCMC error, right?”

If you are plotting the error of the mean, yes. But you can also show the [2.5, 97.5] quantile to represent uncertainty, right?

@ColCarroll
Member Author

Hrm... I don't think I understand where you would calculate that. It seems like there is only uncertainty in the Binomial variable here. How would you calculate the [2.5, 97.5] quantile?

@junpenglao
Member

What I had in mind is:

import numpy as np
import matplotlib.pyplot as plt

# trace, tinvlogit, predictors_oos, and outcomes_oos are defined earlier in the notebook.
samples = 500
post_coeff = trace['coeff']
nsample = len(post_coeff)
# Push a random subset of posterior coefficient draws through the inverse logit
# to get one latent probability per (draw, observation) pair.
postp = tinvlogit(post_coeff[np.random.randint(
    0, nsample, samples)][:, np.newaxis] * predictors_oos).eval()

# Use the [2.5, 97.5] percentiles of the latent probabilities as error bars.
ymean = postp.mean(axis=0)
yerr = np.percentile(postp, [2.5, 97.5], axis=0)
yerr[0, :] = ymean - yerr[0, :]
yerr[1, :] = yerr[1, :] - ymean

plt.errorbar(
    x=predictors_oos,
    y=ymean,
    yerr=yerr,
    linestyle='',
    marker='o')
plt.plot(predictors_oos, outcomes_oos, 'o')
plt.ylim(-.05, 1.05)
plt.xlabel('predictor')
plt.ylabel('outcome')

But you can barely see any uncertainty... I guess in this simple example there is not much point in doing this.
[figure: error bars on the latent probabilities]

Last nitpick: there is an unnecessary n_init in the last pm.sample()

@ColCarroll
Member Author

Just updated. I think I did the equivalent of your suggestion (at least in expectation!). The big difference is that I lowered the number of ppc samples to 100, so the error bars are more prominent.

[figure: updated plot with 100 ppc samples and error bars]
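Roughly what the update does, as a sketch (assuming the notebook's model, trace, and out-of-sample data are in scope, and with 'outcome' standing in for the observed RV's name):

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

# 100 posterior predictive samples instead of the default, so the error bars stand out.
ppc = pm.sample_ppc(trace, samples=100, model=model)
outcome_samples = ppc['outcome']

# Mean and [2.5, 97.5] percentiles of the sampled outcomes as point and error bars.
ymean = outcome_samples.mean(axis=0)
lo, hi = np.percentile(outcome_samples, [2.5, 97.5], axis=0)
plt.errorbar(x=predictors_oos, y=ymean, yerr=[ymean - lo, hi - ymean],
             linestyle='', marker='o')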

junpenglao merged commit ffc0608 into pymc-devs:master on Feb 13, 2018