Warn users about discrete parameters with nuts or advi #1902

Closed
aseyboldt opened this issue Mar 15, 2017 · 6 comments

Comments

@aseyboldt
Member

Right now we allow discrete parameters in models when sampling with nuts, and assign a metropolis step to those parameters. I can't see why this would work well even for moderately sized models (every change in one of the discrete parameters will probably send us out of the typical set for the continuous ones, right?). Or do you know of models where this works? If not, I would propose printing at least a warning if a user tries to do this. For an example where I think this led to a problem, see this SO question.

@fonnesbeck
Member

fonnesbeck commented Mar 15, 2017

This generally works for me, so I do use it with NUTS. By "works" I mean that I get the same answer as when I switch over to a pure Metropolis sampling scheme. I typically only use it when there are only one or two non-continuous parameters. As to when it stops working, I have no idea. It would be an interesting research question.

I don't think it would even run at all with ADVI, would it? It certainly would not happen automatically, since ADVI does not use sample.

@aseyboldt
Member Author

No, initialization using advi seems to be disabled silently.

@aseyboldt
Member Author

The problem I see with this is that it is very easy for a new user to put a discrete parameter in a model, which may really mess up convergence. And if you don't know much about how the samplers work, it is not easy to see why this is a problem. It doesn't help that a lot of valid models contain observed discrete RVs.
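To illustrate why *observed* discrete RVs are harmless for gradient-based samplers, here is a stdlib-only sketch (the data, the single-predictor logistic model, and the function name are made up for illustration, not PyMC3 API). The outcomes are discrete, but they are fixed data, so the log-likelihood is still a smooth function of the continuous parameter and its gradient exists:

```python
import math

# Hypothetical observed binary outcomes and a single predictor.
y = [0, 1, 1, 0, 1]
x = [-1.2, 0.4, 2.0, -0.5, 1.1]

def loglike(beta):
    """Bernoulli log-likelihood of the *observed* y under
    p_i = sigmoid(beta * x_i). The data are discrete, but this is
    a smooth function of the continuous parameter beta, so a
    gradient-based sampler like NUTS has no problem with it."""
    total = 0.0
    for yi, xi in zip(y, x):
        p = 1.0 / (1.0 + math.exp(-beta * xi))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Central finite difference: the gradient exists and is stable.
eps = 1e-6
grad = (loglike(1.0 + eps) - loglike(1.0 - eps)) / (2 * eps)
print(grad)
```

The trouble only starts when the discrete quantity is a *latent* parameter that the sampler has to move.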

@twiecki
Member

twiecki commented Mar 15, 2017

@aseyboldt What would be the particular problem of being moved outside the typical set? Wouldn't that also happen if you mix other samplers?

@fonnesbeck
Member

I think it's an open question. I mentioned it to the Stan guys once and they did not seem to have an intuition about it either.

@aseyboldt
Member Author

aseyboldt commented Mar 30, 2017

Sorry, I forgot about this issue.

I certainly can't claim to have a good understanding of this, but here is what I had in mind when I mentioned the typical sets. Let's say we have a distribution with one binary parameter $\alpha$ and a couple of continuous parameters $\theta$. Also, to keep it simple, $P(\alpha=0) = P(\alpha=1)$. If the typical sets of $\theta | \alpha=0$ and $\theta | \alpha=1$ don't overlap much, then the sampler can't switch between the states of $\alpha$, and gets stuck at one of the values. My understanding is that this could happen pretty quickly if a number of values in $\theta$ depend on $\alpha$ – even if all the typical sets of $\theta_i | \alpha=0, \theta_{\ne i}$ and $\theta_i | \alpha=1, \theta_{\ne i}$ overlap. For example:

import pymc3 as pm

N = 100
with pm.Model() as model:
    alpha = pm.Bernoulli('alpha', p=0.5)
    pm.Normal('theta', mu=alpha, sd=1, shape=N)

If N is 1, we can sample from this easily. But for N=100 alpha is stuck at 0 for the whole trace.
(Of course we could reparameterize theta as N(0, 1) + alpha and avoid that problem, but still...)
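The sticking behaviour can be quantified without running a sampler at all. Below is a stdlib-only sketch (the function name and simulation setup are my own, not PyMC3 API) that estimates the Metropolis acceptance probability for proposing a flip of $\alpha$ from 0 to 1 while $\theta$ sits at a typical draw from $\theta | \alpha=0$:

```python
import math
import random

def mean_flip_acceptance(n, draws=5000, seed=42):
    """Average Metropolis acceptance probability for proposing
    alpha: 0 -> 1 while theta stays fixed at a draw from
    theta | alpha = 0, i.e. theta_i ~ Normal(0, 1).

    With the symmetric prior P(alpha=0) = P(alpha=1), the log
    acceptance ratio is
        sum_i [log N(theta_i; 1, 1) - log N(theta_i; 0, 1)]
      = sum_i (theta_i - 1/2),
    which has mean -n/2: it shrinks exponentially with n.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        log_ratio = sum(rng.gauss(0.0, 1.0) - 0.5 for _ in range(n))
        # min(1, exp(x)) computed overflow-safely as exp(min(x, 0)).
        total += math.exp(min(log_ratio, 0.0))
    return total / draws

print(mean_flip_acceptance(1))    # roughly 0.6: the flip is often accepted
print(mean_flip_acceptance(100))  # essentially zero: alpha never moves
```

This is also why the reparameterization mentioned above helps: with theta_i = alpha + z_i and z_i ~ N(0, 1) kept fixed, flipping alpha does not change the density of z at all, so the log acceptance ratio is exactly 0 for any N.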

I don't know how much of a problem this is in real world applications. I just remember running into some trouble like this some time ago, but can't even find the model anymore.

Edit: Ahhhh, no latex on github...
