Sample init #1523
Conversation
I think the API is fine for the moment; it's a natural way to expose sample_init, and a more natural API on top of Metropolis and ADVI than the old way. @fonnesbeck @ColCarroll @AustinRochford, could one of you three review this as well in case I missed something? I think the motivation is clear, and it'll be clear from the documentation that this is a good thing to use to get past some of the slowness problems that got pinged at us on Twitter :)
I can't wait for this API to be merged in!
Looks great! I have only a few nitpicks that can be changed before or after a merge.
    to estimate a diagonal covariance matrix and using this as the scaling matrix
    produces robust results over a wide class of continuous models.

    Parameteres
Typo!
    if init != 'advi':
        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
Should `sampler` instead mimic the `step` argument in `sample`, in that it accepts a function that can sample? That would eliminate some messy string parsing.
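For illustration, a hedged sketch of that alternative; the exposed name `pm.sample_init` and passing the step-method class itself are assumptions, not confirmed by this excerpt:

    import numpy as np
    import pymc3 as pm

    with pm.Model():
        mu = pm.Normal('mu', mu=0, sd=1)
        pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))
        # Hypothetical: pass the step-method class, as `step` does in
        # pm.sample(), instead of a string like 'nuts' to be parsed.
        trace = pm.sample_init(draws=2000, sampler=pm.NUTS, init='advi')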
    Sampler to use. Will be initialized using init algorithm.
    * nuts : Run NUTS sampler with the init covariance estimation as the scaling matrix.
    * hmc : Run HamiltonianMC sampler with the init covariance estimation as the scaling matrix.
    * advi : Sample from variational posterior, requires init='advi'.
Should the `model` kwarg be documented?
        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
    trace = pm.sample(step=step, start=start, draws=draws, **kwargs)
Shouldn't this call have the `model` kwarg passed in?
Is it reasonable to …
@ColCarroll What do you think about the …
@twiecki Looks like if the nitpicks are resolved this is ready to go!
@ferrine @springcoil Personally, I think kwargs are hard because they require very good documentation and testing to maintain, and often lead to cryptic errors or (worse) silent errors. That being said, this is a general interface, so maybe they're necessary.
Looks great!
Why not just extend `sample`? So, I'm thinking about something like this:
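A hypothetical sketch of the shape of that proposal, folding init into sample() rather than adding a separate entry point; the names, defaults, and the `init_nuts` helper below are assumptions, not fonnesbeck's actual snippet:

    import pymc3 as pm
    from pymc3.model import modelcontext

    def sample(draws, step=None, init='advi', n_init=10000, start=None,
               model=None, **kwargs):
        model = modelcontext(model)
        if step is None and init is not None:
            # The initializer returns a starting point and a pre-scaled
            # step method (cf. the `return start, step` helper below).
            start, step = pm.init_nuts(init=init, n_init=n_init, model=model)
        return pm.sampling.sample(draws, step=step, start=start,
                                  model=model, **kwargs)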
@fonnesbeck I like that idea overall. One difficulty I see is that this initialization approach limits the acceptable step methods, so I'm a bit torn.
Thanks for the feedback. @fonnesbeck In your proposal, what would happen if the user specified step methods incompatible with ADVI initialization? We would also lose the feature of sampling via ADVI. Essentially, this adds a new top-level API and degrades `sample`.
I agree with Thomas. That certainly fits my mental model of the API and how to initialize sampling.
On second thought, …
@twiecki That's what I was thinking as well. If the model is incompatible with ADVI (notably, discrete variables in the model), we can simply halt the model and return a message with that information.
…e_init() as per @fonnesbeck suggestion.
OK, I have refactored this PR taking all feedback into account. Specifically, I think @fonnesbeck's suggestion is absolutely correct: the code isn't as bad as I thought, it makes the API really simple, and it requires no cognitive overhead. Please take another look. Also, do not merge yet, as the examples and tests still need to be updated in case we're happy with this.
On third thought, maybe we should push init even further down into the …
@ferrine You're right, I hadn't considered that. It thus seems that this implementation is at the right level. Should I go ahead and finish the PR up, or do people have suggestions at the API level?
@twiecki I think there's been enough discussion and you should wrap up this PR. It's been interesting to see the different levels of abstraction proposed, and I think your implementation is about right.
    * nuts : Run NUTS and estimate posterior mean and covariance matrix.
n_init : int
    Number of iterations of initializer.
    If 'advi', number of iterations, if 'nuts', number of draws.
This is a very clear docstring - good work :)
What about initializing using the defaults for each parameter (i.e. median, mode, etc.)? I would even argue that it should be the default, as it is the most general across models.
@fonnesbeck Well that's what we do currently, no? And it's not working well.
So, if `advi` is the default, and there are discrete variables, what happens? We don't want the default to fail for entire classes of model.
`init` only matters for continuous models; if a model has discrete RVs the behavior doesn't change from before: #1523 (diff)
    step = pm.NUTS(scaling=cov, is_cov=True)

    return start, step
This convenience function looks good to me.
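For context, a minimal usage sketch of such a helper; the name `init_nuts` and its exact signature are assumed from the surrounding diff, not confirmed by this excerpt:

    import pymc3 as pm

    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=1)
        pm.Normal('obs', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])
        # The helper runs the initializer and hands back a starting point
        # plus a NUTS step method scaled by the estimated covariance.
        start, step = pm.init_nuts(init='advi', n_init=10000)
        trace = pm.sample(1000, step=step, start=start)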
Why not call this (or a similar) convenience function from within NUTS? That way you can change whatever other params you want. Plus, then we can make similar calls within other samplers. It's better to have intelligent defaults than specialized initialization functions.
@jsalvatier I have considered it and I agree it would be better. The main reason I didn't is that ADVI not only provides us with the scaling but also the starting point (i.e. a sample from the posterior). Not sure how to do that if it lives inside NUTS.
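Concretely, a sketch of the two things ADVI hands back; the API names follow the diffs in this thread, and the exact covariance construction is an assumption:

    import numpy as np
    import pymc3 as pm

    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=1)
        pm.Normal('obs', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])
        v_params = pm.variational.advi(n=10000)
        # (1) A draw from the variational posterior as the starting point.
        start = pm.variational.sample_vp(v_params, 1)[0]
        # (2) The variational standard deviations give a diagonal
        # covariance estimate used to scale NUTS.
        cov = np.power(model.dict_to_array(v_params.stds), 2)
        step = pm.NUTS(scaling=cov, is_cov=True)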
I'm not sure exactly how to do it, but I'm sure it's possible in a semi-nice way.
(If you're interested in copying my skill at making elegant things, I think it's exactly my habit of seeing that doing it this way would be nicer/less opaque, and then going and spending a bunch of time thinking about it, that generated that skill. I think it's probably worth it.)
If you're interested in gaining that skill, I can let you think about it for a while. (I can think about it instead if you want, though. Don't mean to strong-arm you.)
Why don't we both think about it and see what we come up with.
@@ -110,7 +110,8 @@ def advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False,
         vars = model.vars
     vars = pm.inputvars(vars)

-    check_discrete_rvs(vars)
+    if not pm.model.all_continuous(vars):
+        raise ValueError('Model should not include discrete RVs for ADVI.')
This looks good to me; it's a good check and helps the user use ADVI correctly.
I think little warnings like this make things like ADVI more 'discoverable'. I.e. without reading the documentation you will realize that this 'fancy method' just doesn't work with discrete RVs. So you should use another method.
@@ -132,7 +142,14 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
     """
     model = modelcontext(model)

-    step = assign_step_methods(model, step)
+    if step is None and init is not None and pm.model.all_continuous(model.vars):
Do we really need to check `pm.model.all_continuous(model.vars)`? Suppose `init == 'map'`, which does not require `pm.model.all_continuous(model.vars)` to be true; then it will be rejected by this if statement whenever discrete vars exist. There should be something like a switching default: primary is advi, secondary is map.
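A hypothetical sketch of that switching default (not code from the PR):

    import pymc3 as pm

    def choose_init(init, model):
        # Prefer ADVI, but fall back to MAP when discrete RVs rule
        # ADVI out; MAP works for discrete models too.
        if init == 'advi' and not pm.model.all_continuous(model.vars):
            return 'map'
        return init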
Good point. This really only handles the case of a continuous model where we can use NUTS. We might want to extend it for discrete models too. Not sure it should be part of this PR though.
I don't think it should be part of this PR.
Huh, no idea why that test is failing.
I haven't looked closely, but is it possible the sample initialization is running for this test? It relies on sampling not running long enough to converge, but IIRC when I was tuning it, the distribution it samples from converges quite quickly (which is why there are only 10 samples). I can take a closer look tonight.
That's what I thought initially, but it doesn't seem to run the initialization (nor should it).
Could it be that a better initialization has made the example converge much faster, and hence the negative assertion fails?
As I said, it's not using the initialization as far as I can tell.
If I add …
Is this too early to merge?
No, let's wait a bit for others to also take another look, but then I think we can merge.
@twiecki I like the design you ended up with very much 👍
I agree. This is great.
Thanks guys!
This is super rad. Thanks for doing this guys.
if init == 'advi':
    v_params = pm.variational.advi(n=n_init)
    start = pm.variational.sample_vp(v_params, 1)[0]
@twiecki This new initialization is a very, very nice feature!
One issue I have observed is that the first few samples (for transformed variables) are very far away from the correct values. I think the problem is that currently `pm.variational.sample_vp` is returning non-transformed values. I guess `start = pm.variational.sample_vp(v_params, 1, progressbar=False, hide_transformed=False)[0]` should fix this problem. I also have a question: sampling from the variational posterior, instead of doing something like `start = v_params.means`, guarantees that when running parallel jobs with `n_jobs > 1` we get different starting points, right?
Ah, good point re hide_transformed. Also, I think it will still use the same starting point because the parallelization happens downstream of here. We need to return N samples, where N is the number of chains/njobs. Want to help with that?
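A sketch of the fix being discussed, assuming `sample_vp` keeps the `draws` and `hide_transformed` arguments shown above (whether `sample()` then accepts one start point per chain is part of what's being worked out):

    import pymc3 as pm

    njobs = 4
    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=1)
        pm.Normal('obs', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])
        v_params = pm.variational.advi(n=10000)
        # One variational draw per chain, so njobs > 1 gets distinct
        # starting points; hide_transformed=False keeps transformed
        # variables in each point.
        vp_trace = pm.variational.sample_vp(v_params, draws=njobs,
                                            hide_transformed=False)
        start = [vp_trace[i] for i in range(njobs)]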
I could be wrong about the starting points being the same.
I see, it makes total sense to need N samples. I will try to take a look at this ASAP.
This adds a `sample_init()` function which simplifies sampling and proper initialization. The main motivation comes from many users reporting NUTS to be slow or stuck; ADVI helps with this in all cases I encountered. The combination of ADVI init + NUTS is the default here. Other use cases include sampling directly from ADVI if ADVI is selected as the sampler. I also changed some examples to demonstrate how it can be used.
Feedback on API and naming would be especially helpful.
Addresses #1512.
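For reference, a minimal usage sketch of the API this thread converged on after review, with init folded into sample(); the exact default values are assumptions:

    import pymc3 as pm

    with pm.Model() as model:
        mu = pm.Normal('mu', mu=0, sd=1)
        pm.Normal('obs', mu=mu, sd=1, observed=[0.1, -0.3, 0.2])
        # ADVI estimates a starting point and a diagonal scaling, then
        # NUTS samples using that scaling (the default combination here).
        trace = pm.sample(2000, init='advi', n_init=10000)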