Sample init #1523
@@ -16,7 +16,7 @@
 import sys
 sys.setrecursionlimit(10000)

-__all__ = ['sample', 'iter_sample', 'sample_ppc']
+__all__ = ['sample', 'iter_sample', 'sample_ppc', 'init_nuts']


 def assign_step_methods(model, step=None, methods=(NUTS, HamiltonianMC, Metropolis,
@@ -81,8 +81,9 @@ def assign_step_methods(model, step=None, methods=(NUTS, HamiltonianMC, Metropol
     return steps


-def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None,
-           progressbar=True, model=None, random_seed=-1):
+def sample(draws, step=None, init='advi', n_init=500000, start=None,
+           trace=None, chain=0, njobs=1, tune=None, progressbar=True,
+           model=None, random_seed=-1):
     """
     Draw a number of samples using the given step method.
     Multiple step methods supported via compound step method
@@ -97,6 +98,15 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
         A step function or collection of functions. If no step methods are
         specified, or are partially specified, they will be assigned
         automatically (defaults to None).
+    init : str {'advi', 'advi_map', 'map', 'nuts'}
+        Initialization method to use.
+        * advi : Run ADVI to estimate posterior mean and diagonal covariance matrix.
+        * advi_map : Initialize ADVI with MAP and use MAP as starting point.
+        * map : Use the MAP as starting point.
+        * nuts : Run NUTS and estimate posterior mean and covariance matrix.
+    n_init : int
+        Number of iterations of initializer.
+        If 'advi', number of iterations, if 'nuts', number of draws.
     start : dict
         Starting point in parameter space (or partial point)
         Defaults to trace.point(-1)) if there is a trace provided and
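As a usage sketch of the new arguments (the model below is made up purely for illustration), the default call would now look like this:

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    pm.Normal('obs', mu=mu, sd=sd, observed=np.random.randn(100))

    # With no step method given, sample() runs ADVI for n_init
    # iterations, then starts NUTS from an ADVI posterior draw, using
    # the estimated variances as the diagonal scaling matrix.
    trace = pm.sample(2000, init='advi', n_init=20000)
```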
@@ -132,7 +142,14 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
     """
     model = modelcontext(model)

-    step = assign_step_methods(model, step)
+    if step is None and init is not None and pm.model.all_continuous(model.vars):
Review comment: Do we really need to check `all_continuous` here?

Reply: Good point. This really only handles the case of a continuous model where we can use NUTS. We might want to extend it for discrete models too. Not sure it should be part of this PR though.
+        # By default, use NUTS sampler
+        pm._log.info('Auto-assigning NUTS sampler...')
+        start_, step = init_nuts(init=init, n_init=n_init, model=model)
+        if start is None:
+            start = start_
+    else:
+        step = assign_step_methods(model, step)

     if njobs is None:
         import multiprocessing as mp
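Conversely, passing a step method explicitly takes the `else` branch and skips the NUTS auto-initialization entirely. A small hypothetical example:

```python
import pymc3 as pm

with pm.Model():
    x = pm.Normal('x', mu=0, sd=1)
    # An explicit step method bypasses init_nuts() and goes through
    # assign_step_methods() exactly as before this change.
    trace = pm.sample(2000, step=pm.Metropolis())
```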
@@ -373,3 +390,63 @@ def sample_ppc(trace, samples=None, model=None, vars=None, size=None, random_see
                                          size=size))

     return {k: np.asarray(v) for k, v in ppc.items()}
+
+
+def init_nuts(init='advi', n_init=500000, model=None):
+    """Initialize and sample from posterior of a continuous model.
+
+    This is a convenience function. NUTS convergence and sampling speed is extremely
+    dependent on the choice of mass/scaling matrix. In our experience, using ADVI
+    to estimate a diagonal covariance matrix and using this as the scaling matrix
+    produces robust results over a wide class of continuous models.
+
+    Parameters
+    ----------
+    init : str {'advi', 'advi_map', 'map', 'nuts'}
+        Initialization method to use.
+        * advi : Run ADVI to estimate posterior mean and diagonal covariance matrix.
+        * advi_map : Initialize ADVI with MAP and use MAP as starting point.
+        * map : Use the MAP as starting point.
+        * nuts : Run NUTS and estimate posterior mean and covariance matrix.
+    n_init : int
+        Number of iterations of initializer.
+        If 'advi', number of iterations, if 'nuts', number of draws.
+    model : Model (optional if in `with` context)
+
+    Returns
+    -------
+    start : pymc3.model.Point
+        Starting point for sampler
+    nuts_sampler : pymc3.step_methods.NUTS
+        Instantiated and initialized NUTS sampler object
+    """
+    model = pm.modelcontext(model)
+
+    pm._log.info('Initializing NUTS using {}...'.format(init))
+
+    if init == 'advi':
+        v_params = pm.variational.advi(n=n_init)
+        start = pm.variational.sample_vp(v_params, 1)[0]
Review comment: @twiecki This new initialization is a very, very nice feature! One issue I have observed is that the first few samples (for transformed variables) are very far away from the correct values. I think the problem is related to `hide_transformed` here.

Reply: Ah, good point re `hide_transformed`. Also, I think it will still use the same starting point, because the parallelization happens downstream of here. We need to return N samples, where N is the number of chains/njobs. Want to help with that?

Reply: I could be wrong about the starting points being the same.

Reply: I see, it makes total sense to need N samples. I will try to take a look at this ASAP.
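A rough sketch of the fix discussed above, assuming `sample_vp` accepts `draws` and `hide_transformed` keyword arguments; the per-chain wiring is hypothetical:

```python
# Hypothetical sketch: draw one ADVI posterior sample per chain so that
# parallel chains start from distinct points, and keep transformed
# variables in the returned points.
v_params = pm.variational.advi(n=n_init)
vp_trace = pm.variational.sample_vp(v_params, draws=njobs,
                                    hide_transformed=False)
starts = [vp_trace[i] for i in range(njobs)]  # one start point per chain
# Each element of `starts` would then seed one chain, e.g.:
# pm.sample(draws, step=step, start=starts[i], chain=i)
```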
+        cov = np.power(model.dict_to_array(v_params.stds), 2)
+    elif init == 'advi_map':
+        start = pm.find_MAP()
+        v_params = pm.variational.advi(n=n_init, start=start)
+        cov = np.power(model.dict_to_array(v_params.stds), 2)
+    elif init == 'map':
+        start = pm.find_MAP()
+        cov = pm.find_hessian(point=start)
+    elif init == 'nuts':
+        init_trace = pm.sample(step=pm.NUTS(), draws=n_init)
+        cov = pm.trace_cov(init_trace[n_init//2:])
+
+        start = {varname: np.mean(init_trace[varname]) for varname in init_trace.varnames}
+    else:
+        raise NotImplementedError('Initializer {} is not supported.'.format(init))
+
+    step = pm.NUTS(scaling=cov, is_cov=True)
+
+    return start, step
Review comment: This convenience function looks good to me.

Reply: Why not call this (or a similar convenience function) from within NUTS? That way you can change whatever other params you want. Plus, then we can make similar calls within other samplers. It's better to have intelligent defaults than to have specialized initialization functions.

Reply: @jsalvatier I have considered it and I agree it would be better. The main reason is that ADVI not only provides us with the scaling but also the starting point (i.e. a sample from the posterior). Not sure how to do that if it lives inside NUTS.

Reply: I'm not sure exactly how to do it, but I'm sure it's possible in a semi-nice way. (If you're interested in copying my skill at making elegant things: I think it's exactly my habit of noticing that doing it another way would be nicer/less opaque, and then spending a bunch of time thinking about it, that generated that skill. I think it's probably worth it. If you're interested in gaining that skill, I can let you think about it for a while. I can think about it instead if you want, though; I don't mean to strong-arm you.)

Reply: Why don't we both think about it and see what we come up with.
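For reference, a minimal sketch of calling the new helper directly rather than through `sample()`; the model itself is made up for illustration:

```python
import numpy as np
import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)
    pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(50))

    # ADVI estimates the posterior stds; init_nuts squares them into a
    # diagonal scaling matrix and returns a ready-to-use NUTS step
    # together with a starting point drawn from the ADVI posterior.
    start, step = pm.init_nuts(init='advi', n_init=10000)
    trace = pm.sample(1000, step=step, start=start)
```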
@@ -110,7 +110,8 @@ def advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False,
         vars = model.vars
     vars = pm.inputvars(vars)

-    check_discrete_rvs(vars)
+    if not pm.model.all_continuous(vars):
+        raise ValueError('Model should not include discrete RVs for ADVI.')
Review comment: This looks good to me; it's a useful check and helps the user use ADVI correctly.

Reply: I think little warnings like this make things like ADVI more "discoverable", i.e. without reading the documentation you will realize that this fancy method just doesn't work with discrete RVs, so you should use another method.

     n_mcsamples = 100 if accurate_elbo else 1
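To illustrate the new guard, a hypothetical model with a discrete variable now fails immediately instead of misbehaving later:

```python
import pymc3 as pm

with pm.Model():
    pm.Bernoulli('k', p=0.5)  # discrete RV
    # Raises: ValueError: Model should not include discrete RVs for ADVI.
    pm.variational.advi(n=1000)
```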
Review comment: This is a very clear docstring - good work :)

Review comment: What about initializing using the defaults for each parameter (i.e. median, mode, etc.)? I would even argue that it should be the default, as it is the most general across models.

Reply: @fonnesbeck Well, that's what we do currently, no? And it's not working well.

Reply: So, if `advi` is the default and there are discrete variables, what happens? We don't want the default to fail for entire classes of model.

Reply: `init` only matters for continuous models; if a model has discrete RVs, the behavior doesn't change from before: #1523 (diff)
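A sketch of that fallback with a hypothetical mixed model: the discrete variable makes the `all_continuous` check fail, so `sample()` assigns step methods automatically as before and `init` has no effect.

```python
import pymc3 as pm

with pm.Model():
    p = pm.Beta('p', alpha=1, beta=1)
    z = pm.Bernoulli('z', p=p)  # discrete RV: disables NUTS auto-init
    # Falls back to assign_step_methods(), e.g. NUTS for `p` and a
    # Metropolis variant for `z`; the `init` argument is ignored.
    trace = pm.sample(1000, init='advi')
```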