
Sample init #1523


Merged
merged 9 commits into from Nov 18, 2016
83 changes: 25 additions & 58 deletions docs/source/notebooks/BEST.ipynb

99 changes: 44 additions & 55 deletions docs/source/notebooks/GLM-hierarchical.ipynb

100 changes: 36 additions & 64 deletions docs/source/notebooks/LKJ.ipynb

254 changes: 173 additions & 81 deletions docs/source/notebooks/NUTS_scaling_using_ADVI.ipynb

91 changes: 46 additions & 45 deletions docs/source/notebooks/cox_model.ipynb

@@ -292,8 +292,9 @@
     }
    ],
    "metadata": {
+   "anaconda-cloud": {},
    "kernelspec": {
-    "display_name": "Python 3",
+    "display_name": "Python [default]",
     "language": "python",
     "name": "python3"
    },
@@ -307,11 +308,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-    "version": "3.5.1"
-   },
-   "widgets": {
-    "state": {},
-    "version": "1.1.2"
+    "version": "3.5.2"
    }
   },
  "nbformat": 4,
95 changes: 47 additions & 48 deletions docs/source/notebooks/pmf-pymc.ipynb

113 changes: 58 additions & 55 deletions docs/source/notebooks/posterior_predictive.ipynb

138 changes: 31 additions & 107 deletions docs/source/notebooks/stochastic_volatility.ipynb

10 changes: 10 additions & 0 deletions pymc3/model.py
@@ -703,6 +703,16 @@ def as_iterargs(data):
    else:
        return [data]

+
+def all_continuous(vars):
+    """Check that vars does not include discrete variables, except ObservedRVs."""
+    vars_ = [var for var in vars if not isinstance(var, pm.model.ObservedRV)]
+    if any([var.dtype in pm.discrete_types for var in vars_]):
+        return False
+    else:
+        return True
+
+
# theano stuff
theano.config.warn.sum_div_dimshuffle_bug = False
theano.config.compute_test_value = 'raise'
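
For orientation, a minimal sketch of how this helper behaves; the model and variable names are hypothetical, assuming the pymc3 3.0-era API:

import pymc3 as pm

# Hypothetical toy model mixing continuous and discrete free RVs.
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1.)  # continuous (float dtype)
    k = pm.Poisson('k', mu=3.)          # discrete (int dtype)

print(pm.model.all_continuous([mu]))        # True
print(pm.model.all_continuous(model.vars))  # False, because of 'k'
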
85 changes: 81 additions & 4 deletions pymc3/sampling.py
@@ -16,7 +16,7 @@
import sys
sys.setrecursionlimit(10000)

-__all__ = ['sample', 'iter_sample', 'sample_ppc']
+__all__ = ['sample', 'iter_sample', 'sample_ppc', 'init_nuts']


def assign_step_methods(model, step=None, methods=(NUTS, HamiltonianMC, Metropolis,
@@ -81,8 +81,9 @@ def assign_step_methods(model, step=None, methods=(NUTS, HamiltonianMC, Metropolis,
    return steps


-def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None,
-           progressbar=True, model=None, random_seed=-1):
+def sample(draws, step=None, init='advi', n_init=500000, start=None,
+           trace=None, chain=0, njobs=1, tune=None, progressbar=True,
+           model=None, random_seed=-1):
    """
    Draw a number of samples using the given step method.
    Multiple step methods are supported via compound step methods.
@@ -97,6 +98,15 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
        A step function or collection of functions. If no step methods are
        specified, or are partially specified, they will be assigned
        automatically (defaults to None).
+    init : str {'advi', 'advi_map', 'map', 'nuts'}
+        Initialization method to use.
+        * advi : Run ADVI to estimate posterior mean and diagonal covariance matrix.
+        * advi_map : Initialize ADVI with MAP and use MAP as starting point.
+        * map : Use the MAP as starting point.
+        * nuts : Run NUTS and estimate posterior mean and covariance matrix.
+    n_init : int
+        Number of iterations of the initializer.
+        If 'advi', number of iterations; if 'nuts', number of draws.
Contributor: This is a very clear docstring - good work :)

Member: What about initializing using the defaults for each parameter (i.e. median, mode, etc.)? I would even argue that it should be the default, as it is the most general across models.

Member Author: @fonnesbeck Well that's what we do currently, no? And it's not working well.

Member: So, if advi is the default, and there are discrete variables, what happens? We don't want the default to fail for entire classes of model.

Member Author: init only matters for continuous models; if a model has discrete RVs, the behavior doesn't change from before: #1523 (diff)

    start : dict
        Starting point in parameter space (or partial point)
        Defaults to trace.point(-1) if there is a trace provided and
@@ -132,7 +142,14 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
    """
    model = modelcontext(model)

-    step = assign_step_methods(model, step)
+    if step is None and init is not None and pm.model.all_continuous(model.vars):
ferrine (Member, Nov 17, 2016): Do we really need to check pm.model.all_continuous(model.vars)? Suppose init == 'map', which does not require pm.model.all_continuous(model.vars) to be true; then it will be rejected by this if statement if discrete vars exist. There should be something like a switching default: primary is advi, secondary is map.

Member Author: Good point. This really only handles the case of a continuous model where we can use NUTS. We might want to extend it for discrete models too. Not sure it should be part of this PR though.

+        # By default, use NUTS sampler
+        pm._log.info('Auto-assigning NUTS sampler...')
+        start_, step = init_nuts(init=init, n_init=n_init, model=model)
+        if start is None:
+            start = start_
+    else:
+        step = assign_step_methods(model, step)

    if njobs is None:
        import multiprocessing as mp
@@ -373,3 +390,63 @@ def sample_ppc(trace, samples=None, model=None, vars=None, size=None, random_seed=None):
                                size=size))

    return {k: np.asarray(v) for k, v in ppc.items()}


+def init_nuts(init='advi', n_init=500000, model=None):
+    """Initialize and sample from posterior of a continuous model.
+
+    This is a convenience function. NUTS convergence and sampling speed is extremely
+    dependent on the choice of mass/scaling matrix. In our experience, using ADVI
+    to estimate a diagonal covariance matrix and using this as the scaling matrix
+    produces robust results over a wide class of continuous models.
+
+    Parameters
+    ----------
+    init : str {'advi', 'advi_map', 'map', 'nuts'}
+        Initialization method to use.
+        * advi : Run ADVI to estimate posterior mean and diagonal covariance matrix.
+        * advi_map : Initialize ADVI with MAP and use MAP as starting point.
+        * map : Use the MAP as starting point.
+        * nuts : Run NUTS and estimate posterior mean and covariance matrix.
+    n_init : int
+        Number of iterations of the initializer.
+        If 'advi', number of iterations; if 'nuts', number of draws.
+    model : Model (optional if in `with` context)
+
+    Returns
+    -------
+    start : pymc3.model.Point
+        Starting point for sampler
+    nuts_sampler : pymc3.step_methods.NUTS
+        Instantiated and initialized NUTS sampler object
+    """
+    model = pm.modelcontext(model)
+
+    pm._log.info('Initializing NUTS using {}...'.format(init))
+
+    if init == 'advi':
+        v_params = pm.variational.advi(n=n_init)
+        start = pm.variational.sample_vp(v_params, 1)[0]
Member: @twiecki This new initialization is a very, very nice feature! One issue I have observed is that the first few samples (for transformed variables) are very far away from the correct values. I think the problem is that pm.variational.sample_vp is currently returning non-transformed values. I guess start = pm.variational.sample_vp(v_params, 1, progressbar=False, hide_transformed=False)[0] should fix this problem. I also have a question: sampling from the variational posterior, instead of doing something like start = v_params.means, guarantees that when running parallel jobs with njobs > 1 we get different starting points, right?

Member Author: Ah, good point re hide_transformed. Also, I think it will still use the same starting point because the parallelization happens downstream of here. We need to return N samples, where N is the number of chains / njobs. Want to help with that?

Member Author: I could be wrong about the starting points being the same.

Member: I see, it makes total sense to need N samples. I will try to take a look at this ASAP.

+        cov = np.power(model.dict_to_array(v_params.stds), 2)
+    elif init == 'advi_map':
+        start = pm.find_MAP()
+        v_params = pm.variational.advi(n=n_init, start=start)
+        cov = np.power(model.dict_to_array(v_params.stds), 2)
+    elif init == 'map':
+        start = pm.find_MAP()
+        cov = pm.find_hessian(point=start)
+    elif init == 'nuts':
+        init_trace = pm.sample(step=pm.NUTS(), draws=n_init)
+        cov = pm.trace_cov(init_trace[n_init//2:])
+        start = {varname: np.mean(init_trace[varname], axis=0)
+                 for varname in init_trace.varnames}
+    else:
+        raise NotImplementedError('Initializer {} is not supported.'.format(init))
+
+    step = pm.NUTS(scaling=cov, is_cov=True)
+
+    return start, step
Contributor: This convenience function looks good to me.

Member: Why not call this (or a similar convenience function) from within NUTS? That way you can change whatever other params you want. Plus, then we can make similar calls within other samplers. It's better to have intelligent defaults than to have specialized initialization functions.

Member Author: @jsalvatier I have considered it and I agree it would be better. The main reason is that ADVI not only provides us with the scaling but also the starting point (i.e. a sample from the posterior). Not sure how to do that if it lives inside NUTS.

jsalvatier (Member, Dec 9, 2016): I'm not sure exactly how to do it, but I'm sure it's possible to do in a semi-nice way. (If you're interested in copying my skill at making elegant things, I think it's exactly my habit of seeing that doing it this way would be nicer/less opaque, and then going and spending a bunch of time thinking about it, that generated that skill. I think it's probably worth it.) If you're interested in gaining that skill, I can let you think about it for a while. (I can think about it instead if you want, though. Don't mean to strong-arm you.)

Member Author: Why don't we both think about it and see what we come up with.
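
Putting the pieces together, here is a hedged sketch of how the new default plays out for an all-continuous model; the model and the n_init/draws values below are made up for illustration:

import numpy as np
import pymc3 as pm

data = np.random.randn(100)

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=10.)
    sd = pm.HalfNormal('sd', sd=10.)
    pm.Normal('obs', mu=mu, sd=sd, observed=data)

    # With step=None and only continuous free RVs, sample() now calls
    # init_nuts() first: ADVI estimates a starting point and a diagonal
    # scaling matrix, both of which are handed to NUTS.
    trace = pm.sample(500, init='advi', n_init=10000)

    # The same machinery can also be driven by hand:
    start, step = pm.init_nuts(init='advi_map', n_init=10000)
    trace2 = pm.sample(500, step=step, start=start)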

4 changes: 2 additions & 2 deletions pymc3/tests/test_diagnostics.py
@@ -20,9 +20,9 @@ def get_ptrace(self, n_samples):
        # Run sampler
        step1 = Slice([model.early_mean_log_, model.late_mean_log_])
        step2 = Metropolis([model.switchpoint])
-        start = {'early_mean': 2., 'late_mean': 3., 'switchpoint': 50}
+        start = {'early_mean': 2., 'late_mean': 3., 'switchpoint': 90}
        ptrace = sample(n_samples, [step1, step2], start, njobs=2, progressbar=False,
-                        random_seed=[1, 3])
+                        random_seed=[1, 4])
        return ptrace

    def test_good(self):
8 changes: 8 additions & 0 deletions pymc3/tests/test_sampling.py
@@ -61,6 +61,14 @@ def test_sample(self):
        for steps in [1, 10, 300]:
            pm.sample(steps, self.step, {}, None, njobs=njobs, random_seed=self.random_seed)

+    def test_sample_init(self):
+        with self.model:
+            for init in ('advi', 'advi_map', 'map', 'nuts'):
+                pm.sample(init=init,
+                          n_init=1000, draws=50,
+                          random_seed=self.random_seed)
+
    def test_iter_sample(self):
        with self.model:
            samps = pm.sampling.iter_sample(5, self.step, self.start, random_seed=self.random_seed)
9 changes: 6 additions & 3 deletions pymc3/tuning/scaling.py
@@ -138,7 +138,7 @@ def eig_recompose(val, vec):
    return vec.dot(np.diag(val)).dot(vec.T)


-def trace_cov(trace, vars=None):
+def trace_cov(trace, vars=None, model=None):
    """
    Calculate the flattened covariance matrix using a sample trace
@@ -155,9 +155,12 @@ def trace_cov(trace, vars=None):
    r : array (n,n)
        covariance matrix
    """
+    model = modelcontext(model)
+
-    if vars is None:
-        vars = trace.samples.keys
+    if model is not None:
+        vars = model.free_RVs
+    elif vars is None:
+        vars = trace.varnames

    def flat_t(var):
        x = trace[str(var)]
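
With the new model argument, trace_cov can pull the free RVs straight from the model instead of the trace. A small usage sketch, reusing the hypothetical model from the sketch above:

with model:
    trace = pm.sample(1000)

# Flattened covariance matrix over the model's free RVs; passing the
# model explicitly avoids relying on an enclosing model context.
cov = pm.trace_cov(trace, model=model)
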
3 changes: 2 additions & 1 deletion pymc3/variational/advi.py
@@ -110,7 +110,8 @@ def advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False,
        vars = model.vars
    vars = pm.inputvars(vars)

-    check_discrete_rvs(vars)
+    if not pm.model.all_continuous(vars):
+        raise ValueError('Model should not include discrete RVs for ADVI.')
Contributor: This looks good to me; it is a good check and helps the user use ADVI correctly.

Contributor: I think little warnings like this make things like ADVI more 'discoverable'. I.e., without reading the documentation you will realize that this 'fancy method' just doesn't work with discrete RVs, so you should use another method.


n_mcsamples = 100 if accurate_elbo else 1

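
And the effect of the new check, sketched on a hypothetical discrete model:

import pymc3 as pm

with pm.Model():
    pm.Bernoulli('z', p=0.5)  # discrete free RV
    try:
        pm.variational.advi(n=1000)
    except ValueError as err:
        print(err)  # Model should not include discrete RVs for ADVI.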