
Sample init #1523


Merged · 9 commits · Nov 18, 2016

Conversation

@twiecki (Member) commented Nov 12, 2016

This adds a sample_init() function which simplifies sampling with proper initialization. The main motivation is that many users report NUTS being slow or getting stuck; ADVI initialization helped in every case I encountered. The combination of ADVI init + NUTS is therefore the default here. Other use cases include sampling directly from ADVI when ADVI is selected as the sampler.

I also changed some examples to demonstrate how it can be used.

Feedback on API and naming would be especially helpful.

Addresses #1512.
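
For illustration, a rough sketch of how the proposed function might be called, based on the docstring snippets quoted in the review below; the exact signature (draws, init, sampler) is assumed here, not confirmed:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))

    # Default: initialize with ADVI, then run NUTS with the ADVI
    # covariance estimate as the scaling matrix.
    trace = pm.sample_init(draws=2000)

    # Or sample directly from the variational posterior
    # (requires init='advi').
    trace_vp = pm.sample_init(draws=2000, init='advi', sampler='advi')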

@springcoil (Contributor) commented Nov 12, 2016

I think the API is fine for the moment: it's a natural way to sample with initialization, a natural API on top of Metropolis and ADVI, and easier than the old way.

@fonnesbeck @ColCarroll @AustinRochford could one of you three review this as well, in case I missed something?

I think the motivation is clear, and it'll be clear from the documentation that this is a good thing to use to get over some of the slowness problems people pinged us about on Twitter :)

@ericmjl (Member) commented Nov 12, 2016

I can't wait for this API to be merged in!

@ColCarroll (Member) left a comment

Looks great! I have only a few nitpicks, which can be addressed before or after merging.

to estimate a diagonal covariance matrix and using this as the scaling matrix
produces robust results over a wide class of continuous models.

Parameteres
@ColCarroll: Typo!

    if init != 'advi':
        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
@ColCarroll: Should sampler instead mimic the step argument in sample, in that it accepts a function that can sample? That would eliminate some messy string parsing.
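
For illustration, the difference between the two designs might look like this (hypothetical signatures, not code from this PR; model as in the sketch above):

with model:
    # String-based selection, as the PR currently does:
    trace = pm.sample_init(draws=1000, init='advi', sampler='nuts')

    # Mimicking sample()'s step argument instead: pass a step-method
    # object directly and skip the string parsing entirely:
    trace = pm.sample_init(draws=1000, init='advi', step=pm.NUTS())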

Sampler to use. Will be initialized using init algorithm.
* nuts : Run NUTS sampler with the init covariance estimation as the scaling matrix.
* hmc : Run HamiltonianMC sampler with the init covariance estimation as the scaling matrix.
* advi : Sample from variational posterior, requires init='advi'.
@ColCarroll: Should the model kwarg be documented?

        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
    trace = pm.sample(step=step, start=start, draws=draws, **kwargs)
@ColCarroll: Shouldn't this call have the model kwarg passed in?

@ferrine (Member) commented Nov 13, 2016

Would it be reasonable to add initkwargs and samplekwargs for better customization? And is it possible to support advi_minibatch here?

@springcoil (Contributor)

@ColCarroll What do you think about the initkwargs and samplekwargs? I think we can push that back to another PR.

@springcoil (Contributor)

@twiecki Looks like this is ready to go once the nitpicks are resolved!

@ColCarroll (Member)

@ferrine @springcoil personally, I think kwargs are hard because they require very good documentation and testing to maintain, and often lead to cryptic errors or (worse) silent errors. That being said, this is a general interface, so maybe they're necessary.

@AustinRochford (Member)

Looks great!

@fonnesbeck (Member) commented Nov 13, 2016

Why not just extend sample to have an init='advi' argument? That seems simpler than having an entirely different function just for an initialization option. I'd go further and also silence the ADVI output when it is used for initialization, unless explicitly asked for. It's the sampling we care about; the initialization should not be the focus.

So, I'm thinking about something like this:

with model:
    trace = pm.sample(2000, init='advi')

Initializing using advi...
Sampling using NUTS...
100%|██████████| 2000/2000 [00:10<00:00, 182.29it/s]

@AustinRochford (Member)

@fonnesbeck I like that idea overall. One difficulty I see is that this initialization approach limits the acceptable step methods, so I'm a bit torn.

@twiecki (Member, Author) commented Nov 14, 2016

Thanks for the feedback.

@fonnesbeck In your proposal, what would happen if the user specified step methods incompatible with ADVI initialization? We would also lose the feature of sampling via ADVI. Essentially, this adds a new top-level API and demotes sample to something more low-level. I feel like that makes sense, but it might also confuse users if there are multiple ways to do the same thing.

@springcoil (Contributor)

I agree with Thomas. That certainly fits my mental model of the API and how to initialize sampling.

@twiecki (Member, Author) commented Nov 14, 2016

On second thought, sample_init() can really only be used with continuous models (as the docstring says). But it's probably also confusing to use the high-level sample_init() for some models and sample() for others. sample() already does some magic by auto-assigning step methods, and we could add more magic by auto-initializing (if the model is continuous, use ADVI, etc.).

@fonnesbeck (Member)

@twiecki that's what I was thinking as well.

If the model is incompatible with ADVI (notably, if there are discrete variables in the model), we can simply halt and return a message with that information.

twiecki changed the title from "Sample init" to "WIP Sample init" on Nov 16, 2016
@twiecki (Member, Author) commented Nov 16, 2016

OK, I have refactored this PR taking all the feedback into account. In particular, I think @fonnesbeck's suggestion is absolutely correct: the code isn't as bad as I thought, and it makes the API really simple with no cognitive overhead. Please take another look.

Also, do not merge yet, as the examples and tests still need to be updated once we're happy with this.

@twiecki (Member, Author) commented Nov 16, 2016

On third thought, maybe we should push init even further down, into the NUTS class itself. There we already estimate the mass matrix using the Hessian, so it would fit right in.
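
For reference, a minimal sketch of what the ADVI-based initialization does outside of NUTS, assuming pm.variational.advi returns a fit with means and stds, and using the pm.NUTS(scaling=cov, is_cov=True) call quoted later in this review:

import numpy as np
import pymc3 as pm

with model:
    # Fit ADVI to obtain an approximate posterior.
    v_params = pm.variational.advi(n=200000)

    # Square the fitted standard deviations to get a diagonal
    # covariance estimate, used as the NUTS scaling (mass matrix).
    cov = np.power(model.dict_to_array(v_params.stds), 2)

    # Use a single draw from the variational posterior as the start.
    start = pm.variational.sample_vp(v_params, 1)[0]

    step = pm.NUTS(scaling=cov, is_cov=True)
    trace = pm.sample(2000, step=step, start=start)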

@twiecki (Member, Author) commented Nov 17, 2016

@ferrine You're right, I hadn't considered that. It thus seems that this implementation is at the right level.

Should I go ahead and finish the PR up or do people have suggestions at the API level?

twiecki added the WIP label on Nov 17, 2016
@springcoil (Contributor)

@twiecki I think there's been enough discussion and you should wrap up this PR. It's been interesting to see the different levels of abstraction proposed, and I think your implementation is about right.

* nuts : Run NUTS and estimate posterior mean and covariance matrix.
n_init : int
    Number of iterations of the initializer.
    If 'advi', number of iterations; if 'nuts', number of draws.
@springcoil: This is a very clear docstring - good work :)

@fonnesbeck: What about initializing using the defaults for each parameter (i.e. median, mode, etc.)? I would even argue that it should be the default, as it is the most general across models.

@twiecki (Author): @fonnesbeck Well, that's what we do currently, no? And it's not working well.

@fonnesbeck: So, if advi is the default and there are discrete variables, what happens? We don't want the default to fail for entire classes of models.

@twiecki (Author): init only matters for continuous models; if a model has discrete RVs, the behavior doesn't change from before: #1523 (diff)


    step = pm.NUTS(scaling=cov, is_cov=True)

    return start, step
@springcoil: This convenience function looks good to me.

@jsalvatier: Why not call this (or a similar convenience function) from within NUTS? That way you can change whatever other params you want.

Plus, then we can make similar calls within other samplers.

It's better to have intelligent defaults than specialized initialization functions.

@twiecki (Author): @jsalvatier I have considered it, and I agree it would be better. The main difficulty is that ADVI not only provides us with the scaling but also the starting point (i.e. a sample from the posterior). I'm not sure how to do that if it lives inside NUTS.

@jsalvatier (Member) commented Dec 9, 2016

I'm not sure exactly how to do it, but I'm sure it's possible to do in a semi-nice way.

(If you're interested in copying my skill at making elegant things, I think it's exactly my habit of seeing that doing it this way would be nicer/less opaque, and then going and spending a bunch of time thinking about it, that generated that skill. I think it's probably worth it.)

If you're interested in gaining that skill, I can let you think about it for a while. (I can think about it instead if you want, though; I don't mean to strong-arm you.)

@twiecki (Author): Why don't we both think about it and see what we come up with.

@@ -110,7 +110,8 @@ def advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False,
         vars = model.vars
     vars = pm.inputvars(vars)

-    check_discrete_rvs(vars)
+    if not pm.model.all_continuous(vars):
+        raise ValueError('Model should not include discrete RVs for ADVI.')
@springcoil: This looks good to me; it's a good check and it helps the user use ADVI correctly.

@springcoil: I think little warnings like this make things like ADVI more 'discoverable'. I.e. without reading the documentation you will realize that this 'fancy method' just doesn't work with discrete RVs, so you should use another method.

@@ -132,7 +142,14 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
     """
     model = modelcontext(model)

-    step = assign_step_methods(model, step)
+    if step is None and init is not None and pm.model.all_continuous(model.vars):
@ferrine commented Nov 17, 2016:

Do we really need to check pm.model.all_continuous(model.vars)? Suppose init == 'map', which does not require pm.model.all_continuous(model.vars) to be true; then this if statement would reject it whenever discrete vars exist. There should be something like a switching default: primary is advi, secondary is map.
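
A minimal sketch of that switching-default idea (hypothetical helper, not part of this PR):

import pymc3 as pm

def choose_init(model, init='advi'):
    # Hypothetical: fall back from ADVI to MAP when the model
    # contains discrete RVs, instead of skipping initialization.
    if init == 'advi' and not pm.model.all_continuous(model.vars):
        init = 'map'
    return init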

@twiecki (Author): Good point. This really only handles the case of a continuous model where we can use NUTS. We might want to extend it for discrete models too. Not sure it should be part of this PR, though.

@springcoil (Contributor)

I don't think it should be part of this PR. Maybe open a new one for the future.
@twiecki (Member, Author) commented Nov 17, 2016

Huh, no idea why

======================================================================
FAIL: Confirm Gelman-Rubin statistic is far from 1 for a small number of samples.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wiecki/working/projects/pymc/pymc3/tests/test_diagnostics.py", line 40, in test_bad
    self.good_ratio for r in rhat.values()))
AssertionError: True is not false

is failing.

@ColCarroll (Member)

I haven't looked closely, but is it possible the sample initialization is running for this test? It relies on sampling not running long enough to converge, but IIRC when I was tuning it, the distribution it samples from converges quite quickly (which is why there are only 10 samples). I can take a closer look tonight.

@twiecki (Member, Author) commented Nov 17, 2016

That's what I thought initially but it doesn't seem to run the initialization (nor should it).

@fonnesbeck (Member)

Could it be that a better initialization has made the example converge much faster, and hence the negative assertion fails?

@twiecki (Member, Author) commented Nov 17, 2016

As I said, it's not using the initialization as far as I can tell.

@fonnesbeck (Member)

If I add start={'switchpoint':90} to the sample call (i.e. initialize the variable to a poor value), the test passes.

@springcoil (Contributor)

Is this too early to merge?

twiecki removed the WIP label on Nov 18, 2016
twiecki changed the title from "WIP Sample init" to "Sample init" on Nov 18, 2016
@twiecki (Member, Author) commented Nov 18, 2016

No, let's wait a bit for others to also take another look, but then I think we can merge.

@AustinRochford (Member)

@twiecki I like the design you ended up with very much 👍

@fonnesbeck (Member)

I agree. This is great.

@twiecki (Member, Author) commented Nov 18, 2016

Thanks guys!

@jsalvatier (Member)

This is super rad. Thanks for doing this guys.


if init == 'advi':
    v_params = pm.variational.advi(n=n_init)
    start = pm.variational.sample_vp(v_params, 1)[0]
Member: @twiecki This new initialization is a very, very nice feature!

One issue I have observed is that the first few samples (for transformed variables) are very far away from the correct values. I think the problem is that pm.variational.sample_vp currently returns non-transformed values. I guess start = pm.variational.sample_vp(v_params, 1, progressbar=False, hide_transformed=False)[0] should fix this problem. I also have a question: does sampling from the variational posterior, instead of doing something like start = v_params.means, guarantee that when running parallel jobs with njobs > 1 we get different starting points?

@twiecki (Author): Ah, good point re hide_transformed. Also, I think it will still use the same starting point, because the parallelization happens downstream of here. We need to return N samples, where N is the number of chains / njobs. Want to help with that?

@twiecki (Author): I could be wrong about the starting points being the same.

Member: I see, it makes total sense to need N samples. I will try to take a look at this ASAP.
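
A sketch of the fix discussed in this thread, assuming sample() accepts one start point per chain when njobs > 1 (which is exactly what is being requested here); n=200000 and njobs=4 are illustrative:

import pymc3 as pm

njobs = 4

with model:
    v_params = pm.variational.advi(n=200000)

    # Draw njobs samples from the variational posterior, keeping
    # transformed variables so the start values live in the model's
    # internal (transformed) space.
    vp_trace = pm.variational.sample_vp(v_params, njobs,
                                        progressbar=False,
                                        hide_transformed=False)

    # One starting point per chain.
    starts = [vp_trace[i] for i in range(njobs)]

    trace = pm.sample(2000, start=starts, njobs=njobs)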
