
Sample init #1523


Merged · 9 commits · Nov 18, 2016

Conversation

@twiecki (Member) commented Nov 12, 2016

This adds a sample_init() function which simplifies sampling with proper initialization. The main motivation is that many users report NUTS being slow or getting stuck; ADVI initialization helped in every case I encountered. The combination of ADVI init + NUTS is therefore the default here. Other use cases include sampling directly from ADVI when ADVI is selected as the sampler.

I also changed some examples to demonstrate how it can be used.

Feedback on API and naming would be especially helpful.

Addresses #1512.
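
For illustration, a rough sketch of how the proposed function might be called, based on the docstring snippets quoted in the review below; the exact signature (draws, init, sampler) is assumed here, not confirmed:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))

    # Default: initialize with ADVI, then run NUTS with the ADVI
    # covariance estimate as the scaling matrix.
    trace = pm.sample_init(draws=2000)

    # Or sample directly from the variational posterior
    # (requires init='advi').
    trace_vp = pm.sample_init(draws=2000, init='advi', sampler='advi')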

@springcoil (Contributor) commented Nov 12, 2016

I think the API is fine for the moment: it's a natural way to sample with initialization, a natural API on top of Metropolis and ADVI, and easier than the old way.

@fonnesbeck @ColCarroll @AustinRochford could one of you three review this as well, in case I missed something?

I think the motivation is clear, and it'll be clear from the documentation that this is a good thing to use to get over some of the slowness problems people pinged us about on Twitter :)

@ericmjl (Member) commented Nov 12, 2016

I can't wait for this API to be merged in!

@ColCarroll (Member) left a comment

Looks great! I have only a few nitpicks, which can be addressed before or after merging.

to estimate a diagonal covariance matrix and using this as the scaling matrix
produces robust results over a wide class of continuous models.

Parameteres
@ColCarroll: Typo!

    if init != 'advi':
        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
@ColCarroll: Should sampler instead mimic the step argument in sample, in that it accepts a function that can sample? That would eliminate some messy string parsing.
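
For illustration, the difference between the two designs might look like this (hypothetical signatures, not code from this PR; model as in the sketch above):

with model:
    # String-based selection, as the PR currently does:
    trace = pm.sample_init(draws=1000, init='advi', sampler='nuts')

    # Mimicking sample()'s step argument instead: pass a step-method
    # object directly and skip the string parsing entirely:
    trace = pm.sample_init(draws=1000, init='advi', step=pm.NUTS())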

Sampler to use. Will be initialized using init algorithm.
* nuts : Run NUTS sampler with the init covariance estimation as the scaling matrix.
* hmc : Run HamiltonianMC sampler with the init covariance estimation as the scaling matrix.
* advi : Sample from variational posterior, requires init='advi'.
@ColCarroll: Should the model kwarg be documented?

        raise ValueError("To sample via ADVI, you have to set init='advi'.")
    trace = pm.variational.sample_vp(v_params, draws=draws)
else:
    trace = pm.sample(step=step, start=start, draws=draws, **kwargs)
@ColCarroll: Shouldn't this call have the model kwarg passed in?

@ferrine (Member) commented Nov 13, 2016

Would it be reasonable to add initkwargs and samplekwargs for better customization? And is it possible to support advi_minibatch here?

@springcoil (Contributor)

@ColCarroll What do you think about the initkwargs and samplekwargs? I think we can push that back to another PR.

@springcoil (Contributor)

@twiecki Looks like this is ready to go once the nitpicks are resolved!

@ColCarroll (Member)

@ferrine @springcoil personally, I think kwargs are hard because they require very good documentation and testing to maintain, and often lead to cryptic errors or (worse) silent errors. That being said, this is a general interface, so maybe they're necessary.

@AustinRochford (Member)

Looks great!

@fonnesbeck (Member) commented Nov 13, 2016

Why not just extend sample to have an init='advi' argument? That seems simpler than having an entirely different function just for an initialization option. I'd go further and also silence the ADVI output when it is used for initialization, unless explicitly asked for. It's the sampling we care about; the initialization should not be the focus.

So, I'm thinking about something like this:

with model:
    trace = pm.sample(2000, init='advi')

Initializing using advi...
Sampling using NUTS...
100%|██████████| 2000/2000 [00:10<00:00, 182.29it/s]

@AustinRochford (Member)

@fonnesbeck I like that idea overall. One difficulty I see is that this initialization approach limits the acceptable step methods, so I'm a bit torn.

@twiecki (Member, Author) commented Nov 14, 2016

Thanks for the feedback.

@fonnesbeck In your proposal, what would happen if the user specified step methods incompatible with ADVI initialization? We would also lose the feature of sampling via ADVI. Essentially, this adds a new top-level API and demotes sample to something more low-level. I feel like that makes sense, but it might also confuse users if there are multiple ways to do the same thing.

@springcoil (Contributor)

I agree with Thomas. That certainly fits my mental model of the API and how to initialize sampling.

@twiecki (Member, Author) commented Nov 14, 2016

On second thought, sample_init() can really only be used with continuous models (as the docstring says). But it's probably also confusing to use the high-level sample_init() for some models and sample() for others. sample() already does some magic by auto-assigning step methods, and we could add more magic by auto-initializing (if the model is continuous, use ADVI, etc.).

@fonnesbeck (Member)

@twiecki that's what I was thinking as well.

If the model is incompatible with ADVI (notably, if there are discrete variables in the model), we can simply halt and return a message with that information.

twiecki changed the title from "Sample init" to "WIP Sample init" on Nov 16, 2016
@twiecki (Member, Author) commented Nov 16, 2016

OK, I have refactored this PR taking all the feedback into account. In particular, I think @fonnesbeck's suggestion is absolutely correct: the code isn't as bad as I thought, and it makes the API really simple with no cognitive overhead. Please take another look.

Also, do not merge yet, as the examples and tests still need to be updated once we're happy with this.

@twiecki (Member, Author) commented Nov 16, 2016

On third thought, maybe we should push init even further down, into the NUTS class itself. There we already estimate the mass matrix using the Hessian, so it would fit right in.
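
For reference, a minimal sketch of what the ADVI-based initialization does outside of NUTS, assuming pm.variational.advi returns a fit with means and stds, and using the pm.NUTS(scaling=cov, is_cov=True) call quoted later in this review:

import numpy as np
import pymc3 as pm

with model:
    # Fit ADVI to obtain an approximate posterior.
    v_params = pm.variational.advi(n=200000)

    # Square the fitted standard deviations to get a diagonal
    # covariance estimate, used as the NUTS scaling (mass matrix).
    cov = np.power(model.dict_to_array(v_params.stds), 2)

    # Use a single draw from the variational posterior as the start.
    start = pm.variational.sample_vp(v_params, 1)[0]

    step = pm.NUTS(scaling=cov, is_cov=True)
    trace = pm.sample(2000, step=step, start=start)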

@twiecki (Member, Author) commented Nov 17, 2016

@ferrine You're right, I hadn't considered that. It thus seems that this implementation is at the right level.

Should I go ahead and finish the PR up or do people have suggestions at the API level?

twiecki added the WIP label on Nov 17, 2016
@springcoil (Contributor)

@twiecki I think there's been enough discussion and you should wrap up this PR. It's been interesting to see the different levels of abstraction proposed, and I think your implementation is about right.

* nuts : Run NUTS and estimate posterior mean and covariance matrix.
n_init : int
    Number of iterations of the initializer.
    If 'advi', number of iterations; if 'nuts', number of draws.
@springcoil: This is a very clear docstring - good work :)

@fonnesbeck: What about initializing using the defaults for each parameter (i.e. median, mode, etc.)? I would even argue that it should be the default, as it is the most general across models.

@twiecki (Author): @fonnesbeck Well, that's what we do currently, no? And it's not working well.

@fonnesbeck: So, if advi is the default and there are discrete variables, what happens? We don't want the default to fail for entire classes of models.

@twiecki (Author): init only matters for continuous models; if a model has discrete RVs, the behavior doesn't change from before: #1523 (diff)


    step = pm.NUTS(scaling=cov, is_cov=True)

    return start, step
@springcoil: This convenience function looks good to me.

@jsalvatier: Why not call this (or a similar convenience function) from within NUTS? That way you can change whatever other params you want.

Plus, then we can make similar calls within other samplers.

It's better to have intelligent defaults than specialized initialization functions.

@twiecki (Author): @jsalvatier I have considered it, and I agree it would be better. The main difficulty is that ADVI not only provides us with the scaling but also the starting point (i.e. a sample from the posterior). I'm not sure how to do that if it lives inside NUTS.

@jsalvatier (Member) commented Dec 9, 2016

I'm not sure exactly how to do it, but I'm sure it's possible to do in a semi-nice way.

(If you're interested in copying my skill at making elegant things, I think it's exactly my habit of seeing that doing it this way would be nicer/less opaque, and then going and spending a bunch of time thinking about it, that generated that skill. I think it's probably worth it.)

If you're interested in gaining that skill, I can let you think about it for a while. (I can think about it instead if you want, though; I don't mean to strong-arm you.)

@twiecki (Author): Why don't we both think about it and see what we come up with.

@@ -110,7 +110,8 @@ def advi(vars=None, start=None, model=None, n=5000, accurate_elbo=False,
         vars = model.vars
     vars = pm.inputvars(vars)

-    check_discrete_rvs(vars)
+    if not pm.model.all_continuous(vars):
+        raise ValueError('Model should not include discrete RVs for ADVI.')
@springcoil: This looks good to me; it's a good check and it helps the user use ADVI correctly.

@springcoil: I think little warnings like this make things like ADVI more 'discoverable'. I.e. without reading the documentation you will realize that this 'fancy method' just doesn't work with discrete RVs, so you should use another method.

@@ -132,7 +142,14 @@ def sample(draws, step=None, start=None, trace=None, chain=0, njobs=1, tune=None
     """
     model = modelcontext(model)

-    step = assign_step_methods(model, step)
+    if step is None and init is not None and pm.model.all_continuous(model.vars):
@ferrine commented Nov 17, 2016:

Do we really need to check pm.model.all_continuous(model.vars)? Suppose init == 'map', which does not require pm.model.all_continuous(model.vars) to be true; then this if statement would reject it whenever discrete vars exist. There should be something like a switching default: primary is advi, secondary is map.
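
A minimal sketch of that switching-default idea (hypothetical helper, not part of this PR):

import pymc3 as pm

def choose_init(model, init='advi'):
    # Hypothetical: fall back from ADVI to MAP when the model
    # contains discrete RVs, instead of skipping initialization.
    if init == 'advi' and not pm.model.all_continuous(model.vars):
        init = 'map'
    return init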

@twiecki (Author): Good point. This really only handles the case of a continuous model where we can use NUTS. We might want to extend it for discrete models too. Not sure it should be part of this PR, though.

@springcoil (Contributor)

I don't think it should be part of this PR. Maybe open a new one for the future.
@twiecki (Member, Author) commented Nov 17, 2016

Huh, no idea why

======================================================================
FAIL: Confirm Gelman-Rubin statistic is far from 1 for a small number of samples.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wiecki/working/projects/pymc/pymc3/tests/test_diagnostics.py", line 40, in test_bad
    self.good_ratio for r in rhat.values()))
AssertionError: True is not false

is failing.

@ColCarroll (Member)

I haven't looked closely, but is it possible the sample initialization is running for this test? It relies on sampling not running long enough to converge, but IIRC when I was tuning it, the distribution it samples from converges quite quickly (which is why there are only 10 samples). I can take a closer look tonight.

@twiecki (Member, Author) commented Nov 17, 2016

That's what I thought initially but it doesn't seem to run the initialization (nor should it).

@fonnesbeck (Member)

Could it be that a better initialization has made the example converge much faster, and hence the negative assertion fails?

@twiecki (Member, Author) commented Nov 17, 2016

As I said, it's not using the initialization as far as I can tell.

@fonnesbeck (Member)

If I add start={'switchpoint':90} to the sample call (i.e. initialize the variable to a poor value), the test passes.

@springcoil (Contributor)

Is this too early to merge?

twiecki removed the WIP label on Nov 18, 2016
twiecki changed the title from "WIP Sample init" to "Sample init" on Nov 18, 2016
@twiecki (Member, Author) commented Nov 18, 2016

No, let's wait a bit for others to also take another look, but then I think we can merge.

@AustinRochford (Member)

@twiecki I like the design you ended up with very much 👍

@fonnesbeck (Member)

I agree. This is great.

@twiecki (Member, Author) commented Nov 18, 2016

Thanks guys!

@jsalvatier (Member)

This is super rad. Thanks for doing this guys.


if init == 'advi':
    v_params = pm.variational.advi(n=n_init)
    start = pm.variational.sample_vp(v_params, 1)[0]
Member: @twiecki This new initialization is a very, very nice feature!

One issue I have observed is that the first few samples (for transformed variables) are very far away from the correct values. I think the problem is that pm.variational.sample_vp currently returns non-transformed values. I guess start = pm.variational.sample_vp(v_params, 1, progressbar=False, hide_transformed=False)[0] should fix this problem. I also have a question: does sampling from the variational posterior, instead of doing something like start = v_params.means, guarantee that when running parallel jobs with njobs > 1 we get different starting points?

@twiecki (Author): Ah, good point re hide_transformed. Also, I think it will still use the same starting point, because the parallelization happens downstream of here. We need to return N samples, where N is the number of chains / njobs. Want to help with that?

@twiecki (Author): I could be wrong about the starting points being the same.

Member: I see, it makes total sense to need N samples. I will try to take a look at this ASAP.
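
A sketch of the fix discussed in this thread, assuming sample() accepts one start point per chain when njobs > 1 (which is exactly what is being requested here); n=200000 and njobs=4 are illustrative:

import pymc3 as pm

njobs = 4

with model:
    v_params = pm.variational.advi(n=200000)

    # Draw njobs samples from the variational posterior, keeping
    # transformed variables so the start values live in the model's
    # internal (transformed) space.
    vp_trace = pm.variational.sample_vp(v_params, njobs,
                                        progressbar=False,
                                        hide_transformed=False)

    # One starting point per chain.
    starts = [vp_trace[i] for i in range(njobs)]

    trace = pm.sample(2000, start=starts, njobs=njobs)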
