Improve NUTS initialization #1512

Closed · twiecki opened this issue Nov 8, 2016 · 25 comments

@twiecki (Member) commented Nov 8, 2016

I have seen many complaints about NUTS being slow. In 100% of these cases the root cause was bad initialization / scaling of the NUTS sampler. Using ADVI to estimate a diagonal covariance matrix for scaling NUTS is a robust solution.

However, I wonder if there isn't something better we can do. Specifically, I bet the @stan-dev guys use some clever tricks that they might be able to help us with. CC @betanalpha @bob-carpenter
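
As a point of reference, this is roughly the pattern I mean, assuming the current pm.variational.advi API (which returns per-variable means and stds) and a placeholder model:

import numpy as np
import pymc3 as pm

with model:  # `model` stands in for any pm.Model
    # Fit mean-field ADVI to get approximate posterior means and
    # standard deviations for every free variable.
    v_params = pm.variational.advi(n=50000)

    # Use the squared ADVI standard deviations as a diagonal covariance
    # (mass matrix) estimate for NUTS, and start from the ADVI means.
    scaling = np.power(model.dict_to_array(v_params.stds), 2)
    step = pm.NUTS(scaling=scaling, is_cov=True)
    trace = pm.sample(2000, step=step, start=v_params.means)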

@springcoil (Contributor) commented Nov 8, 2016

I agree, Thomas; that was my suspicion about the complaints as well.

I've run into this myself. In some sense it's bad for the user experience, though: we shouldn't impose a leaky abstraction on the user, like having to understand the intricacies of initialization and scaling.


@ferrine (Member) commented Nov 8, 2016

What about adding a NUTS.from_advi(..., tune_interval=None) constructor, where ... stands for the common ADVI and NUTS arguments, and tune_interval is an optional int giving the number of samples to use for estimating a covariance matrix instead of the diagonal ADVI solution?
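
Something like this rough sketch (nuts_from_advi and its arguments are hypothetical, not existing pymc3 API; here tune_interval is read as a short diagonal-scaled warm-up run whose samples feed a dense covariance estimate):

import numpy as np
import pymc3 as pm

def nuts_from_advi(model, n_advi=20000, tune_interval=None, **nuts_kwargs):
    """Fit ADVI, then build a NUTS step scaled by its posterior estimate."""
    with model:
        fit = pm.variational.advi(n=n_advi)
        # Diagonal covariance straight from the ADVI standard deviations.
        diag = np.power(model.dict_to_array(fit.stds), 2)
        if tune_interval is None:
            return pm.NUTS(scaling=diag, is_cov=True, **nuts_kwargs)
        # Otherwise run a short warm-up with the diagonal scaling and
        # estimate a dense covariance matrix from those samples.
        warm = pm.sample(tune_interval,
                         step=pm.NUTS(scaling=diag, is_cov=True),
                         start=fit.means, progressbar=False)
        samples = np.stack([model.dict_to_array(warm.point(i))
                            for i in range(len(warm))])
        return pm.NUTS(scaling=np.cov(samples, rowvar=False), is_cov=True,
                       **nuts_kwargs)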

@fonnesbeck (Member)

I'm not convinced that initialization is the issue. The MAP estimate should be a fine place to start an MCMC run, yet this almost never works. I think we need a top-to-bottom review of the NUTS sampler, as a start. For example, I notice that the scaling speed-of-adaptation parameter (k) never even gets used by the algorithm. That's a little concerning.

@twiecki (Member, Author) commented Nov 9, 2016

I agree; most likely there's a bug in estimating the scaling using the Hessian.

@bob-carpenter

Bad adaptation is almost never our only problem. When initialization and adaptation are slow, mixing usually is, too. The adaptation algorithm's in the code and described in the manual. Often we wind up having to reparameterize models to reduce extreme ill-conditioning of the Hessian matrix.

I think there's still a lot of room for improvement. The goal is to get to the typical set as quickly as possible in order to spend time there adapting the mass matrix and step size. We were hoping black box mean-field variational inference would get us there quickly (get to posterior mean and take approximate draw [don't use posterior mean, which is often not in the typical set itself]), but it's not nearly robust enough yet for that purpose. Stan's HMC is intentionally conservative in its adaptation because if adaptation fails, so will sampling.

You may also want to check out the new version of NUTS that Michael put into Stan recently---it replaces the slice sampler with a discrete draw (multinomial with count 1) as described here:

https://arxiv.org/abs/1601.00225
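
In PyMC3 terms, the "approximate draw instead of the mean" idea would look roughly like this (a sketch against the pm.variational API of the time; model is a placeholder):

import pymc3 as pm

with model:  # placeholder pm.Model
    fit = pm.variational.advi(n=50000)
    # Take one approximate draw from the ADVI posterior and start there,
    # rather than at the posterior mean, which often lies outside the
    # typical set in high dimensions. hide_transformed=False keeps the
    # transformed variables NUTS needs in the start point.
    vp_trace = pm.variational.sample_vp(fit, draws=1, model=model,
                                        hide_transformed=False)
    trace = pm.sample(2000, start=vp_trace.point(0))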

twiecki mentioned this issue Nov 12, 2016
@springcoil (Contributor)

Is this resolved now, @twiecki, or is there still work to be done?

@twiecki (Member, Author) commented Nov 12, 2016

@bob-carpenter Thanks for your help. I have found that using ADVI's diagonal covariance estimate as the mass matrix works quite well in practice. Which regularizers do you use for the covariance estimation?

I read the paper on XHMC but thought it was more of a new version rather than a replacement for NUTS. Or are there insights (like the multinomial sample) that made it into the NUTS implementation?

@springcoil Let's leave it open for now.

@bob-carpenter

Our current NUTS (Stan 2.12) uses multinomial sampling instead of slice sampling. I'm pretty sure exhaustive HMC is implemented, but not exposed in the interfaces. Riemannian HMC is also implemented but not exposed yet.

The diagonal is regularized toward the unit diagonal, and if there's a dense mass matrix, then that's regularized toward the diagonal. I'd have to dive into the code to see exactly how it's implemented now. The regularization is obviously scale sensitive, and I don't think we expose any way to set the scales at the moment or provide inits. That's coming soon, after we refactor the basic back-end interface calls.
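
If memory serves, the diagonal shrinkage amounts to something like the following (constants as I recall them from the Stan source; illustrative only):

import numpy as np

def regularize_diag(sample_var, n):
    """Shrink per-dimension sample variances toward a small multiple of
    the unit diagonal; n is the number of adaptation samples."""
    w = n / (n + 5.0)  # weight on the data grows with sample size
    return w * sample_var + 1e-3 * (1.0 - w) * np.ones_like(sample_var)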

@twiecki (Member, Author) commented Nov 16, 2016

Is there any more detail on the multinomial sampling?

Also, in your view, will XHMC become a drop-in replacement for NUTS in Stan? From the paper it seems that the theoretical improvements do not always carry through empirically.

@bob-carpenter

It's a drop-in replacement at the API level, but we haven't replaced it because we haven't tested it.

The basic idea of the multinomial sampler is easy, but it has the same subtlety as basic NUTS in wanting to bias the selection toward the second half of the simulation. The details are in Michael Betancourt's paper and in the Stan code itself.
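
As a toy illustration of the selection step only (not Stan's implementation, which also does the biased progressive sampling across subtrees): each trajectory point is drawn with probability proportional to exp(-H), replacing the slice variable of original NUTS.

import numpy as np

def multinomial_select(neg_energies, rng=None):
    """Pick an index along a trajectory with probability proportional to
    exp(-H); neg_energies holds -H(theta, p) for each stored point."""
    rng = rng or np.random.default_rng()
    w = np.exp(neg_energies - neg_energies.max())  # stabilized weights
    return rng.choice(len(neg_energies), p=w / w.sum())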

@ferrine (Member) commented Nov 19, 2016

I'm writing a panel UserModel and found that NUTS is incredibly slow. Is that because I have very high dimensionality? I used the new ADVI initialization.

@twiecki (Member, Author) commented Nov 21, 2016

@ferrine Can you upload the model with example data?

@ferrine (Member) commented Nov 21, 2016

Model

from pymc3.models.linear import Glm    # from PR #1525
import pandas as pd


class FEPanel(Glm):
    def __init__(self, x, y, index, dummy_na=True, intercept=False, labels=None,
                 priors=None, vars=None, family='normal', name='', model=None):
        if not isinstance(x, pd.DataFrame):
            raise TypeError('Need Pandas DataFrame for x')
        if not isinstance(y, pd.Series):
            raise TypeError('Need Pandas Series for y')
        if not isinstance(index, (tuple, list)):
            index = [index]
        x = pd.get_dummies(
            x, columns=index, drop_first=not intercept, dummy_na=dummy_na
        )   # type: pd.DataFrame
        is_dummy = lambda s: any(s.startswith('%s_' % l) for l in index)
        self.dummies = list(filter(is_dummy, x.columns))
        self.not_dummies = list(set(x.columns) - set(self.dummies))
        new_priors = dict.fromkeys(
            self.dummies, self.default_intercept_prior
        )
        if priors is None:
            priors = dict()
        new_priors.update(priors)
        super(FEPanel, self).__init__(
            x, y, intercept=intercept, labels=labels,
            priors=new_priors, vars=vars, family=family, name=name, model=model
        )
    
    @classmethod
    def from_formula(cls, *args, **kwargs):
        raise NotImplementedError('Sorry')
    
    @property
    def dummies_vars(self):
        return [self[v] for v in self.dummies]

    @property
    def not_dummies_vars(self):
        return [self[v] for v in self.not_dummies]

Data

data = pd.read_csv('tests/data/testdata.csv')

Usage

import pymc3 as pm

with pm.Model() as model:
    g = FEPanel(data.iloc[:, 1:], data.iloc[:, 0], index=['Country'])
    trace = pm.sample(1000, n_init=12000)  # extremely slow once NUTS starts sampling

@twiecki (Member, Author) commented Nov 22, 2016

tests/data/testdata.csv does not seem to exist on that branch.

@ferrine (Member) commented Nov 23, 2016

It was in my repo; here is the raw link.

@twiecki (Member, Author) commented Nov 23, 2016

[screenshot of an RV from the model]

I'm not sure this RV should be in the model; in any case, it's what seems to make NUTS get stuck. Should it be in the model?

@ferrine (Member) commented Nov 23, 2016

Same issue without NaNs (2.73 it/s). Maybe it's just a bad test set.

@twiecki (Member, Author) commented Nov 23, 2016

I tried dropping NaNs too, but the RV above was still created, which made me think it was a model-creation issue.

@ferrine (Member) commented Nov 23, 2016

That's because dummy_na is True.

@twiecki (Member, Author) commented Nov 23, 2016

If I set that to False, I get 75 it/s and a clean trace.

@ferrine (Member) commented Nov 23, 2016

15 it/s was the best I got on my laptop :( I have a 13-inch MacBook Air.

@twiecki (Member, Author) commented Nov 23, 2016

But it converges fine otherwise?

@ferrine (Member) commented Nov 23, 2016

Yep.

ferrine mentioned this issue Nov 23, 2016
@twiecki (Member, Author) commented Nov 28, 2016

Closing as the ADVI initialization seems to have fixed most of the issues.

@citynorman

This problem went away after I did:

import os
# Must be set before theano/pymc3 (and thus MKL) is first imported;
# switches MKL to the GNU OpenMP threading layer, which avoids a known
# clash between Intel's and GNU's OpenMP runtimes.
os.environ["MKL_THREADING_LAYER"] = "GNU"
