Improve NUTS initialization #1512
Comments
I agree Thomas, I've run into this myself; in some sense it's bad for the user experience.

Peadar Coyle
What about adding …
I'm not convinced that initialization is the issue. The MAP estimate should be a fine place to start an MCMC run, yet this almost never works. I think we need a top-to-bottom review of the NUTS sampler, as a start. For example, I notice that the scaling speed of adaptation parameter (…)
I agree; most likely there's a bug in estimating the scaling using the Hessian.
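For reference, the MAP-plus-Hessian initialization being discussed looks roughly like the following in PyMC3. This is a minimal sketch assuming the 3.0-era API (find_MAP, find_hessian, and the scaling/is_cov arguments to NUTS); the model object is a stand-in for whatever model is being debugged.

import pymc3 as pm

with model:  # 'model' stands in for the model being debugged
    start = pm.find_MAP()          # optimize to the posterior mode
    hess = pm.find_hessian(start)  # Hessian of the negative log posterior at the MAP
    # is_cov=False: the scaling matrix is treated as a precision, not a covariance
    step = pm.NUTS(scaling=hess, is_cov=False)
    trace = pm.sample(1000, step=step, start=start)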
Bad adaptation is almost never our only problem. When initialization and adaptation are slow, mixing usually is, too. The adaptation algorithm's in the code and described in the manual. Often we wind up having to reparameterize models to reduce extreme ill-conditioning of the Hessian matrix. I think there's still a lot of room for improvement.

The goal is to get to the typical set as quickly as possible in order to spend time there adapting the mass matrix and step size. We were hoping black-box mean-field variational inference would get us there quickly (get to the posterior mean and take an approximate draw [don't use the posterior mean, which is often not in the typical set itself]), but it's not nearly robust enough yet for that purpose.

Stan's HMC is intentionally conservative in its adaptation because if adaptation fails, so will sampling. You may also want to check out the new version of NUTS that Michael put into Stan recently; it replaces the slice sampler with a discrete draw (multinomial with count 1) as described here: …
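To make the "take an approximate draw, don't start from the posterior mean" point concrete, here is a minimal PyMC3 sketch assuming the 3.0-era variational API (pm.variational.advi and sample_vp); names and arguments may differ in other versions.

import pymc3 as pm

with model:  # 'model' is a stand-in for your model
    v_params = pm.variational.advi(n=50000)  # mean-field fit
    # Take a single approximate draw from the fitted posterior and use it as
    # the starting point, rather than starting at the posterior mean.
    approx_draws = pm.variational.sample_vp(v_params, draws=1,
                                            hide_transformed=False)
    trace = pm.sample(1000, start=approx_draws[0])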
Is this resolved now, @twiecki, or is there still work to be done?
@bob-carpenter Thanks for your help. I found that using ADVI's diagonal covariance estimate as the mass matrix works quite well in practice. Which regularizers do you use for the covariance estimation? I read the paper on XHMC but thought it was more of a new version, rather than a replacement for NUTS. Or are there insights (like the multinomial sample) that made it into the NUTS implementation? @springcoil Let's leave it open for now.
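The ADVI-based scaling described here looks roughly as follows. This is a sketch assuming PyMC3's 3.0-era API (pm.variational.advi returning means/stds, Model.dict_to_array, and NUTS accepting a diagonal covariance via scaling/is_cov).

import numpy as np
import pymc3 as pm

with model:  # 'model' is a stand-in for your model
    v_params = pm.variational.advi(n=100000)  # mean-field fit
    # Squared posterior standard deviations give a diagonal covariance estimate.
    cov = np.power(model.dict_to_array(v_params.stds), 2)
    step = pm.NUTS(scaling=cov, is_cov=True)  # use it to scale the mass matrix
    trace = pm.sample(2000, step=step, start=v_params.means)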
Our current NUTS (Stan 2.12) uses the multinomial sampling … The diagonal is regularized toward the unit diagonal and …
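Concretely, regularizing toward the unit diagonal amounts to a shrinkage estimate: mix the per-dimension sample variances with 1 using a weight that grows with the number of adaptation draws. The sketch below only illustrates the idea; the weights are illustrative, not Stan's actual constants.

import numpy as np

def regularized_diagonal(sample_var, n_draws, prior_weight=5.0):
    # Shrink each sample variance toward 1 (the unit diagonal). With few
    # draws the estimate stays close to the identity; with many draws it
    # approaches the raw sample variances. Constants are illustrative.
    w = n_draws / (n_draws + prior_weight)
    return w * np.asarray(sample_var) + (1.0 - w) * 1.0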
Is there any more detail on the multinomial sampling? Also, in your own view, will XHMC become a drop-in replacement for NUTS in Stan? From the paper it seems like the theoretical improvements do not always carry through empirically.
It's a drop-in replacement at the API level, but we haven't replaced it because we haven't tested it. The basic idea of the multinomial sampler is easy, but it has the same subtlety as basic NUTS in wanting to bias the selection toward the second half of the simulation. The details are in Michael Betancourt's paper and in the Stan code itself.
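A bare-bones illustration of the multinomial draw (not Stan's actual implementation, which builds the trajectory recursively and biases the choice toward the newer subtree): given the Hamiltonian energies of the states along a simulated trajectory, pick one state with probability proportional to exp(-H) instead of slice sampling.

import numpy as np

def multinomial_pick(energies, rng=np.random):
    # energies: Hamiltonian H(q, p) of each state along the trajectory.
    # Subtract the minimum before exponentiating for numerical stability.
    energies = np.asarray(energies)
    w = np.exp(-(energies - energies.min()))
    return rng.choice(len(energies), p=w / w.sum())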
I'm writing a panel UserModel and found that NUTS is incredibly slow. Is that because I have very high dimensionality? I used the new ADVI initialization.
@ferrine Can you upload the model with example data?
Model

from pymc3.models.linear import Glm  # from PR #1525
import pandas as pd


class FEPanel(Glm):
    def __init__(self, x, y, index, dummy_na=True, intercept=False, labels=None,
                 priors=None, vars=None, family='normal', name='', model=None):
        if not isinstance(x, pd.DataFrame):
            raise TypeError('Need Pandas DataFrame for x')
        if not isinstance(y, pd.Series):
            raise TypeError('Need Pandas Series for y')
        if not isinstance(index, (tuple, list)):
            index = [index]
        # One dummy column per level of the index variable(s) (fixed effects)
        x = pd.get_dummies(
            x, columns=index, drop_first=not intercept, dummy_na=dummy_na
        )  # type: pd.DataFrame
        is_dummy = lambda s: any(s.startswith('%s_' % l) for l in index)
        self.dummies = list(filter(is_dummy, x.columns))
        self.not_dummies = list(set(x.columns) - set(self.dummies))
        # By default, give the dummy coefficients the same prior as the intercept
        new_priors = dict.fromkeys(
            self.dummies, self.default_intercept_prior
        )
        if priors is None:
            priors = dict()
        new_priors.update(priors)
        super(FEPanel, self).__init__(
            x, y, intercept=intercept, labels=labels,
            priors=new_priors, vars=vars, family=family, name=name, model=model
        )

    @classmethod
    def from_formula(cls, *args, **kwargs):
        raise NotImplementedError('Sorry')

    @property
    def dummies_vars(self):
        return [self[v] for v in self.dummies]

    @property
    def not_dummies_vars(self):
        return [self[v] for v in self.not_dummies]

Data

data = pd.read_csv('tests/data/testdata.csv')

Usage

import pymc3 as pm

with pm.Model() as model:
    g = FEPanel(data.iloc[:, 1:], data.iloc[:, 0], index=['Country'])
    trace = pm.sample(1000, n_init=12000)  # extremely slow when NUTS starts sampling
It was in my repo, here is the raw …
Same issue without NaNs (2.73 it/s). Maybe that's just a bad test set.
I tried dropping NaNs too, but the RV above was still created, which made me think it was a model creation issue.
That's because …
If I set that to …
15 it/s was the top on my laptop :(
But it converges fine otherwise?
ye |
Closing as the ADVI initialization seems to have fixed most of the issues.
This problem went away after I did …
I have seen many complaints about NUTS being slow. In 100% of these cases the root cause was bad initialization / scaling of the NUTS sampler. Using ADVI to estimate a diagonal covariance matrix for scaling NUTS is a robust solution.
However, I wonder if there isn't something better we can do. Specifically, I bet the @stan-dev guys use some clever tricks that they might be able to help us with. CC @betanalpha @bob-carpenter
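For PyMC3 users following along, the ADVI-based initialization discussed in this thread ends up looking roughly like the snippet below (a sketch assuming the init/n_init keywords that pm.sample grew around this time; exact argument names may vary by version).

import pymc3 as pm

with model:  # 'model' is a stand-in for your model
    # Run ADVI for n_init iterations, start NUTS from the fit, and scale the
    # mass matrix with the fitted (diagonal) standard deviations.
    trace = pm.sample(1000, init='advi', n_init=12000)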