[WIP] Multinomial sampling for nuts #1769

Conversation
Very cool. Have you compared multinomial to slice in any way? The Stan guys say it's more efficient and I trust them, just curious.
pymc3/step_methods/hmc/nuts.py (Outdated)

@@ -12,6 +12,7 @@
def bern(p):
    assert np.isfinite(p)
what are you checking for here? generally we should avoid assertions in favor of explicitly raising an exception.
That slipped in from debugging... But it might make sense to check for nan. infs should be fine.
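As an illustration of the suggestion above, a minimal sketch of a `bern` helper that raises explicitly instead of asserting (the NaN check and the exact message are assumptions, not the PR's final code):

```python
import numpy as np

def bern(p):
    """Return True with probability p.

    Sketch only: raise on NaN instead of asserting.  p may be an
    unnormalized acceptance ratio, so values > 1 (or inf) are fine.
    """
    if np.isnan(p):
        raise ValueError("Acceptance probability is NaN.")
    return np.random.uniform() < p
```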
pymc3/step_methods/hmc/nuts.py (Outdated)

@@ -229,8 +226,8 @@ def _build_subtree(self, left, depth, epsilon):
self.max_energy_change = energy_change
p_accept = min(1, np.exp(-energy_change))

- size = int(self.log_u + energy_change <= 0)
- diverging = not (self.log_u + energy_change < self.Emax)
+ size = - energy_change
nit: space between operator and energy_change
pymc3/step_methods/hmc/nuts.py (Outdated)

self.step_size = step_size
self.Emax = Emax
self.start_energy = np.array(start.energy)

self.left = self.right = start
self.proposal = Proposal(start.q, start.energy, 1.0)
self.depth = 0
- self.size = 1
+ self.size = 0
rename to log_size or similar?
pymc3/step_methods/hmc/nuts.py (Outdated)

@@ -242,15 +239,15 @@ def _build_subtree(self, left, depth, epsilon):
tree2, diverging, turning = self._build_subtree(tree1.right, depth - 1, epsilon)

- size = tree1.size + tree2.size
+ size = np.logaddexp(tree1.size, tree2.size)
cool!
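Since the subtree weights now live in log space, adding two weights corresponds to np.logaddexp, which stays finite where the direct sum would overflow. A quick sanity check (the log_size names are just for illustration):

```python
import numpy as np

# Combining two subtree weights w1 + w2 in log space:
log_size1, log_size2 = np.log(3.0), np.log(5.0)
combined = np.logaddexp(log_size1, log_size2)
assert np.isclose(np.exp(combined), 8.0)

# The log-space form is robust where exp() would overflow:
big = np.logaddexp(1000.0, 1000.0)
assert np.isclose(big, 1000.0 + np.log(2.0))
```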
pymc3/step_methods/hmc/nuts.py
Outdated
accept_sum = tree1.accept_sum + tree2.accept_sum | ||
n_proposals = tree1.n_proposals + tree2.n_proposals | ||
|
||
left, right = tree1.left, tree2.right | ||
span = np.sign(epsilon) * (right.q - left.q) | ||
turning = turning or (span.dot(left.p) < 0) or (span.dot(right.p) < 0) | ||
|
||
if bern(tree2.size * 1. / max(size, 1)): | ||
if bern(np.exp(tree2.size - size)): |
nothing needed here/yet, but it looks like we might just refactor bern to work in log space, since that's how we're always using it now.
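A sketch of that log-space refactor (hypothetical; `bern_log` is not a name from the PR): accept with probability exp(log_p) without ever leaving log space, so a log weight of -inf simply never accepts:

```python
import numpy as np

def bern_log(log_p):
    """Accept with probability exp(log_p), staying in log space.

    Hypothetical sketch: comparing log(u) < log_p avoids computing
    exp(log_p), which could overflow or underflow.
    """
    if np.isnan(log_p):
        raise ValueError("log acceptance probability is NaN")
    return np.log(np.random.uniform()) < log_p

# log_p = 0 means probability 1; log_p = -inf means probability 0:
assert bern_log(0.0)
assert not bern_log(-np.inf)
```

The call site would then read `bern_log(tree2.size - size)` instead of exponentiating first.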
@ColCarroll sounds all fine and should be fixed now.
Fixing this depends on #1770.
…On 10.02.2017 21:05, Thomas Wiecki wrote:

    ======================================================================
    FAIL: Check effective sample size is equal to number of samples when
    initializing with MAP
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/travis/build/pymc-devs/pymc3/pymc3/tests/test_diagnostics.py", line 160, in test_effective_n
        assert_allclose(n_effective, n_jobs * n_samples, 2)
      File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 1392, in assert_allclose
        verbose=verbose, header=header)
      File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 739, in assert_array_compare
        raise AssertionError(msg)
    AssertionError: Not equal to tolerance rtol=2, atol=0 (mismatch 20.0%)
     x: array([  300., -1577.,   300.,   300.,   300.])
     y: array(300)

    -------------------- >> begin captured stdout << ---------------------
    Optimization terminated successfully. Current function value: 4.594693
    Iterations: 0  Function evaluations: 1  Gradient evaluations: 1
    --------------------- >> end captured stdout << ----------------------
@aseyboldt Fix is merged, need to rebase.
The previous version did not handle cases where the energy is -inf correctly. Initially I thought the "diverging" check would account for this, but even if we are diverging, we still might check bern(tree2.log_size - log_size). But this is NaN if log_size is inf.
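A small demonstration of that failure mode (not code from the PR):

```python
import numpy as np

# An energy of -inf gives that subtree a log weight of +inf.
# Combining the weights is still well defined:
log_size = np.logaddexp(np.inf, 0.0)
assert log_size == np.inf

# but the log acceptance ratio tree2.log_size - log_size becomes inf - inf:
ratio = np.inf - log_size
assert np.isnan(ratio)  # bern() would then be handed a NaN
```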
@aseyboldt Anything other than rebase you think is missing here?
Yes, two things:
FWIW it seems sensible to me to abort if the Hamiltonian increases by a lot too, but when in doubt we should copy Stan.
This is ready for review again. I also included the termination criterion that takes the metric into account (see [1]). As far as I can see, this should now do pretty much the same as Stan. [1] Betancourt, Michael. "Identifying the Optimal Integration Time in Hamiltonian Monte Carlo." arXiv:1601.00225 [stat], January 2, 2016. http://arxiv.org/abs/1601.00225.
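The generalized termination criterion can be sketched as follows (a simplified illustration, not the PR's code; `p_sum` is the momentum summed over the trajectory and `v = M⁻¹p` the velocity):

```python
import numpy as np

def is_turning(p_sum, v_left, v_right):
    """Generalized no-U-turn check: stop when the momentum summed over
    the trajectory no longer points along the velocity at either end.
    Since v = M^{-1} p, the dot products are taken in the scalar product
    induced by the inverse mass matrix, not the Euclidean one.
    """
    return p_sum.dot(v_left) <= 0 or p_sum.dot(v_right) <= 0

# Both ends still moving along the summed momentum: keep going.
p_sum = np.array([1.0, 0.0])
assert not is_turning(p_sum, np.array([0.5, 1.0]), np.array([0.5, -1.0]))
# One end moving against it: terminate.
assert is_turning(p_sum, np.array([-0.5, 1.0]), np.array([0.5, -1.0]))
```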
pymc3/step_methods/hmc/trajectory.py (Outdated)

f.trust_input = True
return f

INTEGRATORS_SINGLE = {
    'leapfrog': _theano_single_leapfrog,
    'two-stage': _theano_single_twostage,
-   'three-stage': _theano_single_threestage
+   'three-stage': _theano_single_threestage,
+   'leapfrog3': _theano_single_leapfrog3,
What is this new integrator?
oh, that one slipped in by accident. It does three leapfrog steps in a row. I'm using that to compare three-stage and leapfrog (more on that later). I'll take it out of this pull request.
pymc3/tests/test_step.py (Outdated)

@@ -132,6 +137,7 @@ def check_trace(self, step_method):
trace = sample(n_steps, step=step_method(), random_seed=1)

if not benchmarking:
    print(trace.get_values('x'))
stray debug statement.
@ColCarroll can you also take a look at this?
looks really good! will try to look up the answers to my own questions tonight...
pymc3/step_methods/hmc/nuts.py (Outdated)

@@ -178,17 +182,17 @@ def competence(var):

# A node in the NUTS tree that is at the far right or left of the tree
- Edge = namedtuple("Edge", 'q, p, q_grad, energy')
+ Edge = namedtuple("Edge", 'q, p, v, q_grad, energy')

# A proposal for the next position
Proposal = namedtuple("Proposal", "q, energy, p_accept")

# A subtree of the binary tree build by nuts.
typo: 'built'
pymc3/step_methods/hmc/nuts.py (Outdated)

size = tree1.size + tree2.size
accept_sum = tree1.accept_sum + tree2.accept_sum
move accept_sum and n_proposals closer to where they're used
pymc3/step_methods/hmc/nuts.py (Outdated)

return tree, diverging, False

tree1, diverging, turning = self._build_subtree(left, depth - 1, epsilon)
if diverging or turning:
    return tree1, diverging, turning

tree2, diverging, turning = self._build_subtree(tree1.right, depth - 1, epsilon)
ok = not (diverging or turning)
this is super descriptive, but it might make the code easier to read to just move it into the if statement (`if not (diverging or turning):`)
@@ -123,7 +124,7 @@ def __init__(self, A):
self.L = scipy.linalg.cholesky(A, lower=True)

def velocity(self, x):
-   return x.T.dot(self.A.T)
+   return tt.dot(self.A, x)
👍
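For a one-dimensional x the two spellings agree, since (Ax)ᵀ = xᵀAᵀ; the rewrite mainly reads more naturally. A quick numpy sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# (A x)^T == x^T A^T, so both forms compute the same velocity vector.
assert np.allclose(x.T.dot(A.T), A.dot(x))
```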
@@ -57,6 +57,13 @@ def _theano_energy_function(H, q, **theano_kwargs):
return energy_function, p

def _theano_velocity_function(H, p, **theano_kwargs):
    v = H.pot.velocity(p)
it looks like H.pot.velocity is already a theano function, but it also looks like you're using the velocity calculation separately -- is it necessary to have it attached to the named tuple H as well as calculated separately?
Not sure what you mean. We need the velocity at the start of the trajectory. Later values are returned by leapfrog. This is pretty much the same as with the gradient.
What I am getting at is that wherever you call self.compute_velocity, you can call H.pot.velocity. I dug some more and H.pot.velocity calls to one of the functions in quadpotential.py, and is multiplying the momentum by a scaling constant or matrix. This should be pretty performant, but I haven't tested whether compiling to theano makes it faster. Just considering surface area of code, I'd rather not add another function though.

Separately! We pass the gradient through since that's a fairly expensive operation. Shouldn't velocity be resampled at the start of each leapfrog step, so returning it doesn't help things? I flagged another comment on the one place where it might be useful, but I don't know why it is useful there!
pymc3/step_methods/hmc/trajectory.py (Outdated)

return f

def _theano_single_leapfrog3(H, q, p, q_grad, **theano_kwargs):
_theano_triple_leapfrog? I don't love integers in function names.
That was in there by mistake. I just used this to compare leapfrog with three-stage. I took it out.
@ColCarroll done
Sorry to belabor this -- the only issue left I think is more about my lack of understanding than code issues.
- span = right.q - left.q
- turning = turning or (span.dot(left.p) < 0) or (span.dot(right.p) < 0)
+ p_sum = self.p_sum
+ turning = (p_sum.dot(left.v) <= 0) or (p_sum.dot(right.v) <= 0)
is this equivalent? it looks like this is different from in the paper
This is different. The old implementation is actually somewhat wrong: it does not take into account the metric, but assumes a standard Euclidean space. This was an assumption in the original paper, but in general we need to use the scalar product defined by the inverse mass matrix (see the old Stan version). Stan switched to a new version recently (I linked a paper with a description of that somewhere in the comments – Ctrl-F Betancourt). As far as I understand, that was mostly to make Riemannian HMC easier.
Oh, and thanks for double checking :-)
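The point about the scalar product can be made concrete with a tiny numeric example (numbers made up, not PyMC3 code): with v = M⁻¹p, the check p_sum·v is exactly ⟨p_sum, p⟩ in the metric defined by the inverse mass matrix, and it only reduces to the Euclidean form when M is the identity:

```python
import numpy as np

M_inv = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # assumed inverse mass matrix
p_sum = np.array([1.0, -1.0])
p = np.array([0.2, 1.0])

v = M_inv.dot(p)  # velocity corresponding to momentum p
# p_sum.dot(v) is the scalar product <p_sum, p> under M^{-1}:
assert np.isclose(p_sum.dot(v), p_sum.dot(M_inv).dot(p))
# ...and generally differs from the plain Euclidean product:
assert not np.isclose(p_sum.dot(v), p_sum.dot(p))
```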
beautiful, thank you. i'll merge when tests pass again --
About the velocity: I thought it would be better to put this in the theano graph of the leapfrog, because I'm hoping that some models might be faster on the gpu, and especially a matrix vector product would be much faster there. For the initial velocity that is not an option, so I made an extra function for that. We could just call it directly without going through theano, but then the quadpotential would need to work as a theano graph and as a normal function. That's not impossible, but I thought that would be confusing.
Yes -- I think architecturally the "right" answer is to get a function/decorator in `theanof.py` that will turn the quadpotential classes into theano functions, but this would require a careful diff of its own. (Keeping the theano magic in one file keeps the dream of swapping out backends alive :) )
thanks again!
The change is surprisingly small (just the last commit; the others are from #1757), but it is easy to make a subtle mistake here. This could use some careful review ;-)