
[WIP] Multinomial sampling for nuts #1769


Merged
merged 4 commits from multinomial-nuts into pymc-devs:master on Feb 24, 2017

Conversation

@aseyboldt (Member) commented Feb 10, 2017

The change is surprisingly small (just the last commit; the others are from #1757), but it is easy to make a subtle mistake here. This could use some careful review ;-)

@twiecki (Member) commented Feb 10, 2017

Very cool. Have you compared multinomial to slice in any way? The Stan guys say it's more efficient, and I trust them; just curious.

@@ -12,6 +12,7 @@


 def bern(p):
+    assert np.isfinite(p)
Member

What are you checking for here? Generally we should avoid assertions in favor of explicitly raising an exception.

Member Author

That slipped in from debugging... But it might make sense to check for nan; infs should be fine.
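For illustration, a minimal sketch of the suggested change, assuming bern draws a uniform sample under the hood (the exception type is a guess):

    import numpy as np

    def bern(p):
        # Raise explicitly instead of asserting; a nan probability indicates
        # a bug upstream, while inf/-inf just saturate the comparison.
        if np.isnan(p):
            raise FloatingPointError("Expected a probability, got nan.")
        return np.random.uniform() < p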

@@ -229,8 +226,8 @@ def _build_subtree(self, left, depth, epsilon):
         self.max_energy_change = energy_change
         p_accept = min(1, np.exp(-energy_change))

-        size = int(self.log_u + energy_change <= 0)
-        diverging = not (self.log_u + energy_change < self.Emax)
+        size = - energy_change
Member

nit: space between operator and energy_change

         self.step_size = step_size
         self.Emax = Emax
         self.start_energy = np.array(start.energy)

         self.left = self.right = start
         self.proposal = Proposal(start.q, start.energy, 1.0)
         self.depth = 0
-        self.size = 1
+        self.size = 0
Member

rename to log_size or similar?

@@ -242,15 +239,15 @@ def _build_subtree(self, left, depth, epsilon):

         tree2, diverging, turning = self._build_subtree(tree1.right, depth - 1, epsilon)

-        size = tree1.size + tree2.size
+        size = np.logaddexp(tree1.size, tree2.size)
Member

cool!
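A quick illustration of why the log-space sum matters (toy numbers):

    import numpy as np

    a, b = -800.0, -801.0          # log-sizes of two subtrees
    print(np.exp(a) + np.exp(b))   # 0.0 -- both terms underflow
    print(np.logaddexp(a, b))      # about -799.69 -- exact in log space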

         accept_sum = tree1.accept_sum + tree2.accept_sum
         n_proposals = tree1.n_proposals + tree2.n_proposals

         left, right = tree1.left, tree2.right
         span = np.sign(epsilon) * (right.q - left.q)
         turning = turning or (span.dot(left.p) < 0) or (span.dot(right.p) < 0)

-        if bern(tree2.size * 1. / max(size, 1)):
+        if bern(np.exp(tree2.size - size)):
Member

nothing needed here/yet, but it looks like we might just refactor bern to work in log space, since that's how we're always using it now.
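A sketch of what that refactor could look like (hypothetical name, not part of this PR):

    import numpy as np

    def bern_log(log_p):
        # Accept with probability exp(log_p); log_p >= 0 always accepts,
        # and no exp() underflow can occur for very negative log_p.
        return np.log(np.random.uniform()) < log_p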

@aseyboldt (Member Author)

@ColCarroll sounds all fine and should be fixed now.

@twiecki (Member) commented Feb 10, 2017

======================================================================
FAIL: Check effective sample size is equal to number of samples when initializing with MAP
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pymc-devs/pymc3/pymc3/tests/test_diagnostics.py", line 160, in test_effective_n
    assert_allclose(n_effective, n_jobs * n_samples, 2)
  File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 1392, in assert_allclose
    verbose=verbose, header=header)
  File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/numpy/testing/utils.py", line 739, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=2, atol=0
(mismatch 20.0%)
 x: array([  300., -1577.,   300.,   300.,   300.])
 y: array(300)
-------------------- >> begin captured stdout << ---------------------
Optimization terminated successfully.
         Current function value: 4.594693
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1
--------------------- >> end captured stdout << ----------------------

@aseyboldt (Member Author) commented Feb 10, 2017 via email

@twiecki (Member) commented Feb 11, 2017

@aseyboldt Fix is merged, need to rebase.

@aseyboldt force-pushed the multinomial-nuts branch 2 times, most recently from 5eea3c3 to b90e3dc on February 11, 2017 23:14
@aseyboldt (Member Author)

The previous version did not handle cases where the energy is -inf correctly. Initially I thought the "diverging" check would account for this, but even if we are diverging, we still might check bern(tree2.log_size - log_size). But this is nan if log_size is inf.
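The failure mode in two lines (values chosen for illustration):

    import numpy as np

    log_size = np.inf                  # e.g. after an energy of -inf
    tree2_log_size = np.inf
    print(tree2_log_size - log_size)   # nan -- inf minus inf is undefined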

@aseyboldt force-pushed the multinomial-nuts branch 2 times, most recently from b42666a to e1623a4 on February 13, 2017 17:47
@aseyboldt changed the title from "Multinomial sampling for nuts" to "[WIP] Multinomial sampling for nuts" on Feb 14, 2017
@twiecki (Member) commented Feb 16, 2017

@aseyboldt Anything other than rebase you think is missing here?

@aseyboldt (Member Author)

Yes, two things:

  • The current version handles divergence a bit differently than Stan. Stan only aborts the trajectory if the Hamiltonian decreases a lot (in that case the acceptance probability is pretty much 0 anyway); this code aborts if it decreases or increases a lot. Neither should happen if the step size is small enough. I guess we should do the same thing as Stan here (a toy contrast is sketched below)? But I still need to go through the code and make sure nothing strange happens if something is inf or nan.
  • Trouble with the normal termination criterion. I think it should somehow use the mass matrix, but I haven't figured it out yet. I guess this wouldn't need to go in this PR.
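For concreteness, a toy contrast of the two abort rules from the first point. The sign convention is assumed from the p_accept = min(1, np.exp(-energy_change)) line in the diff above (so a large positive energy_change means a near-zero acceptance probability), and the threshold value is made up:

    Emax = 1000.0  # divergence threshold; the value here is made up

    def diverging_both_directions(energy_change):
        # behaviour described for this PR: abort on a large energy error
        # in either direction
        return abs(energy_change) > Emax

    def diverging_stan_style(energy_change):
        # behaviour described for Stan: abort only in the direction where
        # the acceptance probability min(1, exp(-energy_change)) collapses
        return energy_change > Emax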

@twiecki (Member) commented Feb 20, 2017

FWIW it seems sensible to me to abort if the Hamiltonian increases by a lot too, but when in doubt we should copy Stan.

@aseyboldt (Member Author)

This is ready for review again. I also included the termination criterion that takes the metric into account (see [1]). As far as I can see, this should now do pretty much the same as Stan.

[1] Betancourt, Michael. “Identifying the Optimal Integration Time in Hamiltonian Monte Carlo.” arXiv:1601.00225 [Stat], January 2, 2016. http://arxiv.org/abs/1601.00225.
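For intuition, a minimal numpy sketch of the generalized criterion from [1], mirroring the p_sum check that appears in a diff further down (shapes assumed; v stands for the velocity M^-1 p):

    import numpy as np

    def is_turning(p_sum, v_left, v_right):
        # Generalized U-turn check: p_sum is the sum of momenta along the
        # subtree, v_* = M^-1 p at its leftmost/rightmost states; terminate
        # once p_sum no longer points along the velocity at either end.
        return p_sum.dot(v_left) <= 0 or p_sum.dot(v_right) <= 0

    # Example: the two ends move in opposite directions -> turning.
    print(is_turning(np.array([0.1, 0.0]),
                     np.array([-1.0, 0.0]), np.array([1.0, 0.0])))  # True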

     f.trust_input = True
     return f


 INTEGRATORS_SINGLE = {
     'leapfrog': _theano_single_leapfrog,
     'two-stage': _theano_single_twostage,
-    'three-stage': _theano_single_threestage
+    'three-stage': _theano_single_threestage,
+    'leapfrog3': _theano_single_leapfrog3,
Member

What is this new integrator?

Member Author

oh, that one slipped in by accident. It does three leapfrog steps in a row. I'm using that to compare three-stage and leapfrog (more on that later). I'll take it out of this pull request.

@@ -132,6 +137,7 @@ def check_trace(self, step_method):
         trace = sample(n_steps, step=step_method(), random_seed=1)

         if not benchmarking:
+            print(trace.get_values('x'))
Member

stray debug statement.

@twiecki (Member) commented Feb 21, 2017

@ColCarroll can you also take a look at this?

@ColCarroll (Member) left a comment

looks really good! will try to look up the answers to my own questions tonight...

@@ -178,17 +182,17 @@ def competence(var):


 # A node in the NUTS tree that is at the far right or left of the tree
-Edge = namedtuple("Edge", 'q, p, q_grad, energy')
+Edge = namedtuple("Edge", 'q, p, v, q_grad, energy')

 # A proposal for the next position
 Proposal = namedtuple("Proposal", "q, energy, p_accept")

 # A subtree of the binary tree build by nuts.
Member

typo: 'built'


size = tree1.size + tree2.size
accept_sum = tree1.accept_sum + tree2.accept_sum
Member

move accept_sum and n_proposal closer to where they're used

             return tree, diverging, False

         tree1, diverging, turning = self._build_subtree(left, depth - 1, epsilon)
         if diverging or turning:
             return tree1, diverging, turning

         tree2, diverging, turning = self._build_subtree(tree1.right, depth - 1, epsilon)
+        ok = not (diverging or turning)
Member

this is super descriptive, but it might make reading the code easier to just move it into the if statement (if not (diverging or turning):)

@@ -123,7 +124,7 @@ def __init__(self, A):
         self.L = scipy.linalg.cholesky(A, lower=True)

     def velocity(self, x):
-        return x.T.dot(self.A.T)
+        return tt.dot(self.A, x)
Member

👍

@@ -57,6 +57,13 @@ def _theano_energy_function(H, q, **theano_kwargs):
     return energy_function, p


+def _theano_velocity_function(H, p, **theano_kwargs):
+    v = H.pot.velocity(p)
Member

it looks like H.pot.velocity is already a theano function, but it also looks like you're using the velocity calculation separately -- is it necessary to have it attached to the named tuple H as well as calculated separately?

Member Author

Not sure what you mean. We need the velocity at the start of the trajectory. Later values are returned by leapfrog. This is pretty much the same as with the gradient.

Member

What I am getting at is that wherever you call self.compute_velocity, you can call H.pot.velocity. I dug some more: H.pot.velocity calls one of the functions in quadpotential.py and multiplies the momentum by a scaling constant or matrix. This should be pretty performant, but I haven't tested whether compiling to theano makes it faster. Just considering the surface area of the code, I'd rather not add another function though.

Separately! We pass the gradient through since that's a fairly expensive operation. Shouldn't velocity be resampled at the start of each leapfrog step, so returning it doesn't help things? I flagged another comment on the one place where it might be useful, but I don't know why it is useful there!
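For concreteness, a toy numpy illustration of what the quadpotential velocity amounts to, as described above (values made up; A plays the role of the inverse mass matrix):

    import numpy as np

    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # assumed inverse mass matrix
    p = np.array([0.3, -1.2])    # momentum
    v = A.dot(p)                 # velocity v = M^-1 p, a plain mat-vec product
    print(v)                     # [ 0.   -1.05]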

     return f


+def _theano_single_leapfrog3(H, q, p, q_grad, **theano_kwargs):
Member

_theano_triple_leapfrog? I don't love integers in function names.

Member Author

That was in there by mistake. I just used this to compare leapfrog with three-stage. I took it out.

@aseyboldt (Member Author)

@ColCarroll done

@ColCarroll (Member) left a comment

Sorry to belabor this -- the only issue left I think is more about my lack of understanding than code issues.

-        span = right.q - left.q
-        turning = turning or (span.dot(left.p) < 0) or (span.dot(right.p) < 0)
+        p_sum = self.p_sum
+        turning = (p_sum.dot(left.v) <= 0) or (p_sum.dot(right.v) <= 0)
Member

Is this equivalent? It looks different from the version in the paper.

Member Author

This is different. The old implementation is actually somewhat wrong: it does not take the metric into account, but assumes a standard Euclidean space. This was an assumption in the original paper, but in general we need to use the scalar product defined by the inverse mass matrix (see the old Stan version). Stan switched to a new version recently (I linked a paper describing it somewhere in the comments – Ctrl-F Betancourt). As far as I understand, that was mostly to make Riemannian HMC easier.

Member Author

Oh, and thanks for double checking :-)


@ColCarroll (Member)

Beautiful, thank you. I'll merge when tests pass again -- quadpotential.py could be turned into all theano functions, and then we could refactor this to get rid of the separate compute_velocity function, but I'm not too worried about that.

@aseyboldt (Member Author)

About the velocity: I thought it would be better to put this in the theano graph of the leapfrog, because I'm hoping that some models might be faster on the GPU, and especially a matrix-vector product would be much faster there. For the initial velocity that is not an option, so I made an extra function for that. We could just call it directly without going through theano, but then the quadpotential would need to work both as a theano graph and as a normal function. That's not impossible, but I thought it would be confusing.
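A minimal sketch of the design being described (names hypothetical, nothing beyond standard theano assumed): the same matrix-vector product, but compiled as a theano graph so it can run on the GPU alongside the leapfrog:

    import numpy as np
    import theano
    import theano.tensor as tt

    A = tt.matrix('A')   # stands in for the (inverse) mass matrix
    p = tt.vector('p')   # momentum
    velocity = theano.function([A, p], tt.dot(A, p))

    print(velocity(np.eye(2), np.array([1.0, -0.5])))  # [ 1.  -0.5]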

@ColCarroll (Member) commented Feb 24, 2017 via email

@ColCarroll merged commit f73d2ff into pymc-devs:master on Feb 24, 2017
@ColCarroll (Member)

thanks again!
