
What is the p random variable for in glm models? #1717


Closed
chumsley opened this issue Jan 26, 2017 · 4 comments

Comments

@chumsley

chumsley commented Jan 26, 2017

I was using glm to create a simple logistic regression model, and I noticed that it seems to be adding a random variable named p. This variable is sampled along with the intercept and coefficients, and changing its value affects the total logp of the model (i.e., if I pass samples to the model's logp method that differ only in p_logodds_, I get different return values).

I don't understand what this variable is for, and I am concerned that it is interfering with proper sampling; at any rate, when I sample from the model, the results strongly disagree with those from the equivalent model and data in R.

Is p meant to be there? If so, what is it for?

Minimal confusing example:

import numpy as np
import pymc3 as pm

ys = np.array([0, 0, 1, 0, 1, 1])
xs = np.array([10, 5, 1, -1, -5, -10])
data = {"x":xs, "y":ys}
with pm.Model() as M:
    pm.glm.glm("y ~ x", data, family=pm.glm.families.Binomial())
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(1000, step=step)
    print(pm.df_summary(trace))

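    # build a second point that differs from the last sample only in p / p_logodds_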
    sam = trace[-1]
    ix = np.argmax([abs(s['p_logodds_'] - sam['p_logodds_']) for s in trace])
    sam2 = dict(sam)
    sam2['p'] = trace[ix]['p']
    sam2['p_logodds_'] = trace[ix]['p_logodds_']

    print "\nlogp(p=%s)=%s,  logp(p=%s)=%s" % (sam['p'], M.logp(sam), sam2['p'], M.logp(sam2))

"""
Optimization terminated successfully.
         Current function value: 32.964035
         Iterations: 7
         Function evaluations: 10
         Gradient evaluations: 10
100%|##########| 1000/1000 [00:00<00:00, 1521.61it/s]
               mean        sd  mc_error   hpd_2.5  hpd_97.5
Intercept -3.548549  0.979635  0.088712 -3.750052 -3.750052
x         -1.061083  0.097358  0.006966 -1.058443 -1.058443
p          0.082809  0.106645  0.009598  0.061961  0.061961

logp(p=0.00116844038631)=-39.3032225537,  logp(p=0.995574022242)=-37.9746678716
"""

Thanks!

@twiecki
Member

twiecki commented Jan 28, 2017

Hm, yeah, that shouldn't be there. Not sure where that's coming from.

@bhargavvader
Contributor

I was trying to have a go at this but really had no idea what to do - any pointers?

@twiecki
Member

twiecki commented Feb 16, 2017

I think https://github.com/pymc-devs/pymc3/blob/master/pymc3/glm/families.py#L104 just needs to be an empty dict.
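For reference, here is a hand-built version of the same logistic regression (a sketch of the intended model, not the glm internals; the Normal priors and the name `manual` are illustrative choices). It has no free p: the success probability is a deterministic function of the linear predictor, which is what glm should produce once that priors dict is empty.

import numpy as np
import pymc3 as pm

ys = np.array([0, 0, 1, 0, 1, 1])
xs = np.array([10, 5, 1, -1, -5, -10])

with pm.Model() as manual:
    # weakly informative priors, chosen only for illustration
    intercept = pm.Normal('Intercept', mu=0, sd=100)
    slope = pm.Normal('x', mu=0, sd=100)
    # p is a deterministic function of the linear predictor, not a sampled variable
    p = pm.math.invlogit(intercept + slope * xs)
    pm.Bernoulli('y', p=p, observed=ys)
    trace = pm.sample(1000)

print(pm.df_summary(trace))  # only Intercept and x should appear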

uhho added a commit to uhho/pymc3 that referenced this issue Feb 22, 2017
An attempt to solve pymc-devs#1717 and issues with `docs/source/notebooks/GLM-logistic.ipynb` at the same time.
This commit reverts the change made in PR pymc-devs#685.
twiecki pushed a commit that referenced this issue Feb 24, 2017
An attempt to solve #1717 and issues with `docs/source/notebooks/GLM-logistic.ipynb` at the same time.
This commit reverts the change made in PR #685.
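Once that change is in, rerunning the original snippet should no longer produce a p row in the summary; a minimal check, assuming the same toy data as above:

import numpy as np
import pymc3 as pm

ys = np.array([0, 0, 1, 0, 1, 1])
xs = np.array([10, 5, 1, -1, -5, -10])

with pm.Model():
    pm.glm.glm("y ~ x", {"x": xs, "y": ys}, family=pm.glm.families.Binomial())
    trace = pm.sample(1000)

print(trace.varnames)  # expect only the regression terms, e.g. Intercept and x
assert 'p' not in trace.varnames and 'p_logodds_' not in trace.varnames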
@bhargavvader
Contributor

Is this issue fixed now?

twiecki closed this as completed Feb 26, 2017