Issues getting variable in hierarchical model to update #2271

Closed
aspassiani opened this issue Jun 5, 2017 · 16 comments

@aspassiani

I have previously been working with OpenBUGS/WinBUGS for my Bayesian statistics but have decided to move over to the PyMC3 package in Python. I believe I have converted my code correctly, but I am having trouble getting my PyMC3 output to match my BUGS output; specifically, one of my variables never updates from its initialized values. My code is written as follows:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3.backends.base import merge_traces

model2 = pm.Model()
with model2:
    # Coefficients
    alpha = pm.Normal('alpha', mu=0.0, sd=1000)
    beta = pm.Deterministic('beta', tt.exp(alpha))
    lamda = pm.Gamma('lamda', mu=1.0e-2, sd=100)
    # Population Effect
    pop_eff = pm.Deterministic('pop_eff', tt.exp((-1*beta)/tt.exp(pop_den_all)))
    # Mean/Expected Event Occurrence Rate
    lamdal = pm.Deterministic('lamdal', lamda*5625)
    # Actual Occurrence of Events
    Tlatent_new = pm.Poisson('Tlatent_new', mu=lamdal, shape=56)
    # Observed Event Counts
    Tobs_new = pm.Binomial('Tobs_new', n=Tlatent_new, p=pop_eff, shape=56, observed=Tobs)
    # Initialization (note the key must match the variable name, 'lamda')
    start1 = {'alpha': 0.5, 'lamda': 1., 'Tlatent_new': Tlatent}
    start2 = {'alpha': 1., 'lamda': 2., 'Tlatent_new': Tlatent}
    start = [start1, start2]
    step1 = pm.Metropolis([alpha, lamda])
    step2 = pm.BinaryGibbsMetropolis([Tlatent_new])

with model2:
    trace2 = merge_traces([pm.sample(2000, step=[step1, step2], tune=1000, start=start[i], chain=i)
                           for i in range(2)])

with sample data:

#Sample Data
Tobs = np.array([0,0,1,0,0,0,8,3,0,0,2,0,1,0,0,0,0,3,4,1,1,0,1,0,1,0,0,0,0,0,0,0,0,3,0,0,0,1,5,2,
                    27,17,26,9,6,0,1,3,4,5,1,4,16,73,26,3])
pop_den_all = np.array([0.01,0.69,1.03,0.12,0.22,0.59,26.60,9.76,0.0033,0.064,0.81,0.96,0.76,1.01,
                        0.20,0.79,4.47,12.21,18.56,1.68,0.33,0.001,0.22,0.046,0.11,0.065,0.33,0.002,
                        0.065,0.021,0.33,0.086,0.60,1.43,0.072,0.13,0.73,2.89,7.22,5.42,24.35,104.88,
                        122.54,17.71,5.13,1.70,0.037,2.05,5.43,2.83,1.61,3.00,20.19,284.83,98.69,2.67])
#Initialization Data
Tlatent = np.array([80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,
                    80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,80,
                    80,80,80,80,80,80])

When I run the model, the variable Tlatent_new does not update from the initial values I give it in Tlatent. This happens when I include the effects of population density via the line Tobs_new = pm.Binomial('Tobs_new', n=Tlatent_new, p=pop_eff, shape=56, observed=Tobs). When I leave that line out, the output of Tlatent_new does move away from the initial guess Tlatent.

I cannot figure out what is wrong with that additional line. I suspect it has something to do with the Binomial distribution, because if I swap in a different distribution I do see Tlatent_new change from the initial guess Tlatent; I don't see why the Binomial would not work. I also thought my choice of sampling method might be an issue, but changing it hasn't helped either.

Any help as to why this does not work is greatly appreciated.

Thanks in advance,

@junpenglao
Member

junpenglao commented Jun 5, 2017

The Binomial distribution is not working in your model because:
1. some elements of pop_eff are exactly 0 or 1, and
2. Tlatent_new can be smaller than Tobs,
both of which make the logp of the model -inf. You can check this by doing:

Tobs_new.logp(model2.test_point)
model2.logp(model2.test_point)
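
To see the first problem concretely, here is a quick sketch that recomputes pop_eff in NumPy at the default test point (alpha = 0, hence beta = 1):

pop_eff0 = np.exp(-1.0 / np.exp(pop_den_all))
print(pop_eff0.min(), pop_eff0.max())  # the max is exactly 1.0 in float64
print(Tlatent.min(), Tobs.max())       # n must stay >= the observed count in every cell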

Maybe you can try to reparameterize your model into a centered form, for example Tlatent_new = Tobs + Occur with Occur >= 0?
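
A rough sketch of that reparameterization (the Poisson prior on Occur is an assumption here and does not reproduce the original prior on Tlatent_new exactly; the point is that Tlatent_new = Tobs + Occur is >= Tobs in every cell by construction):

with pm.Model() as model3:
    alpha = pm.Normal('alpha', mu=0.0, sd=1000)
    beta = pm.Deterministic('beta', tt.exp(alpha))
    lamda = pm.Gamma('lamda', mu=1.0e-2, sd=100)
    pop_eff = pm.Deterministic('pop_eff', tt.exp((-1*beta)/tt.exp(pop_den_all)))
    lamdal = pm.Deterministic('lamdal', lamda*5625)
    # Unobserved events on top of the observed counts; always >= 0
    Occur = pm.Poisson('Occur', mu=lamdal, shape=56)
    # Tlatent_new >= Tobs in every cell by construction
    Tlatent_new = pm.Deterministic('Tlatent_new', Tobs + Occur)
    Tobs_new = pm.Binomial('Tobs_new', n=Tlatent_new, p=pop_eff, observed=Tobs)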

@junpenglao
Member

Seems to work with a small constraint on pop_eff, but you should still consider reparameterizing your model.

with model2:
    # Coefficients
    alpha = pm.Normal('alpha', mu=0.0, sd=1000)
    beta = pm.Deterministic('beta', tt.exp(alpha))
    lamda = pm.Gamma('lamda', mu=1.0e-2, sd=100)
    # Population Effect
    pop_eff = pm.Deterministic('pop_eff', tt.exp(-1e-10+(-1*beta)/tt.exp(pop_den_all)))
    # Mean/Expected Event Occurrence Rate
    lamdal = pm.Deterministic('lamdal', lamda*5625)
    # Actual Occurrence of Events
    Tlatent_new = pm.Poisson('Tlatent_new', mu=lamdal, shape=56, testval=Tlatent)
    # Observed Event Counts
    Tobs_new = pm.Binomial('Tobs_new', n=Tlatent_new, p=pop_eff, observed=Tobs)

    trace1 = pm.sample(3000)

@aspassiani
Author

I see. I assumed that it would be bounded to be equal to or larger than Tobs, as that is how it was handled in BUGS. Thanks for pointing that out. I take it PyMC3 doesn't offer a way to set Tobs as a lower bound for Tlatent_new in the model?

@twiecki
Member

twiecki commented Jun 5, 2017

There is pm.Bound(), but I'm not sure it works with another variable as the bound.

@junpenglao
Member

I am going to close this because it is not a bug, but feel free to continue the conversation.

@aspassiani
Author

I've tried constraining pop_eff as you have done. The logp of the model is no longer -inf, but my Tlatent_new values still stay at the starting value I assign (80), which is greater than any of the Tobs values (the largest being 73). I'm not sure whether the Tlatent_new values changed for you, but looking at my traces, Tlatent_new never deviates from 80. So I don't quite understand how Tlatent_new being smaller than Tobs could be the problem.

I tried using pm.Bound() on the Poisson distribution for Tlatent_new to see if that made a difference, but it didn't change anything. I couldn't see a way to set a different lower bound for each cell tied to that cell's Tobs value, so I just used a single value larger than the maximum of Tobs.

@junpenglao
Member

Are you on master and theano 0.9.0? Tlatent_new does change in my case.

with model2:
    trace1 = pm.sample(3000, njobs=2)
pm.traceplot(trace1, varnames=['alpha', 'lamda', 'Tlatent_new']);

This is what I got:
[traceplot of alpha, lamda, and Tlatent_new]
Having said that, you might need to run the chain a bit longer for convergence.

Tlatent_new being smaller than Tobs is definitely a problem; you can see it by setting the testval to something smaller than 73 (e.g., Tlatent = np.ones(56)*72) and checking that the logp becomes -inf.

You can also try integrating out the Poisson, similar to how it is done in Stan.
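
For this particular Poisson-Binomial pair the marginalization even has a closed form (Poisson thinning): if Tlatent_new ~ Poisson(lamdal) and Tobs ~ Binomial(Tlatent_new, pop_eff), then marginally Tobs ~ Poisson(lamdal * pop_eff). A sketch of the marginalized model, which leaves no discrete latent variable for the sampler to deal with:

with pm.Model() as marg_model:
    alpha = pm.Normal('alpha', mu=0.0, sd=1000)
    beta = pm.Deterministic('beta', tt.exp(alpha))
    lamda = pm.Gamma('lamda', mu=1.0e-2, sd=100)
    pop_eff = pm.Deterministic('pop_eff', tt.exp((-1*beta)/tt.exp(pop_den_all)))
    lamdal = pm.Deterministic('lamdal', lamda*5625)
    # Thinned Poisson: Binomial(Poisson(mu), p) is marginally Poisson(mu * p)
    Tobs_new = pm.Poisson('Tobs_new', mu=lamdal*pop_eff, observed=Tobs)
    trace = pm.sample(3000)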

@aspassiani
Author

Okay, I see why it wasn't updating; it had to do with the sampling method I was using. I now get the same output as you have shown.

I understand that Tlatent_new needs to be at least as large as Tobs_new; I just assumed that, given an initial value above Tobs_new and the observed values for Tobs_new, successive iterations of Tlatent_new would be bounded by the observed values. I will try your suggestions for fixing my model.

Thanks!

@junpenglao
Member

No problem! You can have a look at more general tips on porting BUGS/JAGS models to PyMC3 here:
http://pymc-devs.github.io/pymc3/notebooks/PyMC3_tips_and_heuristic.html

@aseyboldt
Member

@aspassiani you could also try to use a continuous variable for Tlatent_new. That allows you to use NUTS, and you can set the bounds using:

BoundStudentT = pm.Bound(pm.StudentT, lower=Tobs)  # or whatever dist seems appropriate
Tlatent_new = BoundStudentT('Tlatent_new', nu=3, sd=sd)  # sd is a scale of your choosing

If you need a Poisson you might be able to define a continuous version of it using pm.DensityDist.

@junpenglao
Member

junpenglao commented Jun 6, 2017

@aseyboldt I don't think pm.Bound works with the lower bound being a vector. Currently, pm.Bound has a few limitations that need to be improved.

@aseyboldt
Member

@junpenglao @aspassiani oh. There is even some np.isinf in distributions.Bounded. This shouldn't be there...
The transformation itself works, so you could use this:

with pm.Model() as model:
    trafo = pm.distributions.transforms.lowerbound([1., 2.])
    pm.Normal('a', mu=0, sd=1, transform=trafo, shape=2, testval=[3., 3.])
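
Applied to this thread's model, that might look something like the following sketch (assuming you switch the latent counts to a continuous distribution; the Normal and its sd here are hypothetical placeholders):

with model2:
    trafo = pm.distributions.transforms.lowerbound(Tobs)  # element-wise lower bound
    Tlatent_cont = pm.Normal('Tlatent_cont', mu=lamdal, sd=10.,
                             transform=trafo, shape=56, testval=Tobs + 1.)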

@aseyboldt
Member

xref #2277

@aspassiani
Author

@aseyboldt is it not possible to use the transformation you showed for a Poisson distribution? I get this error when using it for a Poisson:

The error when converting the test value to that variable type:
TensorType(int64, vector) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to int64, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([ 100., 100., 100., ..., 100., 100., 100.])"

But if I use the same data with, for example, a Normal distribution, as in your example, I don't get that error.

I would like to try defining a continuous version of the Poisson, as you suggested in your earlier comment, but I'm not quite sure how to use pm.DensityDist. Do you know of a good example that uses it that I could follow or re-create?

Thanks!

@aseyboldt
Member

aseyboldt commented Jun 7, 2017

The transformation is only defined for continuous distributions.
Now that I think about it a bit, maybe using a Gamma prior is better than some hacked continuous version of the Poisson. Also, I'm a bit stuck on computing the normalization constant (or the part of it that depends on λ) if we use e^(-λ) λ^x / Γ(x+1) as the pdf: how does ∫ λ^x / Γ(x+1) dx depend on λ? And I want to have LaTeX in GitHub issues...
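
For what it's worth, a rough sketch of the unnormalized version via pm.DensityDist (this ignores the normalization question above, so the posterior on λ would be biased; illustration only):

with pm.Model() as cont_poisson_model:
    lam = pm.Gamma('lam', alpha=1., beta=1.)  # hypothetical prior, just for illustration

    def logp(x):
        # log of e^(-lam) * lam**x / Gamma(x+1), without the x-normalization term
        return -lam + x * tt.log(lam) - tt.gammaln(x + 1)

    x = pm.DensityDist('x', logp, testval=1.)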

@aspassiani
Author

@aseyboldt I'm not sure about that. But I'm still hesitant about changing the Poisson distribution.

The reasoning behind the Poisson is that I am trying to model the occurrence of weather events: the Poisson distribution expresses the probability of a given number of events and assumes the events occur independently. I then feed this into the Binomial distribution because not all events that occur are observed. Population density determines the probability of observing an event (pop_eff), which becomes my "p" parameter, with Tlatent_new as my "n" parameter. Tobs_new is therefore the number of events successfully observed and is given by my observational dataset, my observed values.

As @junpenglao pointed out, my pop_eff is apparently also an issue. I get values of 1 or 0 because there are areas where the population density is large enough to observe all events (pop_eff = 1) and areas with zero population (pop_eff = 0). OpenBUGS seemed able to handle both of these cases. I don't see how constraining pop_eff as @junpenglao did, just to allow the model to run, would affect the model's output much, but maybe I'm overlooking something.
