Skip to content

Censored data and Weibull AFT model #3095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 18, 2018
Merged

Censored data and Weibull AFT model #3095

merged 21 commits into from
Jul 18, 2018

Conversation

eigenfoo
Copy link
Member

Hello!

This PR closes #2899 and #2942 and adds two new notebooks to the examples:

  1. censored_data.ipynb: porting Add examples of censored data models #1870 to a notebook. I also took the liberty of touching up the explanations and visualizations there. Not sure if we now want to delete the original .py example file, but if we do, I'd be happy to do that.

  2. weibull_aft.ipynb: I haven't gotten around to this yet, but as far as I can tell it's a basic Weibull AFT model ported from here.

I'll probably finish up the Weibull AFT notebook sometime in the next few days; I'll ping when the PR is ready for review.

@junpenglao commented here about putting some notebooks under a new session and indexing them on the website. I'm not sure what that means, or how to go about doing it. Could someone give me some pointers? 😃

I think the same comment also suggested adding @AustinRochford's blog post on AFT models as an example notebook, but I'm not sure if we have the author's permission to do that. If we do, I'd be happy to tack that on to this PR (which would bring the number of new notebooks to three).

@junpenglao
Copy link
Member

What I meant is to create a new session in https://github.com/pymc-devs/pymc3/blob/master/docs/source/examples.rst, the website will then build and automatically index the notebook correctly.
Maybe after the GLM session?

@junpenglao
Copy link
Member

  • There are some guideline about unifying the style of the docs, please have a look at this issue: Update notebooks and examples #2834
  • you can put ; at the end of plotting command suppresses the text output: pm.traceplot(trace); that way you also dont need plt.show()
  • you should use the default initialization and try to get rid of the divergence samples
  • you dont need to implement the normal_lcdf and llcdf, you can import them from pymc3 after Fix numerics of TruncatedNormal #3077
  • yes go ahead and remove censored_data.py
  • more information/introduction is needed for weibull_aft notebook

@junpenglao junpenglao added the WIP label Jul 14, 2018
@junpenglao
Copy link
Member

Ah didnt realized this is WIP... :-p

@eigenfoo
Copy link
Member Author

No worries! These commits should fix everything about censored_data.ipynb. Will get around to Weibull AFT tomorrow.

Still waiting to see if we can port the blog post as a notebook. If not, is it possible/permissible to link to the blog post from the docs? It's a valuable resource: I learnt a lot from it!

@AustinRochford
Copy link
Member

@eigenfoo feel free to add my post!

@AustinRochford
Copy link
Member

I do appreciate you asking, but my posts are all CC BY-SA licensed so feel free to adapt, reuse, etc. in the future.
https://github.com/AustinRochford/blog-content

@AustinRochford
Copy link
Member

Notebook looks good. My one comment would be that instead of a Bernoulli RV for for the censored times with all observed ones, it may be more idiomatic to use a Potential. Mathematically it should be equivalent, and I probably should have done that in the first place. ;)

@eigenfoo
Copy link
Member Author

eigenfoo commented Jul 15, 2018

@AustinRochford I'm a bit confused about the proper usage of pm.Potential. From this thread on discourse it sounds like using pm.Potential is a bit of a hack, and you'd be better off if you can get by with supported distributions.

Personally speaking, I feel much more comfortable reading pm.Bernoulli and needing to figure out how the math works, rather than reading pm.Potential and needing to look up an obscure function (which doesn't appear to be well documented, but that's a separate issue). Unless it makes a difference in sampling, it seems to me that using well-known distributions would be more idiomatic... Could you please advise?

I think I'll change the implementations of all of the censored data models to whatever gets agreed on.

cc @junpenglao @aseyboldt

@junpenglao
Copy link
Member

junpenglao commented Jul 16, 2018

I agree with @AustinRochford, using a potential is clearer in this case as these values are not observed (but integrate over the tail of the distribution), which gives something like:

y_cens = pm.Potential(
        'y_cens', suvival_func(y_std[cens], η[cens_], s))
    )

As you can see, using Bernoulli doesnt make it clearer, as you need explaination of why observed=np.ones(cens.sum() is needed.

@eigenfoo
Copy link
Member Author

Deleting censored_data.py would make a broken link in the docs: I've pointed it to the weibull_aft.ipynb notebook.

I made a first pass through Weibull AFT notebook, but I'll probably want to take another look at it in the next few days before merging.

@junpenglao
Copy link
Member

No problem, take your time.
Also dont forget to add a line to the release note (maybe easier after 3.5 is released)

@twiecki
Copy link
Member

twiecki commented Jul 17, 2018

@eigenfoo Looks great, make sure to keep the output cells in the NB.

@eigenfoo
Copy link
Member Author

Ready for review/merge!

@junpenglao
Copy link
Member

LGTM mostly, small reminder:
1, please add a line in the release note
2, in bayes_param_survival_pymc3.ipynb following @AustinRochford's comment you should replace pm.Bernoulli with pm.Potential for the censored observed.

@eigenfoo
Copy link
Member Author

Ah right! Thanks for reminding me. These commits should do the trick.

@AustinRochford
Copy link
Member

:shipit:

@twiecki twiecki merged commit fc2bb56 into pymc-devs:master Jul 18, 2018
@twiecki
Copy link
Member

twiecki commented Jul 18, 2018

Thanks @eigenfoo and congrats on your first PR!

@springcoil
Copy link
Contributor

Awesome

@AustinRochford
Copy link
Member

👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement of Survival Analysis documentation
5 participants