-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Updates to pymc.Hypergeometric docstrings #5488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added bounds to the docstring of the Hypergeometric dist
pymc/distributions/discrete.py
Outdated
@@ -972,11 +972,11 @@ class HyperGeometric(Discrete): | |||
Parameters | |||
---------- | |||
N : integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are scalars the only valid type for all 3 arguments? cc @ricardoV94 ? (no idea who knows, feel free to tag someone else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be only integers: https://en.wikipedia.org/wiki/Hypergeometric_distribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but maybe an array of integers is also supported? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logcdf does say:
# logcdf can only handle scalar values at the moment
if np.ndim(value):
raise TypeError(
f"HyperGeometric.logcdf expects a scalar value but received a {np.ndim(value)}-dimensional object."
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe logp supports it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I use a different format to specify the array types? Example integer, array_like[integer] or TensorVariable[integer]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, numpydoc formatting is very clear in the array of type
format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to use tensor_like of integer
and we'll add its definition on the glossary so we have a type that covers scalars, arrays, aesara tensorvariable and randomvariable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! Picked integer, array_like or TensorVariable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. We decided to go with tensor_like of integer
which will get rendered as a link to our glossary where we have an explanation on what inputs are accepted for arguments documented like this: https://docs.pymc.io/en/latest/glossary.html#term-tensor_like
@cuchoi did you mean to close the PR? maybe open another not from |
I closed the PR while adding more changes to it to avoid it being reviewed before adding them. Should I change the PR to be from a branch of my forked repo? |
Added docstrings for parameters "good" and "bad" in logp and logcdf. These parameters are different than the ones used in init, which is odd to me. They transform the N and k parameters to bad (which is N - k) and good (which is k), but this transformation could happen inside logp and logcdf. Would it make sense to always use N, k? |
good : integer, array_like or TensorVariable | ||
Number of successful individuals in the population. Alias for parameter :math:`k`. | ||
bad : integer, array_like or TensorVariable | ||
Number of unsuccessful individuals in the population. Alias for :math:`N-k`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use "tensor_like of integer" here @OriolAbril ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in the process of removing the distribution methods from the documentation and leaving only the dist
one. The Distribution class is still being refactored, see #5308 for some more details (I don't really understand what is going on very well) and those methods might be removed completely, and because they should not be used themselves already but called via pm.logp()
or pm.logcdf
I think it's best to leave them be for now.
As part of the work of #5459:
logp
andlogcdf
. These parameters are different than the ones used in__init__
, which is odd to me. They transform theN
andk
parameters tobad
(which isN - k
) andgood
(which isk
), but this transformation could happen insidelogp
andlogcdf
. Would it make sense to always use N, k?#DataUmbrellaPyMCSprint