🏷️ ufunc annotations for cbrt, deg2rad, degrees, fabs, rad2deg, radians #373
Conversation
{[f]O} -> $1
cbrt, deg2rad, degrees, fabs, rad2deg and radians
Force-pushed from edaa930 to 0c261fd
These ufuncs seem to handle object_ dtypes differently from the others.
If you look at this, you might think that object dtypes are not supported:
>>> np.cbrt(8, dtype=np.object_)
AttributeError: 'int' object has no attribute 'cbrt'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python-input-23>", line 1, in <module>
np.cbrt(8, dtype=np.object_)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cbrt method
Directly passing an object array also doesn't work:
>>> np.cbrt(np.array(8, dtype=np.object_))
AttributeError: 'int' object has no attribute 'cbrt'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python-input-25>", line 1, in <module>
np.cbrt(np.array(8, dtype=np.object_))
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cbrt method
This seems to contradict cbrt.types, which explicitly contains the O->O signature.
>>> np.cbrt.types
['e->e', 'f->f', 'd->d', 'g->g', 'O->O']
What's going on here?
But if we take a closer look at the message, we see that it mentions a "callable cbrt method". This made me suspect that for object_ input, the ufunc simply tries to call x.cbrt().
A simple experiment confirms that this is indeed what's going on:
>>> class Cube:
...     def __init__(self, volume):
...         self._size = volume**(1/3)
...     def cbrt(self):
...         return self._size
...
>>> np.cbrt(Cube(8))
2.0
How about the other ufuncs?
It's pretty much the same story: passing object_ input to fabs calls x.fabs(), passing it to degrees will call x.degrees(), etc.
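For example (Angle here is just a made-up class to illustrate the dispatch, analogous to Cube above):
>>> import math
>>> class Angle:
...     def __init__(self, deg):
...         self._deg = deg
...     def degrees(self):
...         return float(self._deg)
...     def radians(self):
...         return math.radians(self._deg)
...
>>> np.degrees(Angle(180))
180.0
>>> np.radians(Angle(180))
3.141592653589793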
But, can we type that?
We can. But it will be a lot of work, and it will be messy.
Taking cbrt as an example again, it would require a dedicated _Call11Cbrt protocol that, instead of x: _ArrayLikeObject, accepts x: _CanCbrt and returns NDArray[object_]. The _CanCbrt type is a new protocol definition:
@type_check_only
class _CanCbrt(Protocol):
    def cbrt(self) -> object: ...
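To make that concrete, the dedicated call protocol could look roughly like this; only the object-related overload is sketched, the floating-point overloads and the full out/dtype parameter lists are omitted, and none of this is existing numtype code:
@type_check_only
class _Call11Cbrt(Protocol):
    # <the usual floating-point scalar and array overloads, as in the other _Call11 protocols>
    @overload
    def __call__(self, x: _CanCbrt, /) -> NDArray[np.object_]: ...
    # <fallback overload>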
We have 6 ufuncs, and this solution requires us to write 2 protocols for each ufunc; that's 12 protocols in total.
That would become a big mess, with lots of code duplication.
Can we do better?
Unfortunately, the _Can{Cbrt,Fabs,Degrees,...} protocols are unavoidable. But luckily, it only takes 3 lines of code (+ 1 blank line) to define one. The price we have to pay for them is a total of 4 * 6 = 24 lines of code. That seems reasonable.
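Spelled out, and assuming the other five follow the same naming pattern as _CanCbrt above (those names are my guess), the full set would be:
@type_check_only
class _CanCbrt(Protocol):
    def cbrt(self) -> object: ...

@type_check_only
class _CanFabs(Protocol):
    def fabs(self) -> object: ...

@type_check_only
class _CanDegrees(Protocol):
    def degrees(self) -> object: ...

@type_check_only
class _CanRadians(Protocol):
    def radians(self) -> object: ...

@type_check_only
class _CanDeg2rad(Protocol):
    def deg2rad(self) -> object: ...

@type_check_only
class _CanRad2deg(Protocol):
    def rad2deg(self) -> object: ...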
But the remaining 6 callable protocols are the main issue. Most of their overloads would be identical to one another, so we'd get a lot of code duplication.
But if we're smart about it, we could avoid all of that, and instead define a single protocol to rule them all. It would look something like this:
_T_contra = TypeVar("_T_contra", contravariant=True)

class _Call11FloatObject(Protocol[_T_contra]):
    # <insert floating scalar overloads>
    @overload
    def __call__(
        self,
        x: _T_contra,
        /,
        out: None = None,
        *,
        dtype: _DTypeLike[np.object_] | None = None,
    ) -> Any: ...
    # <insert floating array overloads>
    @overload
    def __call__(
        self,
        x: _ArrayLikeObject_co | _NestedSequence[_T_contra],
        /,
        out: None = None,
        *,
        dtype: _DTypeLike[np.object_] | None = None,
    ) -> NDArray[np.object_]: ...
    # <insert fallback overload>
For e.g. cbrt, you use it as _Call11FloatObject[_CanCbrt].
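In the stubs, each ufunc would then be declared with its own specialization; roughly like this (the exact spelling is just how I'd imagine it, not how numtype currently declares its ufuncs):
cbrt: Final[_Call11FloatObject[_CanCbrt]] = ...
fabs: Final[_Call11FloatObject[_CanFabs]] = ...
degrees: Final[_Call11FloatObject[_CanDegrees]] = ...
radians: Final[_Call11FloatObject[_CanRadians]] = ...
deg2rad: Final[_Call11FloatObject[_CanDeg2rad]] = ...
rad2deg: Final[_Call11FloatObject[_CanRad2deg]] = ...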
What did those 6 overloads get us?
We managed to precisely describe the allowed object-like input in the case of scalars. But because we use a generic protocol, and Python does not have support for higher-kinded types (HKT), we cannot annotate the return type, and are forced to use Any.
For object_ array-like input, this did not help at all. That is because np.object_ is not generic, so we don't know what the type of the underlying object is. We can't write x: _ArrayLike[np.object_[_CanCbrt]], so _ArrayLikeObject_co is the best we can do.

But we did manage to additionally accept nested sequences of object-like "things", because we know that these "things" are of type _T_contra, which for e.g. cbrt we set to _CanCbrt.

However, the problem with the return types for scalar-like input also applies here. Therefore the best we can do is to return -> NDArray[object_].
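Concretely, with the Cube class from earlier and under this hypothetical annotation, a type checker would infer something like:
reveal_type(np.cbrt(Cube(8)))              # Any; the actual cbrt() return type is lost without HKTs
reveal_type(np.cbrt([Cube(8), Cube(27)]))  # NDArray[object_]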
Wait... will people actually use this?
Honestly, I think this is a very unlikely use-case. I wouldn't be surprised if only one in a million users would use it.
However, NumPy has many users. The exact number is unknown, but I think it's safe to say that it lies somewhere between 10**7 and 10**8.
If we combine these estimates, we reach the conclusion that we'd be helping somewhere between 10 and 100 people with this (given that our assumptions are actually true).
So yes, it's likely that people will actually use this. But it'll only be a few. There are probably other problems that can be solved in numtype that will help many more, and won't require as much of our time.
So what do we do?
Personally, I think I would just ignore the object_ overloads. Only a very small subset of object-like input is allowed, and there are many alternatives to using the ufuncs this way. I would also leave a # NOTE in the code that mentions that there exist some legal object-like input types, but that they're not supported. I'd also reference this PR.
By cheating this way, we effectively pretend that the signature isn't {efdgO}->$1, but {efdg}->$1. That's exactly equivalent to spacing!
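The note could be as simple as a stub comment along these lines (wording is just a suggestion):
# NOTE: At runtime these ufuncs also accept object_ input, for which they
# dispatch to `x.cbrt()`, `x.fabs()`, `x.degrees()`, etc. This is deliberately
# left unannotated here; see the discussion in the PR that added these stubs.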
So with this (imho very reasonable) cheat, we can avoid having to write 7 new protocols (or 8 if you also consider the .at method).
But that being said, I'll leave it up to you what you want to do here.
Thank you for the clarification! Actually, I've been investigating this issue myself over the past hour: when I tried to add tests related to dtype=np.object_, I came across the same problem. Rather than adding extra complexity to the code just to handle these edge cases (which might be used by only a small number of people), I believe handling it like spacing and leaving a note is the more reasonable tradeoff. I'll make a change to keep things consistent with spacing and add a note referencing the comments you provided.
Force-pushed from 0c261fd to 059c896
And of course, we can always reconsider if someone opens an issue for it.
@jorenham I found that [arc]{cos,sin,tan}[h] also have this kind of issue:
>>> np.cos(10, dtype=np.object_)
AttributeError: 'int' object has no attribute 'cos'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    np.cos(10, dtype=np.object_)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cos method
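A quick check (with a made-up class) suggests it's the same method dispatch as with cbrt:
>>> class AlwaysOne:
...     def cos(self):
...         return 1.0
...
>>> np.cos(AlwaysOne())
1.0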
Good catch!
That's probably for the best then. Types with a …
Thanks, Guan-Ming
Toward #230