Skip to content

🏷️ ufunc annotations for cbrt, deg2rad, degrees, fabs, rad2deg, radians #373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 22, 2025

Conversation

guan404ming
Copy link
Member

@guan404ming guan404ming commented Mar 22, 2025

Toward #230

@guan404ming guan404ming changed the title 🏷️ add protocol for {[f]O} -> $1 🏷️ ufunc annotations for cbrt, deg2rad, degrees, fabs, rad2deg and radians Mar 22, 2025
@guan404ming guan404ming changed the title 🏷️ ufunc annotations for cbrt, deg2rad, degrees, fabs, rad2deg and radians 🏷️ ufunc annotations for cbrt, deg2rad, degrees, fabs, rad2deg, radians Mar 22, 2025
@jorenham jorenham added this to the v2.2.x.0 milestone Mar 22, 2025
Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ufuncs seem to handle object_ dtypes differently from the others.

If you look at this, you might think that object dtypes are not supported:

>>> np.cbrt(8, dtype=np.object_)
AttributeError: 'int' object has no attribute 'cbrt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<python-input-23>", line 1, in <module>
    np.cbrt(8, dtype=np.object_)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cbrt method

Directly passing an object array, also doesn't work:

>>> np.cbrt(np.array(8, dtype=np.object_))
AttributeError: 'int' object has no attribute 'cbrt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<python-input-25>", line 1, in <module>
    np.cbrt(np.array(8, dtype=np.object_))
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cbrt method

This seems to contradict the cbrt.types, which explicitly contains the O->O signature.

>>> np.cbrt.types
['e->e', 'f->f', 'd->d', 'e->e', 'f->f', 'd->d', 'g->g', 'O->O']

What's going on here?

But, if we take a closer look at the message, we that it mentions a "callable cbrt method". This made me suspect that for object_ input, the ufuncs simply tries to call x.cbrt().

A simple experiment confirms that this is indeed what's going on:

>>> class Cube:
...     def __init__(self, volume):
...         self._size = volume**(1/3)
...     def cbrt(self):
...         return self._size
... 
>>> np.cbrt(Cube(8))
2.0

How about the other ufuncs?

It's prety much the same story: passing object_ input to fabs calls x.fabs(), passing it to degrees will call x.degrees(), etc.

But, can we type that?

We can. But it will be a lot of work, and it will be messy.

Taking cbrt as example again, it would require a dedicated _Call11Cbrt protocol, that instead of x: _ArrayLikeObject accepts x: _CanCbrt and returns NDArray[object_]. The _CanCbrt type is a new protocol definition:

@type_check_only
class _CanCbrt(Protocol):
    def cbrt(self) -> object: ...

We have 6 ufuncs, and this solution requires us to write 2 protocols for each ufunc; that's 12 protocols in total.
That would become a big mess, with lots of code duplication.

Can we do better?

Unfortunately, the _Can{Cbrt,Fabs,Degrees,...} protocols are unavoidable. But luckily, it only requires 3 lines of code (+ 1 black line) to define one. The price we have to pay for them will be a total of 4 * 6 = 24 lines of code. That seems reasonable.

But it's the remaining 6 callable protocols are the main issue. Most of their overloads would be identical to one another, so we'd get a lot of code duplication.
But if we're smart about it, we could avoid all of that, and instead define a single protocol to rule them all.
It would look something like this

_T_contra = TypeVar("_T_contra", contravariant=True)

class _Call11FloatObject(Protocol[_T_contra]):
    # <insert floating scalar overloads>
    @overload
    def __call__(
        self,
        x: _T_contra,
        /,
        out: None = None,
        *,
        dtype: _DTypeLike[np.object_] | None = None,
    ) -> Any: ...
    # <insert floating array overloads>
    @overload
    def __call__(
        self,
        x: _ArrayLikeObject_co | _NestedSequence[_T_contra],
        /,
        out: None = None,
        *,
        dtype: _DTypeLike[np.object_] | None = None,
    ) -> NDArray[np.object_]: ...
    # <insert fallback overload>

For e.g. cbrt you use it as _Call11FloatObject[_CanCbrt].

What did those 6 overloads get us?

We managed to precise describe the allowed object-like input, in the case of scalars. But because we use a generic protocol, and Python does not have support for higher-kinded types (HKT), we cannot annotate the return type, and are forced to use Any.

For object_ array-like input, this did not help at all. That is because np.object_ is not a generic, so we don't know what the type of the underlying object is. So we can't do x: _ArrayLike[np.object_[_CanCbrt]], so _ArrayLikeObject_co is the best we can do.
But we did manage to additionally accept nested sequences of object-like "things", because we know that these "things" are of type _T_contra, which for e.g. cbrt we set to _CanCbrt.
However, the problem with the return types for scalar-like input also applies here. Therefore the best we can do is to return -> NDArray[object_].

Wait... will people actually use this?

Honestly, I think that these a very unlikely use-case. I wouldn't be surprised if only one in a million would use it.

However, NumPy has many users. The exact amount is unknown, but I think it's safe to say that it lies somewhere between 10**7 and 10**8.

If we combine these estimates, then we reach the conclusion that we'd be helping around 10 and 100 people with this (given that our assumptions are actually true).

So yes, it's likely that people will actually use this. But it'll only be a few. There are probably other problems that can be solved in numtype that will help many more, and won't require as much of our time.

So what do we do?

Personally, I think I would just ignore the object_ overloads. Only a very small subset of object-like input is allowed, and there are many alternative to using the ufuncs this way. I would also leave a # NOTE in the code that mentions that there exists some legal object-like input types but that it's not supported. I'd also reference this PR.

By cheating this way, we effectively pretend that the signatures isn't {efdgO}->$1, but {efdg}->$1. That's exactly equivalent to spacing!

So with this (imho very reasonable) cheat, we can avoid having to write 7 new protocols (or 8 if you also consider the .at method).

But that being said, I'll leave it up to you want you want to do here.

@guan404ming
Copy link
Member Author

Thank you for the clarification! Actually, I’ve been investigating this issue myself over the past hour. When I tried to add tests related to dtype=np.object_, I came across the same problem.

Regarding the issue, I believe that adding extra complexity to the code just to handle these edge cases (which might be used by only a small number of people), to handle it as spacing and leave some note is an more reasonable tradeoff.

I would make a change to keep things consistent with spacing and adding a note referencing the comments you provided.

@jorenham
Copy link
Member

Regarding the issue, I believe that adding extra complexity to the code just to handle these edge cases (which might be used by only a small number of people), to handle it as spacing and leave some note is an more reasonable tradeoff.

And of course, we can always reconsider if someone opens an issue for it.

@jorenham jorenham self-requested a review March 22, 2025 16:31
@guan404ming
Copy link
Member Author

@jorenham I found that [arc]{cos,sin,tan}[h] also have this kind of issue
chould we handle them like we discuss above?

>>> np.cos(10, dtype=np.object_)
AttributeError: 'int' object has no attribute 'cos'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    np.cos(10, dtype=np.object_)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type int which has no callable cos method

@jorenham
Copy link
Member

I found that [arc]{cos,sin,tan}[h] also have this kind of issue

Good catch!

should we handle them like we discuss above?

That's probably for the best then. Types with a .cos methods are also pretty uncommon, at least in numpy and the python stdlib.

@jorenham jorenham merged commit 3e87a93 into numpy:main Mar 22, 2025
21 checks passed
@jorenham
Copy link
Member

Thanks, Guan-Ming

@guan404ming guan404ming deleted the 1-to-1-float-obj branch March 22, 2025 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants