-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
MAINT: avoid nested asarray
calls
#22947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -8237,7 +8237,7 @@ def test_monotonicity(self, variant, method, xp): | |||
pvaluess = xp.sort(xp.asarray(rng.uniform(0, 1, size=(m, n))), axis=0) | |||
|
|||
combined_pvalues = xp.asarray([ | |||
stats.combine_pvalues(pvaluess[i, :], method=method)[1] | |||
float(stats.combine_pvalues(pvaluess[i, :], method=method)[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should already be a float? As per https://scipy.github.io/devdocs/reference/generated/scipy.stats.combine_pvalues.html#combine-pvalues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Pdb) i=0
(Pdb) p stats.combine_pvalues(pvaluess[i, :], method=method)[1]
Array(0.39087839, dtype=array_api_strict.float64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks like a bug 🐛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, from where I stand it seems to make sense to merge this PR to fix gh-22945 and improve at leisure.
Feel free to take over or close this and fix the combine_pvalues
bug instead (no idea how disruptive it will be, probably tolerable).
To me, the main takeaway is that the current main of array-api-strict
:
- has been reported as not problematic for scikit-learn (BUG: do not allow asarray of nested sequences of arrays data-apis/array-api-strict#147 (comment))
- a fix for scipy is ~dozen LOC
Which means there's no need to revert data-apis/array-api-strict#147
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns a np.float64 on numpy, which is a subclass of float.
However, res[()]
is a no-op on all other backends.
When it comes to lazy backends, they cannot return a float as that would materialize the graph.
There are two options here:
- change the function to
return float(res) if res.shape == () and not is_lazy_array(res) else res
. Update the documentation to clarify this specific quirk of lazy backends. - Just
return res[()]
. Update the documentation to clarify that it's a float when working in numpy, and a 0d array otherwise.
I personally prefer 2 by far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but I'm assuming that we're looking for technicalities, since we're ignoring the glaring fact that concrete generic
subclasses don't provide any behavior for >0 dimensions, so they are not usable as arrays in any practical sense.
The much more practical limitation, and the reason that NumPy scalars are not acceptable in place of real NumPy arrays, is that instances do not support __setitem__
, but I guess that is also ignored if we consider JAX arrays to be arrays. (I don't think __setitem__
actually excuses immutable types, though. If immutable arrays are excused elsewhere in the standard, it probably deserves mention in __setitem__
, too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concrete generic subclasses don't provide any behavior for >0 dimensions.
Should they though? They implement all required behaviours when they interact with an ndarray with >0 dimensions, by returning a new ndarray.
Read-only arrays are explicitly catered for (with maturity issues, e.g. being currently unable to identify them).
JAX arrays are arrays.
NumPy arrays with flags.writeable == False
are arrays.
Fun fact: read-only np.ndarrays currently fall foul the standard because their in-place op methods (__iadd__
etc.) raise, whereas they're expected to fall back to the default behaviour to cause the Python interpreter to swap out the object with the return value of the matching op (__add__
etc.). Again something that can be easily fixed upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that another implied rule of this game is that we ignore array type in = array type out. If this is not explicit somewhere, it is implied by the signatures if "array" in the input type description equals "array" in the output description. (Otherwise, is it ok for xp
functions to accept yp
type as input and return zp
type output?)
They implement all required behaviours when they interact with an ndarray with >0 dimensions,
I think so, because:
Furthermore, a conforming implementation of the array API standard must support, at minimum, array objects of rank (i.e., number of dimensions) 0, 1, 2, 3, and 4 and must explicitly document their maximum supported rank N.
generic
and its subclasses do not support this. ndarray
does, but the two are unrelated except that both inherit from object
IIUC.
Fun fact: read-only np.ndarrays currently fall foul the standard because their in-place op
I thought so, but they still fail for that reason, so I guess that is another answer to your question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no requirements for the Array classes of a namespace to be related though. Array is a Protocol
.
Furthermore, a conforming implementation of the array API standard must support, at minimum, array objects of rank (i.e., number of dimensions) 0, 1, 2, 3, and 4 and must explicitly document their maximum supported rank N.
This quote doesn't say that all implementations of the array Protocol
in the namespace must support arrays of rank 4, only that some must, and (I'm personally adding) you must not need to know if the specific object you have in your hands does. np.int64(1)[None, None]
transparently returns a 2-dimensionl ndarray, as does np.reshape
or np.broadcast_to
applied to a generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, so let's improve the standard to make all these things more clear, and let's make NumPy scalars (and other immutable arrays) actually compliant, if these fixes are indeed trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up tracked in #22954. Thanks all!
fixes #22945
cc @crusaderky