Description
This RFC proposes adding support to the array API specification for repeating each element of an array.
Overview
Based on array comparison data, the API is available in most array libraries. The main exception is PyTorch, which deviates in its naming convention (`repeat_interleave` vs NumPy et al.'s `repeat`).
Prior art
- NumPy: https://numpy.org/doc/stable/reference/generated/numpy.repeat.html
- PyTorch: https://pytorch.org/docs/stable/generated/torch.repeat_interleave.html
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/repeat
- JAX: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.repeat.html
- CuPy: https://docs.cupy.dev/en/stable/reference/generated/cupy.repeat.html
- Dask: https://docs.dask.org/en/stable/generated/dask.array.repeat.html#dask.array.repeat
Proposal
`def repeat(x: array, repeats: Union[int, Sequence[int], array], /, *, axis: Optional[int] = None)`

- `repeats`: the number of repetitions for each element.

  If `axis` is not `None`:

  - if `repeats` is an array, `repeats.shape` must broadcast to `x.shape[axis]`.
  - if `repeats` is a sequence of ints, `len(repeats)` must broadcast to `x.shape[axis]`.
  - if `repeats` is an integer, `repeats` must be broadcast to match the size of the specified `axis`.

  If `axis` is `None`:

  - if `repeats` is an array, `repeats.shape` must broadcast to `prod(x.shape)`.
  - if `repeats` is a sequence of ints, `len(repeats)` must broadcast to `prod(x.shape)`.
  - if `repeats` is an integer, `repeats` must be broadcast to match the size of the flattened array.

- `axis`: specifies the axis along which to repeat values. If `None`, use a flattened input array and return a flat output array.
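For illustration, NumPy's `np.repeat` already implements these semantics; a quick sketch of the expected behavior (assuming NumPy as the reference point):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Scalar repeats with axis=None: flatten (row-major) and repeat each element twice.
np.repeat(x, 2)                        # array([1, 1, 2, 2, 3, 3, 4, 4])

# Per-element repeats along axis 0: len(repeats) matches x.shape[0].
np.repeat(x, [1, 2], axis=0)           # array([[1, 2], [3, 4], [3, 4]])

# Array-valued repeats with axis=None: repeats has prod(x.shape) elements.
np.repeat(x, np.array([1, 0, 2, 1]))   # array([1, 3, 3, 4])
```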
Questions
- Both PyTorch and JAX support a kwarg for specifying the output size in order to avoid stream synchronization (PyTorch) and to allow compilation (JAX). Without support for such a kwarg, is this API viable? And what are the reasons for needing this kwarg when other array libraries (e.g., TensorFlow) omit it?
- When flattening the input array, flatten in row-major order? (precedent: `nonzero`)
- Is PyTorch okay with adding a `repeat` function in its main namespace, given the divergence in behavior for `torch.Tensor.repeat`, which behaves similarly to `np.tile`? (See the comparison sketch after this list.)
- CuPy only allows `int`, `List`, and `Tuple` for `repeats`, not an array. PyTorch may prefer a list of `int`s (see "Unnecessary cuda synchronizations that we should remove in PyTorch", pytorch/pytorch#108968).
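To make the naming divergence concrete, a small comparison of the existing behaviors (assuming NumPy and PyTorch; values chosen only for illustration):

```python
import numpy as np
import torch

x_np = np.array([1, 2, 3])
x_pt = torch.tensor([1, 2, 3])

np.repeat(x_np, 2)                # array([1, 1, 2, 2, 3, 3])  -- element-wise repetition
torch.repeat_interleave(x_pt, 2)  # tensor([1, 1, 2, 2, 3, 3]) -- matches np.repeat
x_pt.repeat(2)                    # tensor([1, 2, 3, 1, 2, 3]) -- tiles, like np.tile
```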
Related
- Adding tuple argument support to `numpy.repeat` to avoid repeated invocations: "ENH: Allow tuple arguments for `numpy.repeat`" (numpy/numpy#21435) and "ENH: Introduce multiple pair parameters in the 'repeat' function" (numpy/numpy#23937).
- Mention of xarray's need for `repeat`: "Common APIs across array libraries (1 year later)" #187 (comment)
Activity
rgommers commented on Jul 13, 2023
I think because the `repeats` and `axis` keywords may not contain `int` literals, but be derived from other variables. At that point, they can't be known statically. EDIT: it looks like one more case of "output shape depends on value of an input parameter".

I think we should always do that. There are more places where flattening happens, and they should always be row-major I'd think.
Probably yes, but it's a relatively slow process to do so. The issue for it is pytorch/pytorch#50013, and in the tracker issue for numpy compat linked from there, it says:
Of course the longer that is not executed on, the less appetite there will be to do it later on.
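As a concrete illustration of the "output shape depends on value of an input parameter" point, here is a minimal sketch of how JAX's `total_repeat_length` keyword makes the operation compilable (values are chosen only for illustration):

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x, repeats):
    # Without total_repeat_length, tracing would fail: the output shape
    # depends on the runtime values in `repeats`.
    return jnp.repeat(x, repeats, total_repeat_length=8)

x = jnp.array([1, 2, 3, 4])
repeats = jnp.array([1, 2, 2, 3])  # sums to 8
print(f(x, repeats))               # [1 2 2 3 3 4 4 4]
```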
rgommers commented on Jul 13, 2023
Strangely enough, PyTorch and JAX don't have the same keyword for `tile` (xref gh-655).

rgommers commented on Jul 27, 2023
I looked at the PyTorch situation again; it looks a little annoying. `torch.repeat` could in principle be added now, with behavior compatible with `np.repeat` and not with the `Tensor.repeat` method. I think the equally annoying thing is that it's not supported as of now, so it's just more work and no one is lining up to do that work.

@lezcano do you have any thoughts on this one?
lezcano commented on Jul 27, 2023
Having a method and a free function that have different behaviour is not a good idea (if I understood the proposal correctly). The whole update process is also a no-go, as we'd need to do it in 3 releases, and the time between releases has now gone up (unofficially) to almost a year.
Now, this should not stop the array API from standardising this, as this function is pretty much `repeat_interleave`. One should just be careful to cast the tuple into a tensor and populate the `output_size` kwarg correctly.

nb. it doesn't make any sense that in PyTorch's `repeat_interleave` the `repeats` arg is a tensor and not a tuple...

oleksandr-pavlyk commented on Sep 6, 2023
I argue we should allow the value of `repeats` to be an array as well (we should not insist that it be a tuple of ints).

Offloading libraries must convert this vector and transfer data to an accelerator during the implementation anyway. Supporting array input would unlock certain use cases, such as programmatic generation of repeats (random sampling, procedural creation, etc.).
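For example, a small sketch of the programmatic use case mentioned above (NumPy is used only for illustration; the variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(5)

# Repeats generated programmatically (e.g. by random sampling) are naturally
# an array already; requiring a tuple of ints would force a host round-trip.
repeats = rng.integers(0, 3, size=x.shape[0])
np.repeat(x, repeats)
```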
kgryte commented on Sep 7, 2023
@oleksandr-pavlyk I think this is a bug in the proposal. `repeats` should be allowed to be either an `int` or an array of ints.

vadimkantorov commented on Sep 10, 2023
Supporting arrays is also useful for cases like PyTorch, where there might be improvements if the repeats/counts argument is already a tensor obtained from elsewhere, and especially if it's already on the target device.
lezcano commented on Sep 10, 2023
@vadimkantorov What do you mean by "there might be improvements"? There are no improvements, and it may even be detrimental to pass a tensor to this function if it's in CUDA.
If you think otherwise, it'd be useful to show concrete examples.
(linked PR #690: `repeat` to the specification)

leofang commented on Sep 22, 2023
Hi all, this RFC (and PR #690) was briefly discussed in the Array API meeting yesterday. Apologies to @kgryte @oleksandr-pavlyk @seberg for comments I made in the meeting without fully understanding the sticking points here. Below is my updated analysis & response.
As already pointed out earlier in this thread, `repeat()` is challenging in that the returned shape depends dynamically on the inputs, unlike most APIs that have a predetermined contract suitable for static analysis. So this is still an unaddressed question (already documented in the RFC) so far.

Moreover, with regard to the `repeats` argument taking an array object, I made a comment yesterday that stream ordering should help. In theory this would be true, but it involves implementation details of an array library. Specifically, array metadata (shape/strides/...) is usually stored on the host (as in CuPy/PyTorch), and so in such cases a blocking D2H copy is unavoidable for these libraries to get this info on the host and allocate an array accordingly; preserving stream ordering is not sufficient.

However, again, some changes in the implementation can help. For example, if the array library can store array metadata in CUDA unified, pinned, or heterogeneous memory, preserving stream ordering alone is sufficient, and no blocking copy is needed. (Consider the following CUDA events being ordered: `repeats` created on device -> array allocated by zero-copy accessing `repeats` from host via stream-wait event & callback.) So it's not the end of the world, and array libraries can evolve to work around implementation hurdles.

API-wise speaking, there are no issues. The semantics are well-defined, pending the row-major discussion, which is a separate topic.
CuPy chose not to support taking arrays for `repeats` for the same reason (cupy/cupy#1918, cupy/cupy#3849); sorry I didn't answer it right yesterday. But, since it's generally not possible to guarantee (despite our best effort) asynchronicity for all APIs, I am not convinced we cannot change should the standard accept it (cc: @kmaehashi). On the CuPy side, we can always document that a certain combination of arguments may sync the device.

It has been a longstanding, recurrent user request to support `repeats` taking an array (see the above CuPy issues for example; this RFC and even earlier comments in this thread also gave similar requests). I believe for user ergonomics it is the right choice to make. I disagree with @lezcano that

The UX improvement is large and also anticipated by many downstream developers and users. So, as long as we address the dynamic-shape issue (I suspect we'd end up adding a remark, similarly to what we did for the `unique*()` APIs), I would support it.

Since @jpivarski had kindly offered a partial workaround to CuPy, I'd like to have Jim chime in. I'm also eager to have JAX developers involved; maybe @jakevdp can offer insight?
lezcano commented on Sep 22, 2023
What about libraries that create compiled code, like JAX or `torch.compile`, or libraries that use cudagraphs? In this function, you cannot even give a bound on the size of the output without looking at `repeats`, so this function cannot be compiled, certainly not into a CUDA graph.

Regarding UX, my point is that if you have a repeats tensor, you can always do `tensor.tolist()` in PyTorch and pass it to this function. This goes in line with "explicit is better than implicit".
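A minimal sketch of the explicit pattern suggested in this thread, i.e. passing host-side repeats and populating `output_size` up front (the wrapper name and structure are hypothetical, not part of the proposal):

```python
from typing import Optional, Sequence, Union

import torch

def repeat(x: torch.Tensor, repeats: Union[int, Sequence[int]],
           /, *, axis: Optional[int] = None) -> torch.Tensor:
    # Hypothetical wrapper: the caller passes host-side repeats (an int or a
    # list of ints, e.g. obtained via tensor.tolist()), so the output size is
    # known up front and torch.repeat_interleave need not synchronize on
    # device-side values.
    if isinstance(repeats, int):
        n = x.numel() if axis is None else x.shape[axis]
        output_size = repeats * n
        reps: Union[int, torch.Tensor] = repeats
    else:
        output_size = sum(repeats)
        reps = torch.tensor(list(repeats), device=x.device)
    return torch.repeat_interleave(x, reps, dim=axis, output_size=output_size)
```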