Should tensordot broadcast the contracted dimensions? #294


Closed
asmeurer opened this issue Oct 29, 2021 · 3 comments · Fixed by #324
Labels: topic: Linear Algebra
Milestone: v2021

Comments

@asmeurer
Member

Should tensordot broadcast the contracted dimensions? For example, say we contract the first dimensions here:

tensordot(ones((3, 3)), ones((1, 3)), axes=((0,), (0,)))

The dimensions 3 and 1 do not match, but if we broadcast the arrays against each other first, they both become shape (3, 3), after which they do match.
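
For example, broadcasting the two inputs against each other (with np.broadcast_arrays, shown here just for illustration) makes the contracted sizes agree:

>>> [a.shape for a in np.broadcast_arrays(np.ones((3, 3)), np.ones((1, 3)))]
[(3, 3), (3, 3)]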

The spec is a little unclear about this (https://data-apis.org/array-api/latest/API_specification/linear_algebra_functions.html#tensordot-x1-x2-axes-2). It says x2 must be compatible with x1 by broadcasting, which seems to imply unconditional broadcasting, but it also says "Each axis (dimension) x1_axes[i] for x1 must have the same size as the respective axis (dimension) x2_axes[i] for x2."

NumPy disallows broadcasting in contracted dimensions (it does broadcast non-contracted dimensions):

>>> np.tensordot(np.ones((3, 3)), np.ones((1, 3)), axes=((0,), (0,)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 181, in tensordot
  File "./numpy/core/numeric.py", line 1110, in tensordot
    raise ValueError("shape-mismatch for sum")
ValueError: shape-mismatch for sum
>>> np.tensordot(np.ones((3, 3)), np.ones((1, 3)), axes=((1,), (1,)))
array([[3.],
       [3.],
       [3.]])

PyTorch broadcasts all dimensions, including contracted ones (note that PyTorch still calls its axes argument dims):

>>> torch.tensordot(torch.ones((3, 3)), torch.ones((1, 3)), dims=((0,), (0,)))
tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])
>>> torch.tensordot(torch.ones((3, 3)), torch.ones((1, 3)), dims=((1,), (1,)))
tensor([[3.],
        [3.],
        [3.]])

Note that in either case, the resulting array shape is based on the non-broadcasted input shapes, so it's not as simple as wrapping the call with broadcast_arrays.

>>> np.tensordot(np.ones((3, 3)), np.ones((2, 3, 3)), axes=((-1,), (2,))).shape
(3, 2, 3)
>>> np.tensordot(np.ones((2, 3, 3)), np.ones((2, 3, 3)), axes=((-1,), (2,))).shape
(2, 3, 2, 3)
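
As a rough sketch, the PyTorch behavior could be emulated on top of np.tensordot by broadcasting only the contracted dimensions first (tensordot_bcast below is a hypothetical helper for illustration, not part of NumPy, PyTorch, or the spec):

import numpy as np

def tensordot_bcast(x1, x2, axes):
    # Broadcast only the contracted dimensions of each input to their common
    # size, then defer to np.tensordot, which leaves the non-contracted
    # dimensions untouched (mismatched sizes other than 1 still raise).
    ax1, ax2 = axes
    shape1, shape2 = list(x1.shape), list(x2.shape)
    for a1, a2 in zip(ax1, ax2):
        n = max(shape1[a1], shape2[a2])
        shape1[a1] = shape2[a2] = n
    return np.tensordot(np.broadcast_to(x1, tuple(shape1)),
                        np.broadcast_to(x2, tuple(shape2)),
                        axes=axes)

tensordot_bcast(np.ones((3, 3)), np.ones((1, 3)), axes=((0,), (0,)))
# array([[3., 3., 3.],
#        [3., 3., 3.],
#        [3., 3., 3.]])  # matches the torch.tensordot result above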
@kgryte kgryte added the topic: Linear Algebra Linear algebra. label Oct 30, 2021
@kgryte kgryte added this to the v2021 milestone Oct 30, 2021
@asmeurer
Member Author

asmeurer commented Nov 1, 2021

CC @lezcano, do you have any thoughts on this?

@kgryte
Contributor

kgryte commented Nov 4, 2021

In today's call, we decided to align with NumPy's behavior, as advocated by @leofang, @oleksandr-pavlyk, and @rgommers. Given PyTorch's relatively recent addition of tensordot and NumPy's alignment with matmul (which also does not broadcast the innermost two dimensions), it seems reasonable to adopt NumPy's behavior in this instance.
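
For reference, a small NumPy illustration of the matmul analogy: the leading (batch) dimensions broadcast, but the innermost two (matrix) dimensions must match exactly.

import numpy as np

# The batch dimensions broadcast: 2 and 1 become 2.
np.matmul(np.ones((2, 3, 4)), np.ones((1, 4, 5))).shape   # (2, 3, 5)

# The innermost two dimensions do not broadcast: this raises a ValueError
# because the core sizes 3 and 1 are simply mismatched.
# np.matmul(np.ones((3, 3)), np.ones((1, 3)))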

@IvanYashchuk mentioned elsewhere that PyTorch's behavior can be matched with einsum as follows: if we express the tensordot operation with dims=((1,), (1,)) using einsum ("ab,dc->ad"), then NumPy computes the same result as torch.tensordot.

import torch
import numpy as np

a1 = np.random.normal(size=(3, 3))
a2 = np.random.normal(size=(3, 1))

np.einsum("ab,dc->ad", a1, a2)
# array([[ 1.44946877, -1.0152814 , -1.39638556],
#        [ 1.48299164, -1.03876252, -1.42868074],
#        [-0.37928214,  0.26566844,  0.36539187]])

torch.tensordot(*map(torch.from_numpy, (a1, a2)), dims=((1,), (1,)))
# tensor([[ 1.4495, -1.0153, -1.3964],
#         [ 1.4830, -1.0388, -1.4287],
#         [-0.3793,  0.2657,  0.3654]], dtype=torch.float64)

# A slower but equivalent way of computing the same result as np.einsum with
# "ik,jn->ij" (the same contraction as "ab,dc->ad" above, just with different
# index letters) or torch.tensordot with dims=((1,), (1,))
result = np.zeros((3, 3))
for i in range(result.shape[0]):
  for j in range(result.shape[1]):
    for k in range(a1.shape[1]): # reducing dim/axis #1
      for n in range(a2.shape[1]): # reducing dim/axis #1
        result[i, j] += a1[i, k] * a2[j, n]
# array([[ 1.44946877, -1.0152814 , -1.39638556],
#        [ 1.48299164, -1.03876252, -1.42868074],
#        [-0.37928214,  0.26566844,  0.36539187]])

@rgommers
Member

rgommers commented Nov 5, 2021

Given PyTorch's relatively recent addition of tensordot and NumPy's alignment with matmul (which also does not broadcast the innermost two dimensions), it seems reasonable to adopt NumPy's behavior in this instance.

A summary of additional considerations for why the NumPy behavior is preferred:

  • In the case of ambiguity, it's better to raise an exception than to pick a behavior
  • In the future it's possible to add more behavior if desired; taking it away is not
