Conversation

@zasdfgbnm (Collaborator)
Fixes #2094

Comment on lines +164 to +165
default:
TORCH_INTERNAL_ASSERT(false, "Unknown tensor memory type.");
@zasdfgbnm (Collaborator, Author)

Unrelated, but I think it is a good idea to have it.

@zasdfgbnm zasdfgbnm marked this pull request as ready for review October 28, 2022 05:31
@zasdfgbnm zasdfgbnm requested a review from naoyam October 28, 2022 05:31
@csarofeen (Owner) left a comment

LGTM

@naoyam (Collaborator) commented Oct 29, 2022

So, is this an alternative fix for the problem for which you previously proposed the "mirror" approach? How does having dummy outputs solve the problem?

@zasdfgbnm (Collaborator, Author) commented Oct 29, 2022

> So, is this an alternative fix for the problem for which you previously proposed the "mirror" approach? How does having dummy outputs solve the problem?

Yes, it is. Let's say we had

T0[I]
T1[b, b, I] = broadcast(T0)
T2[b, b, r] = reduction(T1)
T3[b, b, b] = broadcast(T2)
T4[b, b, I] = T1 + T3
T5[b, b, r] = reduction(T4)

After projection, it becomes

T0[I]
T1[b, b, I] = broadcast(T0)
T2[b, b, r] = reduction(T1)
T3[b, b, b] = broadcast(T2)
T6[b, b, I] = broadcast(T0)
T4[b, b, I] = T6 + T3
T5[b, b, r] = reduction(T4)

Then neither the propagation path T2->T3->T4->T5 nor T2->T1->T0->T6->T4->T5 works, because both are missing root domain information. But adding a dummy output T7 = T1 + T6 creates a new propagation path, T2->T1->T7->T6->T4->T5, which carries all of the root domain information.
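The path argument can be illustrated with a toy model (an illustrative Python sketch, not nvFuser code; the per-op axis mappings below are assumptions read off the example above, treating info about an axis as surviving a propagation step only when the op maps that axis between the two tensors):

```python
# Toy model of transform-propagation coverage. Each tensor is a node;
# each op contributes an edge whose dict says which axes of one tensor
# correspond to which axes of its neighbor. Fresh broadcast axes and
# dropped axes are simply absent from the mapping.
edges = {
    ("T0", "T1"): {0: 2},              # broadcast: only I maps; b, b are new
    ("T1", "T2"): {0: 0, 1: 1, 2: 2},  # reduction maps every axis
    ("T2", "T3"): {0: 0, 1: 1},        # broadcast over the reduced axis: r is lost
    ("T3", "T4"): {0: 0, 1: 1, 2: 2},  # pointwise add maps every axis
    ("T6", "T4"): {0: 0, 1: 1, 2: 2},
    ("T0", "T6"): {0: 2},              # second broadcast of T0
    ("T4", "T5"): {0: 0, 1: 1, 2: 2},  # final reduction
    # The dummy output added by this PR:
    ("T1", "T7"): {0: 0, 1: 1, 2: 2},  # T7 = T1 + T6, pointwise
    ("T6", "T7"): {0: 0, 1: 1, 2: 2},
}

def step(covered, a, b):
    """Map a set of axis indices across the edge between a and b."""
    if (a, b) in edges:
        m = edges[(a, b)]
    else:
        m = {v: k for k, v in edges[(b, a)].items()}  # use the edge in reverse
    return {m[ax] for ax in covered if ax in m}

def surviving_axes(path):
    covered = {0, 1, 2}  # start with full knowledge at the reference, T2
    for a, b in zip(path, path[1:]):
        covered = step(covered, a, b)
    return covered

print(surviving_axes(["T2", "T3", "T4", "T5"]))              # {0, 1}: axis 2 lost
print(surviving_axes(["T2", "T1", "T0", "T6", "T4", "T5"]))  # {2}: axes 0, 1 lost
print(surviving_axes(["T2", "T1", "T7", "T6", "T4", "T5"]))  # {0, 1, 2}: complete
```

In this model the two original paths each lose axes (one through the all-broadcast T3, the other through the one-dimensional T0), while the path through the dummy output T7 keeps all three, matching the explanation above.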


@naoyam (Collaborator) commented Oct 29, 2022

Oh, I see. Interesting approach. Please add this explanation as a code comment.

I wonder how robust and generic this approach would be, but it seems good enough.

@csarofeen (Owner)

If it were done directly on iter domains instead of tensor views, it would be very robust. The fact that it's on tensor views means we can only build relationships across tensor views that can go through a pointwise operation. Since this replicates tensors exactly, it should be sufficient for all our use cases. Though some day in the future, all replays should probably be IterDomain based, not TensorView based.


Development

Successfully merging this pull request may close these issues.

Codegen failure in HuggingFace AllenaiLongformerBase with Autocast
