@WBobby WBobby commented Jun 29, 2022

Fixes #ISSUE_NUMBER

@jithunnair-amd jithunnair-amd merged commit 572c17c into ROCm:rocm5.2_internal_testing Jun 29, 2022
jithunnair-amd pushed a commit that referenced this pull request Jul 12, 2022
jithunnair-amd pushed a commit that referenced this pull request Aug 30, 2022
pruthvistony pushed a commit that referenced this pull request Nov 10, 2022
akashveramd pushed a commit that referenced this pull request Jun 13, 2025
… now that it's supported in core (#1031)

## Summary

In pytorch#149876 I found there was a
problem with per op SAC, per layer SAC, and no AC: all of these
settings saved reduce_scatter_tensor for backward, which
broke the async TP pattern matching, since it expects the reduce scatter node to
have only 1 user (wait_tensor), not 2 (wait_tensor and output_node).
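To make the failure mode concrete, here is a minimal, purely illustrative sketch (not the actual Inductor pattern matcher; `Node` and `matches_async_tp_pattern` are hypothetical) of why a second user on the reduce_scatter node defeats a matcher that requires exactly one user:

```python
# Illustrative model of a graph node that records its users.
class Node:
    def __init__(self, name):
        self.name = name
        self.users = []

def matches_async_tp_pattern(reduce_scatter):
    # Original-style matcher: reduce_scatter must feed exactly one
    # user, its wait_tensor, and nothing else.
    return (len(reduce_scatter.users) == 1
            and reduce_scatter.users[0].name == "wait_tensor")

rs = Node("reduce_scatter_tensor")
rs.users.append(Node("wait_tensor"))
print(matches_async_tp_pattern(rs))   # matches: only user is wait_tensor

# SAC / no-AC also saves reduce_scatter for backward -> a second user
# (the graph output), so the strict pattern no longer fires.
rs.users.append(Node("output"))
print(matches_async_tp_pattern(rs))   # no match with 2 users
```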

In pytorch#149946 I addressed this by:

1) Adding new graph patterns to match on that allow reduce_scatter to
have 2 users.
2) Updating the subgraph replacement logic to save the "fused matmul
reduce scatter" node for backward instead of the reduce scatter node, if
it detects the graph is saving reduce_scatter for backward. This allows
the original matmul reduce scatter graph to be replaced and erased
correctly.
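The two steps above can be sketched as follows. This is a hedged toy model of the replacement logic, not the real fx subgraph-rewriting code; `Node`, `replace_with_fused`, and the saved-for-backward list are all hypothetical stand-ins:

```python
class Node:
    def __init__(self, name, users=None):
        self.name = name
        self.users = users or []

def replace_with_fused(reduce_scatter, saved_for_backward):
    # Step 1: the relaxed pattern tolerates a second user, as long as
    # that user is the graph output saving the value for backward.
    allowed = {"wait_tensor", "output"}
    if not {u.name for u in reduce_scatter.users} <= allowed:
        return None  # pattern does not apply
    fused = Node("fused_matmul_reduce_scatter")
    # Step 2: if the graph saved reduce_scatter for backward, save the
    # fused node instead, so reduce_scatter loses its extra user and the
    # original matmul + reduce_scatter subgraph can be erased.
    if reduce_scatter in saved_for_backward:
        idx = saved_for_backward.index(reduce_scatter)
        saved_for_backward[idx] = fused
    return fused

saved = [Node("some_activation")]
rs = Node("reduce_scatter_tensor", [Node("wait_tensor"), Node("output")])
saved.append(rs)
fused = replace_with_fused(rs, saved)
print(fused.name)         # fused_matmul_reduce_scatter
print(saved[1] is fused)  # the fused node is now saved for backward
```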

Once pytorch#149946 lands, we can
add reduce_scatter_tensor back to the op save list for SAC in
torchtitan, and it will no longer break SAC or no AC 👍
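For context, a per-op SAC policy of this kind boils down to a save list consulted per operator. The sketch below is hypothetical (op-name strings and `sac_policy` are illustrative, not torchtitan's actual API); it only shows the shape of "add reduce_scatter_tensor back to the save list":

```python
# Hypothetical per-op SAC save list; names are illustrative only.
_save_list = {
    "aten.mm.default",
    "_c10d_functional.reduce_scatter_tensor.default",  # safe again after the fix
}

def sac_policy(op_name):
    # Save the op's output for backward if listed, otherwise recompute it.
    return "save" if op_name in _save_list else "recompute"

print(sac_policy("_c10d_functional.reduce_scatter_tensor.default"))  # save
print(sac_policy("aten.relu.default"))                               # recompute
```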